TOPIC

LLM Agents

18 synthesis notes · 55 source papers
View as

Can careful selection of 78 demos outperform massive training datasets?

Does strategic curation of high-quality demonstrations unlock agentic capability more efficiently than scaling training data? LIMI achieved 73.5% on AgencyBench with 78 samples versus 10K+ samples for competing models, suggesting data quality may matter more than quantity.

Explore related Read →

Why do capable AI agents still fail in real deployments?

Explores whether agent failures stem from insufficient capability or from missing ecosystem conditions like user trust, value clarity, and social norms. Understanding this distinction matters for predicting which agents will succeed.

Explore related Read →

Does agent efficiency really break down into three distinct components?

Can we understand agent efficiency as three independent optimization problems—memory, tool use, and planning—each with separate cost drivers? This matters because it could explain why point optimizations keep missing the bigger picture.

Explore related Read →

What should we actually measure in agent evaluation?

Current agent benchmarks reduce performance to a single success metric, potentially hiding critical differences in how agents operate. What dimensions beyond task accuracy should evaluation frameworks capture?

Explore related Read →

How do agentic AI systems decompose into adaptation paradigms?

What are the core dimensions that distinguish different approaches to adapting agents and tools in agentic systems? Understanding this taxonomy could clarify which adaptation strategy fits which problem.

Explore related Read →

Can AI research itself without losing human oversight?

Explores whether AI systems can internalize the human judgment and insight-distillation that normally drives research progress, and what this means for maintaining meaningful human control over AI advancement.

Explore related Read →

What makes detecting AI agent traps fundamentally difficult?

Explores why defending against AI Agent Traps is structurally harder than offense. Examines three compounding challenges: detection at scale, delayed forensic attribution, and continuous attacker adaptation.

Explore related Read →

How do adversarial traps target different layers of AI agents?

As AI agents browse the web, attackers can exploit their perception, reasoning, memory, actions, and coordination in distinct ways. Understanding these attack vectors is crucial for building robust agent defenses.

Explore related Read →

Can API-first agents outperform UI-based agent interaction?

This explores whether directing agents to use APIs instead of navigating UIs reduces task completion time and errors. The question matters because current LLM agents struggle with sequential UI steps that multiply latency and hallucination risk.

Explore related Read →

Can agents learn new skills without forgetting old ones?

Explores whether externalized skill libraries—storing learned behaviors as retrievable code rather than parameter updates—can solve the catastrophic forgetting problem that plagues continual learning systems.

Explore related Read →

Why do AI agents fail at workplace social interaction?

Explores why current AI agents struggle most with communicating and coordinating with colleagues in realistic workplace settings, despite strong reasoning capabilities in other domains.

Explore related Read →

Can multi-agent teams automatically remove their weakest members?

Explores whether agents can score each other's contributions during problem-solving and use those scores to deactivate underperforming teammates in real time, improving overall team efficiency.

Explore related Read →

Why does agent efficiency differ from model size reduction?

Explores why making models smaller doesn't solve agent cost problems. Agents loop recursively, compounding costs multiplicatively, so efficiency requires system-level design, not just parameter reduction.

Explore related Read →

Can we automatically optimize both prompts and agent coordination?

This explores whether language agents can be represented as computational graphs whose structure and content adapt automatically. Why it matters: current agent systems require hand-engineered orchestration; automatic optimization could unlock more capable multi-agent systems.

Explore related Read →

Can language help agents imagine goals they've never seen?

How might compositional language enable artificial agents to target outcomes beyond their training experience? This matters because it could unlock open-ended exploration without hand-coded reward functions.

Explore related Read →

Do efficiency techniques across agent components reveal shared structural constraints?

Despite targeting different parts of agentic systems, efficiency techniques converge on similar principles. This raises a question: are these convergences independent discoveries, or do they reflect deeper architectural constraints that all agent systems face?

Explore related Read →

Is agent memory capacity or quality the real bottleneck?

While more storage seems like the obvious solution to memory problems, what if the real constraint is actually curation—deciding what to keep, discard, and retrieve without degrading performance?

Explore related Read →

What security threats emerge when machines read the web?

The web's trust infrastructure evolved for human readers—visual cues, domain reputation, rendering semantics. As AI agents become primary readers, what new attack surfaces and manipulation strategies does this architectural mismatch create?

Explore related Read →

Source papers 55

The Arxiv papers behind this sub-topic. Links may take you off-site to arxiv.org.