How do model capabilities differ from harness infrastructure in agents?
What distinct layers make up an agentic system, and how do failures in each layer differ? Understanding this decomposition helps pinpoint whether problems stem from the model, the infrastructure, or the agent's own code.
Talking about "agent code" as one thing obscures three distinct elements that the code-as-harness survey separates. First, model-internal capabilities: the reasoning, perception, planning, simulation, and evaluation abilities baked into the model's weights. Second, system-provided harness infrastructure: the predefined tools, APIs, sandboxes, memory systems, validators, permission boundaries, telemetry, and workflows that connect model outputs to external actions and feedback — this is the main focus of harness engineering. Third, agent-initiated code artifacts: the interactive code objects an agent itself creates, executes, observes, revises, persists, and shares within the execution loop. These three are coupled but governed by different design levers.
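To make the three elements concrete, here is a minimal sketch of one step of an execution loop. It is an illustration under stated assumptions, not the survey's implementation: `stub_model`, `Harness`, `ArtifactStore`, `step`, and the `run_python` tool name are all hypothetical, and the `exec` call is a toy stand-in for a real sandbox.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Set


def stub_model(task: str) -> Dict[str, str]:
    """Model-internal capability (stand-in): decide the next action.

    A real model would plan from its weights; this stub returns a fixed action.
    """
    return {
        "tool": "run_python",
        "code": "result = sum(range(5))",
        "save_as": "sum_helper",
    }


@dataclass
class Harness:
    """System-provided infrastructure: tools, permission boundary, telemetry."""
    allowed_tools: Set[str]
    telemetry: List[str] = field(default_factory=list)

    def execute(self, tool: str, code: str) -> Dict[str, object]:
        self.telemetry.append(f"requested:{tool}")
        if tool not in self.allowed_tools:  # permission boundary
            return {"ok": False, "error": f"tool {tool!r} not permitted"}
        scope: Dict[str, object] = {}
        # Toy sandbox for illustration only; restricting builtins is NOT secure.
        exec(code, {"__builtins__": {"sum": sum, "range": range}}, scope)
        return {"ok": True, "result": scope.get("result")}


@dataclass
class ArtifactStore:
    """Agent-initiated artifacts: code the agent wrote and may reuse later."""
    artifacts: Dict[str, str] = field(default_factory=dict)


def step(task: str, harness: Harness, store: ArtifactStore) -> Dict[str, object]:
    action = stub_model(task)                                   # model decides
    outcome = harness.execute(action["tool"], action["code"])   # harness mediates
    if outcome["ok"]:
        store.artifacts[action["save_as"]] = action["code"]     # artifact persists
    return outcome
```

In this sketch each element can be changed independently: swapping `stub_model` changes the decision, editing `allowed_tools` changes what the harness permits, and the contents of `ArtifactStore` change only through what the agent itself writes.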
The decomposition is useful because each element fails and improves differently. You strengthen model-internal capability by training; you strengthen the harness by engineering infrastructure; you strengthen agent-initiated artifacts by shaping how the agent generates and reuses its own code. Confusing them leads to misattributed failures — blaming the model for what is really a harness gap, or vice versa. The counterpoint is that the boundaries blur in practice: a skill the agent writes once may be promoted into harness infrastructure, and harness validators shape what the model learns to emit. But as an analytical frame it clarifies where to intervene. This matters because it gives harness engineering a vocabulary for separating the controllable layers of an agentic system.
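The attribution step described above can be sketched as a small triage table. This is an illustrative assumption rather than a method from the survey: the `Layer` enum, the `LEVER` mapping, the `attribute` function, and the failure-record keys (`permission_denied`, `tool_missing`, `raised_in_agent_code`) are hypothetical names, and the ordering of the checks is a simplifying heuristic.

```python
from enum import Enum
from typing import Dict


class Layer(Enum):
    MODEL = "model-internal capability"
    HARNESS = "system-provided harness infrastructure"
    ARTIFACT = "agent-initiated code artifact"


# The design lever the note associates with each layer.
LEVER: Dict[Layer, str] = {
    Layer.MODEL: "training",
    Layer.HARNESS: "engineering infrastructure",
    Layer.ARTIFACT: "shaping how the agent generates and reuses its own code",
}


def attribute(failure: Dict[str, bool]) -> Layer:
    """Assign a failure record to a layer using hypothetical signals.

    Harness gaps are checked first so they are not misread as model failures.
    """
    if failure.get("permission_denied") or failure.get("tool_missing"):
        return Layer.HARNESS
    if failure.get("raised_in_agent_code"):
        return Layer.ARTIFACT
    return Layer.MODEL  # fallback when no infrastructure or artifact signal is present


def intervention(failure: Dict[str, bool]) -> str:
    layer = attribute(failure)
    return f"{layer.value}: improve by {LEVER[layer]}"
```

For example, `intervention({"tool_missing": True})` points at the harness rather than the model. Because the boundaries blur in practice, a real triage would need richer signals than these three flags.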
Related concepts in this collection 3
- Where does agent reliability actually come from?
  Exploring whether LLM agent performance depends on larger models or on thoughtful system design choices like memory, skills, and protocols that shift cognitive work outside the model.
  Relation: both isolate the harness as a layer distinct from the model and a primary source of capability
- Does a single benchmark score actually predict agent readiness?
  Single-axis benchmarks rank models by one capability, like task success, but ignore privacy, duration, operating mode, and ecosystem fit. Can one number really capture what matters for deployment?
  Relation: parallels the move to decompose agent ability into separable components rather than one scalar
- What makes agent-created code artifacts so hard to manage?
  Agent-authored code that persists and is shared across systems raises difficult questions about what should be kept versus discarded, and how to maintain consistent state when multiple agents collaborate on the same artifacts.
  Relation: extends: same survey; this decomposition names the three elements and that companion note singles out the third (agent-initiated artifacts) as the least-studied frontier
Related papers in this collection 8
Papers most semantically related to this note, ranked by cosine similarity in the embedding space.
- Code as Agent Harness
- Agents of Chaos
- Why Do Multi-agent LLM Systems Fail?
- Small Language Models are the Future of Agentic AI
- From Model Scaling to System Scaling: Scaling the Harness in Agentic AI
- Externalization in LLM Agents: A Unified Review of Memory, Skills, Protocols and Harness Engineering
- AI Agents vs. Agentic AI: A Conceptual Taxonomy, Applications and Challenges
- Drop the Hierarchy and Roles: How Self-Organizing LLM Agents Outperform Designed Structures
Original note title
agent code splits into model-internal capability system-provided harness and agent-initiated artifacts