Can language agents be represented as optimizable computational graphs?

This explores whether you can model an AI agent's whole workflow—its prompts and the way information flows between steps—as a graph that can be tuned automatically, rather than hand-built.

The corpus says yes, and the payoff is bigger than it first appears. When you represent a language agent as a computational graph (nodes are operations, edges carry information between them), popular prompting strategies like chain-of-thought, tree-of-thought, and Reflexion stop looking like separate inventions and turn out to be the same kind of structure wearing different clothes (see "Can we automatically optimize both prompts and agent coordination?"). Once they're all graphs, you can optimize along two axes at once: the text inside each node (the prompts) and the wiring between nodes (who talks to whom).
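To make the graph view concrete, here is a minimal sketch in Python. It is an illustration under stated assumptions, not the representation used by the cited work: the names `Node`, `AgentGraph`, and `fake_llm` are hypothetical, and `fake_llm` is a deterministic placeholder standing in for a real model call. A chain-of-thought agent and a simplified branching agent are built from the same two ingredients, prompts on nodes and edges between them.

```python
from dataclasses import dataclass, field
from graphlib import TopologicalSorter  # standard library, Python 3.9+
from typing import Callable, Dict, List, Set


@dataclass
class Node:
    """One operation in the agent graph; `prompt` is the tunable text."""
    name: str
    prompt: str
    # Hypothetical stand-in for a model call: (prompt, inputs) -> output.
    op: Callable[[str, List[str]], str]


@dataclass
class AgentGraph:
    nodes: Dict[str, Node] = field(default_factory=dict)
    # edges[target] holds the names of nodes whose outputs feed the target.
    edges: Dict[str, Set[str]] = field(default_factory=dict)

    def add_node(self, node: Node) -> None:
        self.nodes[node.name] = node
        self.edges.setdefault(node.name, set())

    def add_edge(self, src: str, dst: str) -> None:
        self.edges[dst].add(src)

    def run(self, task: str) -> Dict[str, str]:
        """Execute nodes in dependency order; source nodes receive the task."""
        outputs: Dict[str, str] = {}
        for name in TopologicalSorter(self.edges).static_order():
            node = self.nodes[name]
            inputs = [outputs[s] for s in sorted(self.edges[name])] or [task]
            outputs[name] = node.op(node.prompt, inputs)
        return outputs


def fake_llm(prompt: str, inputs: List[str]) -> str:
    """Deterministic placeholder so the sketch runs without a model."""
    return f"{prompt}({' + '.join(inputs)})"


def chain_of_thought() -> AgentGraph:
    """Linear graph: think -> answer."""
    g = AgentGraph()
    for name in ("think", "answer"):
        g.add_node(Node(name, name, fake_llm))
    g.add_edge("think", "answer")
    return g


def branching() -> AgentGraph:
    """Two parallel branches merged by one node (a simplified tree shape)."""
    g = AgentGraph()
    for name in ("branch_a", "branch_b", "merge"):
        g.add_node(Node(name, name, fake_llm))
    g.add_edge("branch_a", "merge")
    g.add_edge("branch_b", "merge")
    return g


print(chain_of_thought().run("2+2")["answer"])  # answer(think(2+2))
print(branching().run("q")["merge"])            # merge(branch_a(q) + branch_b(q))
```

The two agents differ only in their node prompts and edge sets, which is what makes both of those things candidates for automatic tuning.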

What makes this useful is that the two things people normally tune by trial and error become search problems. The same note shows you don't have to choose between improving a prompt and rearranging the agent's coordination—both can be learned. That reframes a lot of "agent engineering" as graph optimization rather than craft.
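As a sketch of what "search problem" means here, the snippet below scores every combination of prompt choices and edge subsets for a tiny graph and keeps the best one. Exhaustive search is swapped in as the simplest possible stand-in; it is not the optimizer used by the cited work and would not scale past toy sizes. The names `search`, `toy_score`, `PROMPT_OPTIONS`, and `CANDIDATE_EDGES` are hypothetical, and `toy_score` is an invented placeholder for an external evaluation such as task accuracy.

```python
from itertools import chain, combinations, product
from typing import Callable, Dict, FrozenSet, Iterable, List, Optional, Tuple

Edge = Tuple[str, str]
Config = Tuple[Dict[str, str], FrozenSet[Edge]]


def powerset(items: List[Edge]) -> Iterable[FrozenSet[Edge]]:
    """Every subset of the candidate edges (only viable for tiny graphs)."""
    subsets = chain.from_iterable(
        combinations(items, r) for r in range(len(items) + 1)
    )
    return (frozenset(s) for s in subsets)


def search(
    prompt_options: Dict[str, List[str]],
    candidate_edges: List[Edge],
    score: Callable[[Dict[str, str], FrozenSet[Edge]], float],
) -> Tuple[Optional[Config], float]:
    """Return the (prompts, edges) pair with the highest external score."""
    names = list(prompt_options)
    best: Optional[Config] = None
    best_score = float("-inf")
    for choice in product(*(prompt_options[n] for n in names)):
        prompts = dict(zip(names, choice))
        for edges in powerset(candidate_edges):
            s = score(prompts, edges)
            if s > best_score:
                best, best_score = (prompts, edges), s
    return best, best_score


def toy_score(prompts: Dict[str, str], edges: FrozenSet[Edge]) -> float:
    """Invented scoring rule: reward a reasoning prompt and a check edge,
    and charge a small cost per edge so wiring is not free."""
    s = 0.0
    if "step by step" in prompts["solve"]:
        s += 1.0
    if ("solve", "check") in edges:
        s += 1.0
    return s - 0.1 * len(edges)


PROMPT_OPTIONS = {
    "solve": ["Answer.", "Think step by step."],
    "check": ["Verify the answer."],
}
CANDIDATE_EDGES = [("solve", "check"), ("check", "solve")]

best, best_score = search(PROMPT_OPTIONS, CANDIDATE_EDGES, toy_score)
print(best, best_score)
```

Prompts and wiring are searched in the same loop against the same score, so neither has to be fixed by hand before the other is tuned.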

The corpus also hints at where the edges of the graph come from and where the graph stops helping. On the wiring side, there's a complementary idea: instead of hand-connecting which agent handles what, you can let agents discover each other through semantic capability vectors, making coordination a learned, scalable operation rather than manual plumbing (see "Can semantic capability vectors replace manual agent routing?"). And a node in such a graph need not be a giant model: much of the repetitive work can be handled by small language models, with large ones called only selectively, which is itself a structural optimization of the graph's cost (see "Can small language models handle most agent tasks?").
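The discovery idea can be sketched as a nearest-match lookup over capability vectors, filtered by a budget. This is a toy under loud assumptions: the vectors, costs, and agent names are invented, and a brute-force cosine scan is swapped in for the HNSW index the source note describes. `Agent` and `discover` are hypothetical names.

```python
import math
from dataclasses import dataclass
from typing import List, Optional, Sequence


@dataclass
class Agent:
    name: str
    capability: Sequence[float]  # invented 3-d embedding: (sql, code, general)
    cost: float                  # invented relative cost per call


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity; returns 0.0 if either vector has zero length."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0


def discover(
    task_vec: Sequence[float], agents: List[Agent], max_cost: float
) -> Optional[Agent]:
    """Pick the best semantic match among agents that fit the budget."""
    affordable = [a for a in agents if a.cost <= max_cost]
    if not affordable:
        return None
    return max(affordable, key=lambda a: cosine(task_vec, a.capability))


AGENTS = [
    Agent("sql_small", [1.0, 0.0, 0.0], cost=1.0),
    Agent("code_small", [0.0, 1.0, 0.0], cost=1.0),
    Agent("generalist_large", [0.6, 0.6, 0.5], cost=20.0),
]

# A narrow SQL-like task goes to the small specialist under a tight budget.
print(discover([0.9, 0.1, 0.0], AGENTS, max_cost=5.0).name)   # sql_small
# A broad task with a generous budget reaches the large generalist.
print(discover([0.5, 0.5, 0.6], AGENTS, max_cost=50.0).name)  # generalist_large
```

The budget filter is what keeps small models as the default here: the large model is only reachable when the caller explicitly allows its cost.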

But a graph that optimizes prompts and edges is still only as good as the signals flowing through it. Reflexion, one of the techniques the unifying view absorbs, works precisely because it feeds back an unambiguous success/failure signal that the agent stores as memory and reuses, no weight updates required (see "Can agents learn from failure without updating their weights?"). That's the kind of clean signal graph optimization thrives on. The harder truth is that other parts of the agent live outside the graph: turning a model into something that reliably acts in the world takes a full pipeline transformation (data, grounding, infrastructure, safety), not just rewiring nodes (see "Can you turn an LLM into an agent by just fine-tuning?").
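The feedback loop described above can be sketched as follows. This is a toy, not the Reflexion implementation: in the real technique the reflections are natural-language self-diagnoses written by the model, whereas here they are short tags so the example stays deterministic. The names `reflexion_loop`, `act`, `evaluate`, `reflect`, and `CANDIDATES` are hypothetical.

```python
from typing import Callable, List, Optional, Tuple


def reflexion_loop(
    act: Callable[[List[str]], str],
    evaluate: Callable[[str], bool],
    reflect: Callable[[str], str],
    max_episodes: int,
) -> Tuple[Optional[str], int, List[str]]:
    """Retry a task, storing one reflection per failure.

    Nothing resembling a weight update happens: the only state that
    changes between episodes is the `memory` list passed back to `act`.
    """
    memory: List[str] = []
    for episode in range(1, max_episodes + 1):
        attempt = act(memory)
        if evaluate(attempt):                # binary, external signal
            return attempt, episode, memory
        memory.append(reflect(attempt))      # kept verbatim, not summarized
    return None, max_episodes, memory


# Invented toy task: find the strategy the environment accepts.
CANDIDATES = ["sort", "hash", "binary_search"]


def act(memory: List[str]) -> str:
    """Pick the first strategy that past reflections have not ruled out."""
    return next(
        (c for c in CANDIDATES if f"avoid:{c}" not in memory), CANDIDATES[-1]
    )


def evaluate(attempt: str) -> bool:
    """Stand-in for the environment's unambiguous success/failure check."""
    return attempt == "binary_search"


def reflect(attempt: str) -> str:
    """Stand-in for a model-written self-diagnosis."""
    return f"avoid:{attempt}"


print(reflexion_loop(act, evaluate, reflect, max_episodes=5))
# ('binary_search', 3, ['avoid:sort', 'avoid:hash'])
```

Note that `evaluate` sits outside the agent: if it were replaced by the agent's own opinion of its attempt, the loop would have nothing reliable to learn from.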

The most interesting limit is conceptual. Optimizing a graph is a form of self-improvement, and self-improvement in language models has a formal ceiling: every reliable fix needs something external to verify it, because a system can't escape the gap between generating an answer and checking it through metacognition alone (see "What stops large language models from improving themselves?"). So "optimizable computational graph" is real and powerful for unifying techniques and automating design, but the optimizer still needs an outside signal to climb toward, which is exactly why the trial-and-error feedback loops keep showing up as the thing that makes the graph learn at all.

Sources 6 notes

Can we automatically optimize both prompts and agent coordination?

Language agents represented as computational graphs—where nodes are operations and edges define information flow—reveal that CoT, ToT, and Reflexion are formally equivalent structures. This unified view enables automatic optimization of both node prompts and edge connectivity without manual redesign.

Can semantic capability vectors replace manual agent routing?

Versioned Capability Vectors embedded in HNSW indices couple semantic matching with policy and budget constraints, making capability discovery a first-class operation that scales sub-linearly as agent heterogeneity increases.

Can small language models handle most agent tasks?

SLMs handle the repetitive, well-defined language tasks that constitute most agent work at 10–30× lower cost than LLMs, making heterogeneous architectures (SLMs by default, LLMs selective) the economically rational design pattern.

Can agents learn from failure without updating their weights?

Reflexion demonstrates that unambiguous environmental feedback (success/failure) enables agents to write useful self-diagnoses and improve across episodes without parameter updates. The binary signal prevents rationalization, and keeping reflections uncompressed preserves their usability.

Can you turn an LLM into an agent by just fine-tuning?

Converting LLMs to action-capable systems requires four distinct stages: curating action-environment-user datasets, training for action grounding, integrating agent infrastructure with memory and tools, and rigorous safety evaluation. The surrounding system and harness determine whether actions are grounded or hallucinated.

What stops large language models from improving themselves?

Self-improvement in LLMs is formally bounded by the generation-verification gap, meaning every reliable fix requires something external to validate and enforce it. Models cannot escape this constraint through metacognition alone.

Research prompt for your LLM

Copy into ChatGPT or Claude to take this line of inquiry further — it asks the model to find newer work and re-test which earlier constraints still hold.

You are a research analyst. The question **Can language agents be represented as optimizable computational graphs?** remains open despite recent work. A curated library (2023–2026) found:

**What a curated library found — and when (dated claims, not current truth):**
- Chain-of-thought, tree-of-thought, and reflexion unify as graph structures with optimizable nodes (prompts) and edges (agent routing), enabling joint text + wiring search (~2024).
- Agents can discover coordination partners via semantic capability vectors instead of manual plumbing, scaling learned coordination (~2025).
- Small language models handle 70–80% of agentic subtasks; large models called selectively; this is itself a graph-structural cost optimization (~2025).
- Reflexion-style agents learn via episodic memory and external success/failure signals *without weight updates*, proving clean feedback loops drive graph learning (~2024).
- Self-improvement in agents hits a formal ceiling: no system can verify its own fixes via internal metacognition alone; external signals remain mandatory (~2026).

**Anchor papers (verify; mind their dates):**
- arXiv:2402.16823 (2024) — Language Agents as Optimizable Graphs
- arXiv:2509.20175 (2025) — Federation of Agents: Semantics-Aware Communication
- arXiv:2506.02153 (2025) — Small Language Models are the Future of Agentic AI
- arXiv:2605.30621 (2026) — Harness Updating and Self-Evolution Disentangled

**Your task:**
(1) **RE-TEST EACH CONSTRAINT.** For prompt optimization and graph wiring: have newer scaling laws, post-training techniques (DPO, ORPO), or real-time adaptation harnesses since relaxed the need for external signals? For small-model routing: do emergent routing protocols now emerge in-graph, or does capability-vector coordination still require external scaffolding? For self-improvement: has any closed-loop metacognitive mechanism bypassed the verification gap, or does the external-signal requirement remain ironclad? Separate durable question (graph unification) from perishable limits (signal dependency).
(2) **Surface the strongest CONTRADICTING or SUPERSEDING work** from the last ~6 months. Look for papers showing graphs *underperform* hand-tuned agents, or that graph optimization converges to trivial solutions, or that online agentic learning obsoletes offline graph design.
(3) **Propose 2 research questions** that assume the regime may have shifted: (a) If graphs can now be optimized *during* deployment via online feedback, what is the minimal external signal needed? (b) Do federated agent graphs reach coordination complexity where graph optimization becomes intractable, and do hierarchical or modular graph decompositions solve it?

Cite arXiv IDs; flag anything you cannot ground in a real paper.
