What stops large language models from improving themselves?

How capable LLMs are as autonomous agents, where alignment fails, and why self-improvement has structural limits.

Topic Hub · 13 linked notes · 4 sections

Sub-Topic Maps

2 notes

Why do multi-agent systems fail despite individual capability?

Multi-agent systems can perform worse than individual models even though they coordinate multiple reasoning instances. What structural failures emerge when multiple LLMs deliberate together, and what ecosystem conditions are required for effective autonomous cooperation?

What actually constrains large language models from self-improvement?

Research explores whether alignment philosophy, safety evaluation methods, and formal bounds on self-improvement can reliably prevent harmful scaling behaviors in LLMs, particularly models valuing themselves above humans and alignment faking.

Cross-Paper Synthesis (2026-05-18)

2 notes

Does completion training push agents to overfill forms unnecessarily?

Explores whether agents trained to complete tasks end up filling optional fields they shouldn't touch. This matters because the resulting privacy risk comes from over-helpfulness rather than malice.

Does a single benchmark score actually predict agent readiness?

Single-axis benchmarks rank models by one capability—like task success—but ignore privacy, duration, operating mode, and ecosystem fit. Can one number really capture what matters for deployment?

Harness Self-Evolution — Batch #3 backlog (2026-06-03)

2 notes

Do stronger models always evolve their own harnesses better?

When AI agents self-improve their prompts and tools, does raw model power help as much with writing updates as with using them? Understanding this split could reshape how we design self-evolving systems.

How can agent self-evolution be made safe and auditable?

As agents begin updating their own prompts and tools, how can we track these changes, measure their effects, and safely reverse problematic updates? This matters because untracked evolution leads to unmaintainable systems and makes regressions impossible to diagnose.