TOPIC

Task Planning

9 synthesis notes · 70 source papers

Can command generation replace intent classification in dialogue systems?

Explores whether generating pragmatic commands in a DSL could outperform traditional intent classification for task-oriented dialogue, particularly regarding training data needs and scalability.

Can LLMs actually forecast time series better than we think?

Explores whether language models possess stronger forecasting ability than current benchmarks suggest, and what role workflow design plays in revealing or hiding that capability.

Can large language models actually create executable plans?

Do LLMs genuinely assemble plans that work, or just generate planning-domain knowledge that sounds coherent? Understanding this distinction matters for deploying AI in real planning tasks.

Can decomposing forecasting into stages unlock numerical and contextual reasoning?

Explores whether splitting time-series forecasting into separate stages (contextualization, dual-resolution outlook, and synthesis) lets systems combine the strengths of numerical models and language models more effectively than either alone.

Does tree depth automatically produce supervision at multiple granularities?

Tree-search rollouts branch at different depths, potentially creating supervision signals ranging from coarse strategy-level to fine-grained detail-level choices. Does this depth variation naturally yield multi-granular process supervision without explicit annotation design?

Can shared-prefix trees reduce redundancy in agent rollouts?

Independent rollouts waste tokens regenerating similar early-turn sequences. Can structuring rollouts as shared-prefix trees instead preserve early computation across samples while maintaining statistical diversity for advantage estimation?

How much of LLM few-shot ability comes from training data?

Do large language models genuinely learn from a few examples, or are they mostly recognizing patterns from their training data? This matters for understanding what LLMs can actually do.

Can tree structure alone convert outcome rewards into process supervision?

Tree-based rollouts naturally create step-level preference signals by comparing sibling subtrees. Can this structural approach replace separate process reward models without explicit step-level annotation?

Source papers 70

The arXiv papers behind this sub-topic. Links may take you off-site to arxiv.org.