TOPIC

Mobile and On-Device LLMs

4 synthesis notes · 2 source papers

Does depth matter more than width for tiny language models?

Explores whether deep-and-thin architectures outperform wide-and-shallow ones at sub-billion-parameter scale, and why this might contradict scaling laws derived from larger models.
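The shape question can be made concrete with parameter arithmetic. This is an illustrative sketch, not taken from the source papers: it assumes a standard transformer block of roughly 12·d² weights (attention plus an MLP with 4× expansion, ignoring biases and norms), a tied input/output embedding, and a hypothetical 32,000-token vocabulary.

```python
# Rough, illustrative parameter counts for two transformer shapes.
# Assumption (not from the source notes): each block has 4*d^2 attention
# weights (Q, K, V, output) and an MLP with 4x expansion (8*d^2),
# ignoring biases and norms, i.e. about 12*d^2 weights per layer.

def block_params(d_model: int) -> int:
    """Approximate weight count of one transformer block of width d_model."""
    attention = 4 * d_model * d_model
    mlp = 2 * d_model * (4 * d_model)
    return attention + mlp

def model_params(n_layers: int, d_model: int, vocab_size: int) -> dict:
    """Split a model's weight count into transformer blocks and an embedding table."""
    blocks = n_layers * block_params(d_model)
    embedding = vocab_size * d_model  # assumes input/output embeddings are tied
    return {"blocks": blocks, "embedding": embedding, "total": blocks + embedding}

VOCAB = 32_000  # hypothetical vocabulary size
deep_thin = model_params(n_layers=32, d_model=512, vocab_size=VOCAB)
wide_shallow = model_params(n_layers=8, d_model=1024, vocab_size=VOCAB)

for name, p in [("deep-thin 32x512", deep_thin), ("wide-shallow 8x1024", wide_shallow)]:
    share = p["embedding"] / p["total"]
    print(f"{name}: blocks={p['blocks']:,} embedding={p['embedding']:,} "
          f"embedding share={share:.0%}")
```

Both shapes land on the same 100,663,296 block parameters, because 32 × 512² = 8 × 1024². The wider model also pays twice as much for its embedding table, which is a sizeable share of the total at this scale. Which shape trains into the better model is the empirical question the note addresses; the arithmetic only fixes the budget.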

Does recomputing weights cost less than moving them on mobile?

Explores whether the memory bottleneck of mobile hardware makes it cheaper to run a transformer block twice with the same weights than to load a separate block's weights from memory, and whether this trades accuracy for efficiency.
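A roofline-style estimate shows why memory traffic, not arithmetic, can dominate. This is an idealized sketch with hypothetical numbers, not figures from the source papers: fp16 weights, 50 GB/s of DRAM bandwidth, 2 TFLOP/s of sustained compute, a 3M-parameter block, batch-1 decoding at about 2 FLOPs per weight, and the assumption that a block's weights stay in on-chip cache between two consecutive passes.

```python
# Idealized roofline-style estimate for batch-1 decoding, where each token
# touches every weight once. All hardware numbers are hypothetical, and we
# assume a shared block's weights stay in on-chip cache between its two passes.

BYTES_PER_PARAM = 2            # fp16 weights (assumption)
DRAM_BANDWIDTH = 50e9          # bytes/s, hypothetical mobile DRAM bandwidth
COMPUTE_THROUGHPUT = 2e12      # FLOP/s, hypothetical sustained throughput

def pass_time(params: int, load_from_dram: bool) -> float:
    """Time for one block pass: the slower of moving weights and doing ~2 FLOPs per weight."""
    memory_s = params * BYTES_PER_PARAM / DRAM_BANDWIDTH if load_from_dram else 0.0
    compute_s = 2 * params / COMPUTE_THROUGHPUT
    return max(memory_s, compute_s)

BLOCK_PARAMS = 3_000_000  # hypothetical block size

# Two distinct blocks: both sets of weights come from DRAM.
two_blocks = pass_time(BLOCK_PARAMS, True) + pass_time(BLOCK_PARAMS, True)
# One block run twice: weights are loaded once, then reused from cache.
shared_block = pass_time(BLOCK_PARAMS, True) + pass_time(BLOCK_PARAMS, False)

print(f"two distinct blocks: {two_blocks * 1e6:.1f} us")
print(f"one block run twice: {shared_block * 1e6:.1f} us")
```

Under these assumptions, loading a block takes about 120 µs while computing it takes about 3 µs, so running one block twice (about 123 µs) costs roughly half as much as two distinct blocks (about 240 µs). Real devices overlap transfers with compute and have limited cache, so the numbers are only indicative, and the accuracy side of the trade-off still needs experiments.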

What actually limits language models on mobile phones?

Is the shift toward smaller LLMs driven by quality trade-offs, or by hard physical constraints on device memory and battery life? This note examines whether sub-billion models are a practical necessity rather than a compromise.
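The memory side of the argument is simple arithmetic: weight storage is the parameter count times bits per weight, divided by 8. This sketch counts weights only; the KV cache, activations, and runtime overhead come on top, and the model sizes listed are arbitrary examples.

```python
# Back-of-the-envelope weight memory: parameters x bits per weight / 8.
# Ignores the KV cache, activations, and runtime overhead, which all add to this.

def weight_gib(n_params: float, bits_per_weight: float) -> float:
    """Weight storage in GiB for a model with n_params parameters."""
    return n_params * bits_per_weight / 8 / 2**30

for n_params in (125e6, 350e6, 1e9, 7e9):
    row = ", ".join(
        f"{bits}-bit: {weight_gib(n_params, bits):.2f} GiB" for bits in (16, 8, 4)
    )
    print(f"{n_params / 1e9:g}B params -> {row}")
```

At 16 bits, a 7B-parameter model needs about 13 GiB for its weights alone, while a 350M-parameter model needs about 0.65 GiB.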

Can ternary weights match full precision model performance?

Can models trained natively with only three weight values (−1, 0, 1) achieve the same perplexity and task performance as standard full-precision models? This matters because ternary weights could dramatically reduce computational and energy costs.
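The compute saving can be seen in a small sketch. It assumes "absmean" rounding, the scheme described in the BitNet b1.58 paper (divide by the mean absolute weight, round, clip to [−1, 1]); whether the source papers use exactly this scheme is not stated here. It only quantizes an existing weight vector: native ternary training, which keeps full-precision latent weights and quantizes in the forward pass, is not shown.

```python
# Minimal sketch of ternary weights, assuming "absmean" rounding:
# scale by the mean absolute weight, round, and clip to [-1, 1].
# Native ternary training (e.g., straight-through gradients) is not shown.

def ternarize(weights: list[float], eps: float = 1e-8) -> tuple[list[int], float]:
    """Map real-valued weights to {-1, 0, 1} plus one per-tensor scale."""
    scale = sum(abs(w) for w in weights) / len(weights) + eps
    quantized = [max(-1, min(1, round(w / scale))) for w in weights]
    return quantized, scale

def ternary_dot(quantized: list[int], scale: float, x: list[float]) -> float:
    """Dot product using only additions and subtractions, then one multiply by scale."""
    acc = 0.0
    for q, xi in zip(quantized, x):
        if q == 1:
            acc += xi
        elif q == -1:
            acc -= xi
        # q == 0: the input is skipped entirely
    return acc * scale

w = [0.8, -0.1, -0.7, 0.6]
x = [1.0, 2.0, 3.0, 4.0]
q, s = ternarize(w)
print(q, round(s, 4))
print(ternary_dot(q, s, x), sum(wi * xi for wi, xi in zip(w, x)))
```

Here the weights become [1, 0, −1, 1] with a scale of about 0.55, and the ternary dot product (about 1.1) approximates the full-precision one (0.9) without a single weight multiplication inside the loop.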

Source papers (2)

The arXiv papers behind this sub-topic. Links go off-site to arxiv.org.