LIMA: Less Is More for Alignment

Paper · arXiv 2305.11206 · Published May 18, 2023
LLM Alignment

Large language models are trained in two stages: (1) unsupervised pretraining from raw text, to learn general-purpose representations, and (2) large scale instruction tuning and reinforcement learning, to better align to end tasks and user preferences. We measure the relative importance of these two stages by training LIMA, a 65B parameter LLaMa language model fine-tuned with the standard supervised loss on only 1,000 carefully curated prompts and responses, without any reinforcement learning or human preference modeling. LIMA demonstrates remarkably strong performance, learning to follow specific response formats from only a handful of examples in the training data, including complex queries that range from planning trip itineraries to speculating about alternate history. Moreover, the model tends to generalize well to unseen tasks that did not appear in the training data. In a controlled human study, responses from LIMA are either equivalent or strictly preferred to GPT-4 in 43% of cases; this statistic is as high as 58% when compared to Bard and 65% versus DaVinci003, which was trained with human feedback.

Introduction. Language models are pretrained to predict the next token at an incredible scale, allowing them to learn general-purpose representations that can be transferred to nearly any language understanding or generation task. To enable this transfer, various methods for aligning language models have thus been proposed, primarily focusing on instruction tuning [Mishra et al., 2021, Wei et al., 2022a, Sanh et al., 2022] over large multi-million-example datasets [Chung et al., 2022, Beeching et al., 2023, Köpf et al., 2023], and more recently reinforcement learning from human feedback (RLHF) [Bai et al., 2022a, Ouyang et al., 2022], collected over millions of interactions with human annotators. Existing alignment methods require significant amounts of compute and specialized data to achieve ChatGPT-level performance. However, we demonstrate that, given a strong pretrained language model, remarkably strong performance can be achieved by simply fine-tuning on 1,000 carefully curated training examples.

Discussion / Conclusion. We show that fine-tuning a strong pretrained language model on 1,000 carefully curated examples can produce remarkable, competitive results on a wide range of prompts. However, there are limitations to this approach. Primarily, the mental effort in constructing such examples is significant and difficult to scale up. Secondly, LIMA is not as robust as product-grade models; while LIMA typically generates good responses, an unlucky sample during decoding or an adversarial prompt can often lead to a weak response. That said, the evidence presented in this work demonstrates the potential of tackling the complex issues of alignment with a simple approach.