Better Alignment with Instruction Back-and-Forth Translation
We propose a new method, instruction back-and-forth translation, to construct high-quality synthetic data grounded in world knowledge for aligning large language models (LLMs). Given documents from a web corpus, we generate and curate synthetic instructions using the backtranslation approach proposed by Li et al. (2023a), and rewrite the responses to improve their quality further based on the initial documents. Fine-tuning with the resulting (backtranslated instruction, rewritten response) pairs yields higher win rates on AlpacaEval than using other common instruction datasets such as Humpback, ShareGPT, Open Orca, Alpaca-GPT4 and Self-instruct. We also demonstrate that rewriting the responses with an LLM outperforms direct distillation, and that the two generated text distributions differ significantly in embedding space. Further analysis shows that our backtranslated instructions are of higher quality than other sources of synthetic instructions, while our responses are more diverse and complex than those obtained from distillation.
Introduction. In recent years, it has become increasingly common for large language models (LLMs) to be deployed through a chat interface to respond to users’ queries. This capability is achieved by taking models that have been pre-trained on massive amounts of web-crawled text and fine-tuning them on a relatively small set of instruction-response pairs or preferences (Ouyang et al., 2022). Popular instruction-tuning corpora are often constructed by (i) human annotation and curation (Köpf et al., 2024; Conover et al., 2023; Zhou et al., 2024), (ii) converting existing texts, e.g. from other NLP tasks (Longpre documents from a large-scale open-source corpus like Dolma (Soldaini et al., 2024) and generate instructions via backtranslation accordingly. We find that the quality of our instructions is comparable to that of instructions backtranslated from ClueWeb. To make up for the lack of manually designed rules for structuring the response, we experiment with using an LLM to rewrite the response based on the generated instruction and the initial web text.
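The three steps described above (backtranslate an instruction from a web document, curate the resulting pair, then rewrite the response conditioned on the instruction and the original text) can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the implementation used in this work: the callables `backtranslate`, `score` and `rewrite`, the prompt wording, and the `min_score` threshold are all hypothetical placeholders for whatever models and curation criteria are actually used.

```python
from dataclasses import dataclass
from typing import Callable, Iterable, List

# Hypothetical interfaces (assumptions, not from the paper):
# a generator maps a prompt string to generated text,
# a scorer rates an (instruction, document) pair, higher meaning better.
Generate = Callable[[str], str]
Score = Callable[[str, str], float]


@dataclass
class Pair:
    instruction: str
    response: str


def back_and_forth(
    documents: Iterable[str],
    backtranslate: Generate,
    score: Score,
    rewrite: Generate,
    min_score: float = 4.0,  # illustrative threshold, chosen arbitrarily here
) -> List[Pair]:
    """Build (backtranslated instruction, rewritten response) pairs."""
    pairs: List[Pair] = []
    for doc in documents:
        # Step 1, backtranslation: ask for an instruction the document would answer.
        instruction = backtranslate(
            "Write an instruction that the following text answers.\n\n" + doc
        ).strip()

        # Step 2, curation: drop pairs the scorer rates below the threshold.
        if score(instruction, doc) < min_score:
            continue

        # Step 3, rewriting: improve the response using both the instruction
        # and the initial web text, so the content stays grounded in the document.
        response = rewrite(
            "Instruction: " + instruction + "\n\n"
            "Source text: " + doc + "\n\n"
            "Rewrite the source text into a high-quality response to the instruction."
        ).strip()

        pairs.append(Pair(instruction=instruction, response=response))
    return pairs
```

Passing the models in as plain callables keeps the sketch independent of any particular inference library; in practice each callable would wrap a call to a fine-tuned or aligned LLM.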
Discussion / Conclusion. We propose instruction back-and-forth translation, which combines the instruction backtranslation method from Li et al. (2023a) with response rewriting in order to benefit from both the information diversity found on the internet and the quality of model annotations, while enabling scalability owing to the size of the web corpus from which we source initial responses.
Future work. Our findings motivate a number of interesting future directions. One concrete question is whether applying other existing curation techniques, e.g. quality filters proposed by Liu et al. (2023), to our pool of (synthetic instruction, rewritten response) pairs would lead to further performance gains. In addition, we also look forward to scaling up our data generation pipeline
Limitations. Although we try to control for confounding factors (e.g. data quantity), our findings are obtained using only one model family, i.e. Llama-2 (Touvron et al., 2023). In addition, our pipeline revolves around general-purpose English instructions, with limited coding or science-related tasks. Nevertheless, it is possible to extend our method to more domain-specific data, e.g.