Exploring Format Consistency for Instruction Tuning
Instruction tuning has emerged as a promising approach to enhancing large language models in following human instructions. Prior work shows that increasing the diversity and number of instructions in the training data consistently improves generalization performance, which has motivated recent efforts to collect varied instructions and to integrate existing instruction tuning datasets into larger collections. However, different users express instructions in their own ways, and instruction styles and formats often vary across datasets, a problem we refer to as format inconsistency. In this work, we study how format inconsistency may impact the performance of instruction tuning. We propose a framework called “Unified Instruction Tuning” (UIT), which calls OpenAI APIs for automatic format transfer among different instruction tuning datasets. We show that UIT improves generalization performance on unseen instructions, which highlights the importance of format consistency for instruction tuning. To make the UIT framework more practical, we further propose a novel perplexity-based denoising method to reduce the noise of automatic format transfer.
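To make the idea of perplexity-based denoising concrete, the following is a minimal sketch and not the procedure used in this work. It assumes that several candidate transfers of the same instruction are available and that some scoring language model has already produced per-token log-probabilities for each one; the function names `perplexity` and `pick_lowest_perplexity`, and the rule of keeping the lowest-perplexity candidate, are illustrative assumptions.

```python
import math
from typing import Dict, List


def perplexity(token_logprobs: List[float]) -> float:
    """Perplexity = exp of the negative mean token log-probability (natural log)."""
    if not token_logprobs:
        raise ValueError("need at least one token log-probability")
    return math.exp(-sum(token_logprobs) / len(token_logprobs))


def pick_lowest_perplexity(candidates: Dict[str, List[float]]) -> str:
    """Return the candidate text whose token log-probs give the lowest perplexity.

    `candidates` maps each candidate transferred instruction to the
    log-probabilities a scoring model assigned to its tokens. How those
    scores are obtained is outside this sketch (hypothetical input).
    """
    if not candidates:
        raise ValueError("need at least one candidate")
    return min(candidates, key=lambda text: perplexity(candidates[text]))


if __name__ == "__main__":
    # Hypothetical scores for two candidate transfers of one instruction.
    scored = {
        "Definition: Translate the sentence into French.": [-0.2, -0.4, -0.3],
        "Definition: sentence the French into.": [-2.1, -3.0, -2.7],
    }
    print(pick_lowest_perplexity(scored))
```

Under these assumptions, a noisy transfer that the scoring model finds unlikely receives a higher perplexity and is discarded in favor of a more fluent candidate.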
Introduction. Recently, instruction tuning has gained considerable attention as a potent strategy for enhancing large language models (LLMs) in following human instructions and generating appropriate responses. For instance, by reformulating various NLP tasks with an instruction template, models trained on the converted dataset exhibit powerful capabilities of zero-shot generalization on unseen tasks (Wei et al., 2021). Later studies have demonstrated that instruction tuning is critical to facilitating LLMs in grounding their inner knowledge to diverse real-world scenarios (Ouyang et al., 2022; Iyer et al., 2022; Chung et al., 2022; Ding et al., 2023). To date, considerable efforts have been dedicated to creating datasets for instruction tuning (Honovich et al., 2022a; Bach et al., 2022; Wei et al., 2021; Wang et al., 2022b,a; Aribandi et al., 2022), and researchers find that increasing the task diversity (i.e., the number of unique tasks) of the training data can consistently enhance generalization performance (Wang et al., 2022b; Iyer et al., 2022; Longpre et al., 2023).
Discussion / Conclusion. In this paper, we present a unified instruction tuning framework (UIT) that provides a standardized approach to improving generalization in instruction tuning: it unifies the format of existing instruction tuning datasets and enables format transfer between them. We have tested our approach on various datasets and observed improvements in model performance. We also propose a denoising method and an offline model training method to make UIT more feasible in practice. More broadly, we study an under-explored facet of instruction tuning, namely format consistency, and we hope our work will facilitate more attempts in relevant areas. While our proposed UIT framework and format transferer offer a promising approach to enhancing the generalization performance of instruction-tuned LLMs, several limitations should be acknowledged.