Answer is All You Need: Instruction-following Text Embedding via Answering the Question
This work aims to build a text embedder that can capture the characteristics of texts specified by user instructions. Despite the great potential of user-oriented embeddings, none of the previous approaches provides a concrete solution. This paper offers a new viewpoint: it treats the instruction as a question about the input text and encodes the expected answers to obtain the representation. Intuitively, texts with the same (implicit) semantics should share similar answers to the instruction, and therefore have more similar embeddings. Specifically, we propose INBEDDER, which instantiates this embed-via-answering idea by fine-tuning language models only on abstractive question answering tasks. INBEDDER shows significantly improved instruction-following capabilities on our proposed instruction awareness and instruction robustness tests, when applied to both large language models (LLMs) (e.g., llama-2-7b) and smaller encoder-based LMs (e.g., roberta-large). Additionally, our qualitative analysis of the clustering outcomes obtained by applying different instructions to the same corpus demonstrates a high degree of interpretability.
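The embed-via-answering idea can be illustrated with a minimal sketch under loudly stated assumptions. `toy_answer` is a hypothetical rule-based stand-in for the fine-tuned LM that generates the expected answer, and `bow_embed` is a hypothetical bag-of-words encoder standing in for the LM's own encoding of that answer. Neither reflects INBEDDER's actual architecture. The sketch only shows the mechanism described above: texts that share an answer to the instruction receive similar embeddings, and changing the instruction changes which texts are similar.

```python
import math
from collections import Counter
from typing import Callable, Dict, List

# Hypothetical stand-in for an instruction-following LM: (instruction, text) -> answer.
AnswerFn = Callable[[str, str], str]
Vector = Dict[str, float]


def toy_answer(instruction: str, text: str) -> str:
    """Hypothetical rule-based 'answerer' (assumption; a real system would generate text)."""
    t = text.lower()
    if "sentiment" in instruction.lower():
        return "positive" if any(w in t for w in ("great", "love", "excellent")) else "negative"
    if "topic" in instruction.lower():
        return "food" if any(w in t for w in ("pizza", "pasta", "menu")) else "service"
    return text  # no recognised instruction: fall back to the raw text


def bow_embed(answer: str) -> Vector:
    """Hypothetical answer encoder: L2-normalised bag of lowercase tokens."""
    counts = Counter(answer.lower().split())
    norm = math.sqrt(sum(c * c for c in counts.values()))
    return {tok: c / norm for tok, c in counts.items()} if norm else {}


def cosine(u: Vector, v: Vector) -> float:
    """Cosine similarity of two already-normalised sparse vectors (a dot product)."""
    return sum(w * v.get(tok, 0.0) for tok, w in u.items())


def embed_via_answering(instruction: str, texts: List[str], answer_fn: AnswerFn) -> List[Vector]:
    """Embed each text by encoding its answer to the instruction, not the text itself."""
    return [bow_embed(answer_fn(instruction, t)) for t in texts]


REVIEWS = [
    "The pizza was great",
    "I love how fast the waiter was",
    "The pasta was cold and bland",
]

if __name__ == "__main__":
    for instr in ("What is the sentiment of this review?", "What is the topic of this review?"):
        embs = embed_via_answering(instr, REVIEWS, toy_answer)
        print(instr, cosine(embs[0], embs[1]), cosine(embs[0], embs[2]))
```

Under the sentiment instruction the first two reviews coincide (both "positive"), while under the topic instruction the first and third coincide (both "food"): the same corpus is grouped differently depending on the instruction, as in the review-clustering scenario described in the introduction.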
Introduction. Text embedders play a crucial role in large-scale textual data analysis and management. While existing models (Reimers and Gurevych, 2019a; Gao et al., 2021; Ni et al., 2022a,b; Wang et al., 2022; Xiao et al., 2023) are highly effective at representing texts in general, they cannot address user-specific objectives. This limitation hinders their application in more intricate scenarios where the embedding task requires the model to represent particular characteristics of the texts (Wang et al., 2023; Zhang et al., 2023b). Consider Figure 1, where a single set of reviews must be clustered in three distinct ways to derive meaningful insights. In response, this paper attempts to equip text embedders with instruction-following capability. One straightforward solution is to embed the concatenation of the instruction and the input. However, generic text embeddings represent texts in a form suited to textual similarity, search, clustering, and similar tasks, rather than in a form that follows instructions.
Discussion / Conclusion. Our work addresses a novel problem: instruction-following text embedding. We propose INBEDDER, which produces desirable embeddings from LLMs by generating expected answers. The method is inspired by observations on existing LLMs. Our text embedder llama-2-7b-INBEDDER outperforms both traditional sentence transformers and aggregated embeddings from LLMs on the instruction awareness and instruction robustness tests, and achieves close performance on traditional generic tasks. We also show that INBEDDER is inherently applicable to explaining embedding clusters, which can significantly facilitate user-oriented dataset analysis. We encourage future work to investigate more efficient solutions, which are important for large-scale retrieval systems.