AutoPrompt: Eliciting Knowledge from Language Models with Automatically Generated Prompts

Paper · arXiv 2010.15980 · Published October 29, 2020
Prompts and PromptingSentiment, Semantics, and Toxicity DetectionNatural Language InferenceKnowledge Graphs

The remarkable success of pretrained language models has motivated the study of what kinds of knowledge these models learn during pretraining. Reformulating tasks as fillin-the-blanks problems (e.g., cloze tests) is a natural approach for gauging such knowledge, however, its usage is limited by the manual effort and guesswork required to write suitable prompts. To address this, we develop AUTOPROMPT, an automated method to create prompts for a diverse set of tasks, based on a gradient-guided search. Using AUTO- PROMPT, we show that masked language models (MLMs) have an inherent capability to perform sentiment analysis and natural language inference without additional parameters or finetuning, sometimes achieving performance on par with recent state-of-the-art supervised models. We also show that our prompts elicit more accurate factual knowledge from MLMs than the manually created prompts on the LAMA benchmark, and that MLMs can be used as relation extractors more effectively than supervised relation extraction models. These results demonstrate that automatically generated prompts are a viable parameter-free alternative to existing probing methods, and as pretrained LMs become more sophisticated and capable, potentially a replacement for finetuning.

Introduction. Pretrained language models (LMs) have had exceptional success when adapted to downstream tasks via finetuning (Peters et al., 2018; Devlin et al., 2019). Although it is clear that pretraining improves accuracy, it is difficult to determine whether the knowledge that finetuned LMs contain is learned during the pretraining or the finetuning

Discussion / Conclusion. Prompting as an Alternative to Finetuning The goal of prompting a language model is to probe the knowledge that the model acquired from pretraining. Nevertheless, prompting has some practical advantages over finetuning for solving realworld tasks. First, as shown in Section 3, prompts generated using AUTOPROMPT can achieve higher accuracy than finetuning in the low-data regime. Moreover, prompting has advantages over finetuning when trying to solve many different tasks (e.g., the many users of the OpenAI GPT-3 API (Brown et al., 2020)). In particular, finetuning requires storing large language model checkpoints for each individual task, and drastically increases system cost and complexity because it requires deploying many different models at the same time. Prompting alleviates both of these issues. Only prompts are stored for each individual task, while the same pretrained model is used across all of the tasks. Limitations of Prompting There are certain phenomena that are difficult to elicit from pretrained language models via prompts. In our preliminary evaluation on datasets such as QQP (Iyer et al., 2017) and RTE (Dagan et al., 2005), prompts generated manually and with AUTOPROMPT did not perform considerably better than chance.