"Is ChatGPT a Better Explainer than My Professor?": Evaluating the Explanation Capabilities of LLMs in Conversation Compared to a Human Baseline

Paper · arXiv 2406.18512 · Published June 26, 2024
AI in Education


Explanations form the foundation of knowledge sharing and build upon communication principles, social dynamics, and learning theories. We focus specifically on conversational approaches to explanation because that context is highly adaptive and interactive. Our research builds on previous work on explanatory acts, a framework for understanding the strategies that explainers and explainees use in a conversation to explain, understand, and engage with each other. We use the 5-Levels dataset, which was constructed from the WIRED YouTube series by Wachsmuth et al. and later annotated with explanatory acts by Booshehri et al. [3]. These annotations provide a framework for understanding how explainers and explainees structure their responses. With the rise of generative AI in the past year, we hope to better understand the capabilities of Large Language Models (LLMs) and how they can augment expert explainers' capabilities in conversational settings. To this end, the 5-Levels dataset allows us to audit the ability of LLMs to engage in explanation dialogues.

Introduction. Explanations are an important part of science communication because they make science more accessible to a general audience. But it can be hard to bridge the knowledge gap between expert explainers and everyday people who have no prior knowledge of the topic. In this research, we focus specifically on explanation conversations, in which the explainer and explainee engage in a dialogue to help the explainee understand a concept. These conversations are rich for investigation because their flow changes and adapts depending on the context and background of the explainer and explainee [3]. For example, the way an explainer explains a concept to a 5-year-old will be vastly different from how they would explain it to a college student. Factors such as the explainer's and explainee's proficiency and personal interest in the subject area affect how each party engages in the conversation.

Discussion / Conclusion. This study further demonstrates that more work is needed to help experts bridge the knowledge gap between themselves and their audiences. While LLM-generated responses were shown to perform better than the baseline human responses, these findings cannot be used to advocate for LLMs replacing expert explainers. Instead, this research demonstrates how LLMs can augment expert explainers' capabilities by offering real-time support in tailoring more effective explanations for a given audience. Additionally, based on the qualitative results from the annotators' responses, one of the main reasons that S2: GPT Standard outperformed S3: GPT w/ EA was its conciseness, with an average of 10 fewer words per response. This demonstrates that conciseness matters for not overwhelming the explainee with information, and that carefully planning and segmenting an explanation into manageable chunks is important for communicating and retaining information.
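The conciseness comparison above comes down to simple arithmetic: compute the mean number of words per response in each condition, then take the difference. The following is a minimal Python sketch of that calculation. It assumes whitespace tokenization, because the summary does not say how words were counted. The function names and the toy responses are hypothetical illustrations, not data or code from the study.

```python
from statistics import mean


def word_count(text: str) -> int:
    """Count whitespace-separated tokens (an assumed proxy for 'words')."""
    return len(text.split())


def average_length(responses: list[str]) -> float:
    """Mean word count over a list of responses from one condition."""
    return mean(word_count(r) for r in responses)


def length_gap(shorter: list[str], longer: list[str]) -> float:
    """How many more words, on average, responses in `longer` contain."""
    return average_length(longer) - average_length(shorter)


# Hypothetical toy responses for illustration only; NOT data from the study.
s2_standard = [
    "Gravity pulls things toward each other.",
    "Heavier objects pull harder on nearby things.",
]
s3_with_ea = [
    "Good question! Gravity is a force that pulls objects toward each other, like a magnet.",
    "Does that make sense? Heavier objects pull harder, which is why we stay on Earth.",
]

print(f"S2 mean words: {average_length(s2_standard):.1f}")
print(f"S3 mean words: {average_length(s3_with_ea):.1f}")
print(f"S3 - S2 gap: {length_gap(s2_standard, s3_with_ea):.1f} words")
```

On these toy strings the gap is 8.5 words. In the study, the corresponding figure reported for S3 versus S2 was about 10 words per response. A real analysis might use a proper tokenizer instead of `str.split`, which counts "other," and "other" as separate spellings of the same token but does not split off punctuation.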