GPT-4 as a Homework Tutor can Improve Student Engagement and Learning Outcomes

Paper · arXiv 2409.15981 · Published September 24, 2024
AI in Education

This work contributes to the scarce empirical literature on LLMbased interactive homework in real-world educational settings and offers a practical, scalable solution for improving homework in schools. Homework is an important part of education in schools across the world, but in order to maximize benefit, it needs to be accompanied with feedback and followup questions. We developed a prompting strategy that enables GPT-4 to conduct interactive homework sessions for high-school students learning English as a second language. Our strategy requires minimal efforts in content preparation, one of the key challenges of alternatives like home tutors or ITSs. We carried out a Randomized Controlled Trial (RCT) in four high-school classes, replacing traditional homework with GPT-4 homework sessions for the treatment group. We observed significant improvements in learning outcomes, specifically a greater gain in grammar, and student engagement. In addition, students reported high levels of satisfaction with the system and wanted to continue using it after the end of the RCT.

Introduction. Homework is an important component of education as it helps students self evaluate and develop self regulation skills [38]. However, in order to fully benefit from solving homework problems, it is important that students receive swift feedback on their work [40], which is not possible for teachers due to time constraints. This leads to several students having to resort to private tutors, which can be prohibitively expensive for several households [7]. In this work we look at the possibility of leveraging GPT-4 [30] as a tutor to assist students in their homework using a simple prompting strategy and an interface we developed. In the seminal paper on the 2 Sigma problem, Bloom [4] noted that compared to conventional learning, which consists of lectures followed by evaluation tests, students who received feedback on said tests and were given corrective instruction thereafter performed one standard deviation (σ) higher in terms of learning gains. The same paper notes a 0.8σimprovement for graded homework, but only a 0.3σimprovement if the homework is simply assigned without follow-up.

Discussion / Conclusion. In this work, we run an RCT to evaluate the ability of GPT-4 to function as a tutor. We find that students find this replacement of homework more useful and interesting, and are enthusiastic about continuing using it in their education. In addition to this, we also observe some improvement in learning as measured by tests. Also, we do not find evidence of bias towards stronger students or harmful hallucinations. We further notice that the self-assessments don’t show significant decline over the RCT period, thereby making novelty effects less likely. We do observe some issues with the tutor revealing answers too much, especially when students try to game the system, but the benefits seem to outweigh the drawbacks. The learning gains are also much smaller than the potential 2-sigma improvement, but this can be attributed to the small time scale.