Large Language Models as Conversational Movie Recommenders: A User Study

Paper · arXiv 2404.19093 · Published April 29, 2024
LLM-Based Recommenders

This paper explores the effectiveness of using large language models (LLMs) for personalized movie recommendations from users’ perspectives in an online field experiment. Our study involves a combination of between-subject prompt and historic consumption assessments, along with within-subject recommendation scenario evaluations. By examining conversation and survey response data from 160 active users, we find that LLMs offer strong recommendation explainability but lack overall personalization, diversity, and user trust. Our results also indicate that different personalized prompting techniques do not significantly affect user-perceived recommendation quality, but the number of movies a user has watched plays a more significant role. Furthermore, LLMs show a greater ability to recommend lesser-known or niche movies. Through qualitative analysis, we identify key conversational patterns linked to positive and negative user interaction experiences and conclude that providing personal context and examples is crucial for obtaining high-quality recommendations from LLMs.

Discussion / Conclusion. Our findings showed that few-shot prompts didn’t significantly improve recommendation quality compared to zero-shot or one-shot approaches. This aligns with Dai et al., who noted that recommendation quality doesn’t always improve with more examples [13]. To address the lack of personalization and novelty in our findings, retrieval-augmented generation (RAG) has been proposed, allowing [...] The success of LLM recommenders cannot be achieved without well-informed users with proper goals and interaction flows. The first lesson we draw from the study is to remind users to [...] limitations could be valuable for future research to enhance understanding in this domain.
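
To make the zero-shot, one-shot, and few-shot distinction concrete, here is a minimal Python sketch of how such prompts might be assembled from a user's rating history. It is an illustration only, not the prompts used in the study: the function name `build_prompt`, the prompt wording, the 5-point rating scale, and the sample titles are all assumptions introduced here.

```python
from typing import List, Tuple


def build_prompt(request: str,
                 history: List[Tuple[str, float]],
                 n_examples: int) -> str:
    """Assemble a recommendation prompt (hypothetical wording).

    n_examples = 0 gives a zero-shot prompt, 1 a one-shot prompt,
    and any larger value a few-shot prompt.
    """
    # Clamp so a negative count cannot slice from the end of the list.
    examples = history[:max(0, n_examples)]

    lines = ["You are a movie recommender."]
    if examples:
        # Assumed 5-point scale; the study's actual format is not shown.
        lines.append("Movies this user has rated (out of 5):")
        for title, rating in examples:
            lines.append(f"- {title}: {rating:.1f}")
    lines.append(f"User request: {request}")
    lines.append("Recommend five movies and explain each choice.")
    return "\n".join(lines)


# Illustrative history; titles and ratings are made up for this sketch.
sample_history = [("Alien", 4.5), ("Heat", 4.0), ("Amelie", 3.5)]

zero_shot = build_prompt("a tense thriller", sample_history, 0)
one_shot = build_prompt("a tense thriller", sample_history, 1)
few_shot = build_prompt("a tense thriller", sample_history, 3)
```

The three variants differ only in how many rated movies are included as context, which is the dimension the study reports as having no significant effect on user-perceived quality.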