Evidence-centered Assessment for Writing with Generative AI

Paper · arXiv 2401.08964 · Published January 17, 2024

We propose a learning analytics-based methodology for assessing the collaborative writing of humans and generative artificial intelligence. Framed by evidence-centered design (ECD), we used elements of knowledge-telling, knowledge transformation, and cognitive presence to identify assessment claims; we used data collected from the CoAuthor writing tool as potential evidence for these claims; and we used epistemic network analysis (ENA) to make inferences from the data about the claims. Our findings revealed significant differences in the writing processes of different groups of CoAuthor users, suggesting that our method is a plausible approach to assessing human-AI collaborative writing.

Introduction. Effectively communicating ideas via writing is a critical human skill. Every day, many of us send text messages, draft emails, and make notes; in many specialised domains, such as research, writing is a core form of discourse. The process of writing has, of course, changed over time; writing tools have transformed from mere storage receptacles to tools that help us craft more effective writing, such as word processors, spellcheckers, and grammar checkers [1]. Recently, however, there has been a step-change in tools for writing. Whereas prior tools helped us to save and process our own writing, tools based on generative artificial intelligence (GAI) can now compose writing for us. This technological advance has already had far-reaching implications for society. Arguably, these implications have been, and will continue to be, the most profound for education. Education relies, in part, on assessment. Broadly speaking, we want students to learn skills that will be valuable to their well-being and the well-being of society.

Discussion / Conclusion. In this study, we sought to demonstrate an assessment method for human-AI collaborative writing. Framed by ECD, we used elements of knowledge-telling, knowledge transformation, and cognitive presence to identify claims for our student model; we used data collected from the CoAuthor writing tool as a proxy for our task model; and we used ENA to evidence claims about the student model using data from the task model. More specifically, we compared the co-writing behaviors of users across three contrasts: high vs. low ownership; creative vs. argumentative prompt types; and high vs. low temperature. We found statistically significant differences in the writing processes of authors between the high and low ownership conditions and between the creative and argumentative conditions. Specifically, authors with GAI ownership over their final product tended to rely more on GAI suggestions, while those with user ownership tended to focus more on composing and revising their own writing. When responding to creative writing prompts, authors tended to explore GAI suggestions more, while those responding to argumentative prompts tended to focus on composing their own writing and making small revisions to GAI writing.
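
To make the ENA step more concrete, the following is a minimal sketch of the co-occurrence accumulation that ENA-style analyses typically start from: codes that appear close together in a writing session are counted as connected, and the counts are normalized so sessions of different lengths can be compared. It is not the authors' implementation. The code names in `CODES`, the function `cooccurrence_vector`, the window size, and the example sessions are all hypothetical and chosen only for illustration, and the sketch omits the dimensional reduction and statistical comparison that a full ENA would apply afterwards.

```python
from itertools import combinations
from math import sqrt

# Hypothetical codes for illustration only; not the paper's codebook.
CODES = ["compose", "seek_suggestion", "accept_suggestion", "revise"]


def cooccurrence_vector(events, window=2):
    """Accumulate code co-occurrences over a moving window.

    `events` is a list of sets of codes, one set per logged writing event.
    For each event, a connection between two codes is counted once if one
    code is in the current event and the other appears anywhere in the
    window (the current event plus the `window - 1` events before it).
    The summed counts are L2-normalized; an all-zero vector is returned
    unchanged.
    """
    pairs = list(combinations(CODES, 2))
    counts = {pair: 0 for pair in pairs}
    for i, current in enumerate(events):
        # Union of all codes seen in the window ending at event i.
        context = set().union(*events[max(0, i - window + 1): i + 1])
        for a, b in pairs:
            if (a in current and b in context) or (b in current and a in context):
                counts[(a, b)] += 1
    norm = sqrt(sum(v * v for v in counts.values()))
    return {pair: (v / norm if norm else 0.0) for pair, v in counts.items()}


# Two made-up sessions: one dominated by GAI suggestions, one by own writing.
gai_heavy = [{"seek_suggestion"}, {"accept_suggestion"},
             {"seek_suggestion"}, {"accept_suggestion"}]
user_heavy = [{"compose"}, {"revise"}, {"compose"}, {"revise"}]

gai_vec = cooccurrence_vector(gai_heavy)
user_vec = cooccurrence_vector(user_heavy)
```

In this toy setup, the first session places all of its connection weight on the pair of suggestion-related codes and the second on the compose/revise pair. That is the kind of contrast the ownership comparison above describes, although the real analysis operates on CoAuthor logs and a validated coding scheme.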
