Using LLMs to Discover Legal Factors

Paper · arXiv 2410.07504 · Published October 10, 2024
Domain Specialization in LLMs · Argumentation and Persuasion · Workplace Applications

Abstract. Factors are a foundational component of legal analysis and computational models of legal reasoning. These factor-based representations enable lawyers, judges, and AI and Law researchers to reason about legal cases. In this paper, we introduce a methodology that leverages large language models (LLMs) to discover lists of factors that effectively represent a legal domain. Our method takes as input raw court opinions and produces a set of factors and associated definitions. We demonstrate that a semi-automated approach, incorporating minimal human involvement, produces factor representations that can predict case outcomes with moderate success, if not yet as well as expert-defined factors can.

Introduction. Recently, large language models (LLMs) have been applied to automatically annotate legal case texts from particular legal domains in terms of factors from pre-existing factor lists. In this paper, we describe and assess a methodology for employing LLMs to discover factors in case texts without using a pre-existing factor list. Our method takes as input raw court opinions and produces a set of factors and associated definitions. We evaluate the extent to which an LLM can identify from scratch any factors in the cases from a legal domain where the LLM has no apparent access to a pre-existing list of factors or their definitions for that domain. We demonstrate that a semi-automated approach, with a human in the loop, produces factor representations that can predict case outcomes with moderate success, if not yet as well as expert-defined factors can. In the absence of predefined factors from courts or legislative bodies, legal scholars manually analyze hundreds of cases to identify factors, a process that is highly time-consuming and costly.

Discussion / Conclusion. The following diagrams help explain the difference in predictive performance between the factors identified in the Dias CFR and those in the RFR synthesized by the human annotator (Human RFR) or by LLMs (Llama RFR or GPT RFR). The diagrams depict semantic relations between these two sets of factors. We embedded the definitions of the factor representations and then measured the cosine similarity between the RFR and CFR factors. For each RFR factor, we identified the three most similar CFR factors. To filter out low-similarity matches, we discarded any top-three RFR-CFR factor pair whose similarity score was lower than the average of all top-three scores. The plot in Figure 1 shows the similarity between Human RFR factors on the left and CFR factors on the right. The plot in Figure 2 shows the similarity between Llama RFR factors on the left and CFR factors on the right. If a CFR factor is connected to “Unmatched Source”, no similar RFR factor was identified for it. If an RFR factor on the left is connected to “Unmatched Target”, that RFR factor was not identified as similar to any CFR factor. The weight of each line indicates the strength of the similarity; heavier lines signify higher cosine similarity.
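
The matching procedure described above can be sketched roughly as follows. This is a minimal illustration and not the authors' implementation: the function names (`cosine`, `match_factors`), the dictionary layout, and the use of plain Python lists for embeddings are assumptions made here, and the embedding model that produces the vectors is left out entirely. The sketch assumes every embedding is a non-zero vector.

```python
import math


def cosine(u, v):
    """Cosine similarity between two non-zero vectors of equal length."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def match_factors(rfr, cfr, k=3):
    """Link RFR factors to CFR factors by embedding similarity.

    rfr, cfr: dicts mapping a factor name to the embedding of its
    definition (hypothetical layout, chosen for this illustration).
    Returns (links, unmatched_rfr, unmatched_cfr).
    """
    # Step 1: for each RFR factor, keep its k most similar CFR factors.
    top = {}
    for r_name, r_vec in rfr.items():
        scored = sorted(
            ((cosine(r_vec, c_vec), c_name) for c_name, c_vec in cfr.items()),
            reverse=True,
        )
        top[r_name] = scored[:k]

    # Step 2: the threshold is the average of all retained top-k scores.
    all_scores = [score for pairs in top.values() for score, _ in pairs]
    threshold = sum(all_scores) / len(all_scores)

    # Step 3: discard pairs scoring below the threshold; the score kept
    # on each link would drive the line weight in the diagram.
    links = [
        (r_name, c_name, score)
        for r_name, pairs in top.items()
        for score, c_name in pairs
        if score >= threshold
    ]

    # Step 4: factors with no surviving link are the ones the diagrams
    # attach to "Unmatched Target" (RFR side) or "Unmatched Source" (CFR side).
    linked_rfr = {r for r, _, _ in links}
    linked_cfr = {c for _, c, _ in links}
    unmatched_rfr = [r for r in rfr if r not in linked_rfr]
    unmatched_cfr = [c for c in cfr if c not in linked_cfr]
    return links, unmatched_rfr, unmatched_cfr
```

Because the threshold is computed over the pooled top-three scores rather than per factor, an RFR factor whose best matches are all weak relative to the pool ends up with no links at all, which is how an "Unmatched Target" connection can arise under this reading of the procedure.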