Hallucination-Free? Assessing the Reliability of Leading AI Legal Research Tools
- URL: http://arxiv.org/abs/2405.20362v1
- Date: Thu, 30 May 2024 17:56:05 GMT
- Title: Hallucination-Free? Assessing the Reliability of Leading AI Legal Research Tools
- Authors: Varun Magesh, Faiz Surani, Matthew Dahl, Mirac Suzgun, Christopher D. Manning, Daniel E. Ho
- Abstract summary: We report on the first preregistered empirical evaluation of AI-driven legal research tools.
We find that the AI research tools made by LexisNexis (Lexis+ AI) and Thomson Reuters (Westlaw AI-Assisted Research and Ask Practical Law AI) each hallucinate between 17% and 33% of the time.
The article also provides evidence to inform the responsibilities of legal professionals in supervising and verifying AI outputs.
- Score: 32.78336381381673
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Legal practice has witnessed a sharp rise in products incorporating artificial intelligence (AI). Such tools are designed to assist with a wide range of core legal tasks, from search and summarization of caselaw to document drafting. But the large language models used in these tools are prone to "hallucinate," or make up false information, making their use risky in high-stakes domains. Recently, certain legal research providers have touted methods such as retrieval-augmented generation (RAG) as "eliminating" (Casetext, 2023) or "avoid[ing]" hallucinations (Thomson Reuters, 2023), or guaranteeing "hallucination-free" legal citations (LexisNexis, 2023). Because of the closed nature of these systems, systematically assessing these claims is challenging. In this article, we design and report on the first preregistered empirical evaluation of AI-driven legal research tools. We demonstrate that the providers' claims are overstated. While hallucinations are reduced relative to general-purpose chatbots (GPT-4), we find that the AI research tools made by LexisNexis (Lexis+ AI) and Thomson Reuters (Westlaw AI-Assisted Research and Ask Practical Law AI) each hallucinate between 17% and 33% of the time. We also document substantial differences between systems in responsiveness and accuracy. Our article makes four key contributions. It is the first to assess and report the performance of RAG-based proprietary legal AI tools. Second, it introduces a comprehensive, preregistered dataset for identifying and understanding vulnerabilities in these systems. Third, it proposes a clear typology for differentiating between hallucinations and accurate legal responses. Last, it provides evidence to inform the responsibilities of legal professionals in supervising and verifying AI outputs, which remains a central open question for the responsible integration of AI into law.
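The abstract names retrieval-augmented generation (RAG) as the method providers credit with reducing hallucinations: the system first retrieves authoritative sources and then conditions the model's answer on them. As a point of reference only, here is a minimal sketch of that pattern; the toy corpus, keyword retriever, prompt, and stubbed model call are illustrative assumptions, not the pipeline of Lexis+ AI, Westlaw AI-Assisted Research, or any other product evaluated in the paper.

```python
# Minimal, illustrative sketch of a retrieval-augmented generation (RAG) loop.
# Everything here (corpus, retriever, prompt, stubbed model call) is a
# hypothetical stand-in, not any provider's actual pipeline.
from dataclasses import dataclass


@dataclass
class Document:
    citation: str  # source identifier the answer can be checked against
    text: str


# Toy in-memory "index"; a real system would use a search engine or vector store.
CORPUS = [
    Document("Example v. Example, 123 F.3d 456", "Holding: an example proposition of law."),
    Document("Sample v. Sample, 789 F.2d 12", "Holding: a different example proposition."),
]


def retrieve(query: str, k: int = 2) -> list[Document]:
    """Rank documents by naive keyword overlap with the query (toy retriever)."""
    query_terms = set(query.lower().split())
    return sorted(
        CORPUS,
        key=lambda doc: len(query_terms & set(doc.text.lower().split())),
        reverse=True,
    )[:k]


def build_prompt(query: str, docs: list[Document]) -> str:
    """Ground the model by restricting it to the retrieved sources."""
    sources = "\n".join(f"[{d.citation}] {d.text}" for d in docs)
    return (
        "Answer the question using ONLY the sources below, and cite them.\n"
        f"Sources:\n{sources}\n\nQuestion: {query}\nAnswer:"
    )


def answer(query: str) -> str:
    prompt = build_prompt(query, retrieve(query))
    # A real system would send `prompt` to an LLM here; the generation step
    # is where residual hallucinations can still enter despite retrieval.
    return f"(LLM call stubbed; prompt is {len(prompt)} characters)"


if __name__ == "__main__":
    print(answer("What is the holding on the example proposition?"))
```

Even with a grounding step like this, the paper finds that the commercial tools still hallucinate between 17% and 33% of the time.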
Related papers
- Using AI Alignment Theory to understand the potential pitfalls of regulatory frameworks [55.2480439325792]
This paper critically examines the European Union's Artificial Intelligence Act (EU AI Act).
It draws on insights from Alignment Theory (AT) research, which focuses on the potential pitfalls of technical alignment in artificial intelligence.
As we apply these concepts to the EU AI Act, we uncover potential vulnerabilities and areas for improvement in the regulation.
arXiv Detail & Related papers (2024-10-10T17:38:38Z) - Gaps or Hallucinations? Gazing into Machine-Generated Legal Analysis for Fine-grained Text Evaluations [38.30926471814935]
Large Language Models (LLMs) show promise as a writing aid for professionals performing legal analyses.
However, LLMs can hallucinate in this setting, in ways that are difficult for non-professionals and existing text evaluation metrics to recognize.
We introduce the neutral notion of gaps, as opposed to hallucinations in the strict sense of errors, to refer to the difference between human-written and machine-generated legal analysis.
arXiv Detail & Related papers (2024-09-16T02:38:38Z) - Consent in Crisis: The Rapid Decline of the AI Data Commons [74.68176012363253]
General-purpose artificial intelligence (AI) systems are built on massive swathes of public web data.
We conduct the first, large-scale, longitudinal audit of the consent protocols for the web domains underlying AI training corpora.
arXiv Detail & Related papers (2024-07-20T16:50:18Z) - It Cannot Be Right If It Was Written by AI: On Lawyers' Preferences of Documents Perceived as Authored by an LLM vs a Human [0.6827423171182154]
Large Language Models (LLMs) enable a future in which certain types of legal documents may be generated automatically.
This study provides a necessary analysis of the ongoing transition towards mature generative AI systems.
Our analysis revealed a clear preference for documents perceived as crafted by a human over those believed to be generated by AI.
arXiv Detail & Related papers (2024-07-09T12:11:25Z) - Promises and pitfalls of artificial intelligence for legal applications [19.8511844390731]
We argue that the claim that AI is set to redefine the legal profession is not supported by the current evidence.
We dive into AI's increasingly prevalent roles in three types of legal tasks.
We make recommendations for better evaluation and deployment of AI in legal contexts.
arXiv Detail & Related papers (2024-01-10T19:50:37Z) - Insights into Classifying and Mitigating LLMs' Hallucinations [48.04565928175536]
This paper delves into the underlying causes of AI hallucination and elucidates its significance in artificial intelligence.
We explore potential strategies to mitigate hallucinations, aiming to enhance the overall reliability of large language models.
arXiv Detail & Related papers (2023-11-14T12:30:28Z) - Counter Turing Test CT^2: AI-Generated Text Detection is Not as Easy as You May Think -- Introducing AI Detectability Index [9.348082057533325]
AI-generated text detection (AGTD) has emerged as a topic of immediate research attention.
This paper introduces the Counter Turing Test (CT^2), a benchmark of techniques designed to comprehensively evaluate the fragility of existing AGTD methods.
arXiv Detail & Related papers (2023-10-08T06:20:36Z) - The Role of AI in Drug Discovery: Challenges, Opportunities, and Strategies [97.5153823429076]
The benefits, challenges and drawbacks of AI in this field are reviewed.
The use of data augmentation, explainable AI, and the integration of AI with traditional experimental methods are also discussed.
arXiv Detail & Related papers (2022-12-08T23:23:39Z) - Compliance Challenges in Forensic Image Analysis Under the Artificial Intelligence Act [8.890638003061605]
We review why the use of machine learning in forensic image analysis is classified as high-risk.
Under the draft AI Act, high-risk AI systems for use in law enforcement are permitted but subject to compliance with mandatory requirements.
arXiv Detail & Related papers (2022-03-01T14:03:23Z) - How Does NLP Benefit Legal System: A Summary of Legal Artificial Intelligence [81.04070052740596]
Legal Artificial Intelligence (LegalAI) focuses on applying the technology of artificial intelligence, especially natural language processing, to benefit tasks in the legal domain.
This paper introduces the history, the current state, and the future directions of research in LegalAI.
arXiv Detail & Related papers (2020-04-25T14:45:15Z)