Hallucination-Free? Assessing the Reliability of Leading AI Legal Research Tools
- URL: http://arxiv.org/abs/2405.20362v1
- Date: Thu, 30 May 2024 17:56:05 GMT
- Title: Hallucination-Free? Assessing the Reliability of Leading AI Legal Research Tools
- Authors: Varun Magesh, Faiz Surani, Matthew Dahl, Mirac Suzgun, Christopher D. Manning, Daniel E. Ho
- Abstract summary: We report on the first preregistered empirical evaluation of AI-driven legal research tools.
We find that the AI research tools made by LexisNexis (Lexis+ AI) and Thomson Reuters (Westlaw AI-Assisted Research and Ask Practical Law AI) each hallucinate between 17% and 33% of the time.
The article provides evidence to inform the responsibilities of legal professionals in supervising and verifying AI outputs.
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Legal practice has witnessed a sharp rise in products incorporating artificial intelligence (AI). Such tools are designed to assist with a wide range of core legal tasks, from search and summarization of caselaw to document drafting. But the large language models used in these tools are prone to "hallucinate," or make up false information, making their use risky in high-stakes domains. Recently, certain legal research providers have touted methods such as retrieval-augmented generation (RAG) as "eliminating" (Casetext, 2023) or "avoid[ing]" hallucinations (Thomson Reuters, 2023), or guaranteeing "hallucination-free" legal citations (LexisNexis, 2023). Because of the closed nature of these systems, systematically assessing these claims is challenging. In this article, we design and report on the first preregistered empirical evaluation of AI-driven legal research tools. We demonstrate that the providers' claims are overstated. While hallucinations are reduced relative to general-purpose chatbots (GPT-4), we find that the AI research tools made by LexisNexis (Lexis+ AI) and Thomson Reuters (Westlaw AI-Assisted Research and Ask Practical Law AI) each hallucinate between 17% and 33% of the time. We also document substantial differences between systems in responsiveness and accuracy. Our article makes four key contributions. It is the first to assess and report the performance of RAG-based proprietary legal AI tools. Second, it introduces a comprehensive, preregistered dataset for identifying and understanding vulnerabilities in these systems. Third, it proposes a clear typology for differentiating between hallucinations and accurate legal responses. Last, it provides evidence to inform the responsibilities of legal professionals in supervising and verifying AI outputs, which remains a central open question for the responsible integration of AI into law.
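For context on the technique at issue, below is a minimal sketch of a retrieval-augmented generation (RAG) loop. It is illustrative only: the function names (retrieve_cases, generate_answer), the toy corpus, and the keyword-overlap retriever are hypothetical stand-ins, not the proprietary pipelines behind Lexis+ AI or Westlaw.

```python
# Minimal, illustrative RAG loop (hypothetical names throughout; not any
# provider's implementation). The shared idea: retrieve authoritative
# documents first, then condition the model's answer on them, so that
# citations point at real sources instead of invented ones.

def retrieve_cases(query: str, corpus: dict[str, str], k: int = 2) -> list[str]:
    """Rank documents by naive keyword overlap with the query (a toy
    stand-in for a commercial legal search index)."""
    terms = set(query.lower().split())
    scored = sorted(
        corpus,
        key=lambda doc_id: len(terms & set(corpus[doc_id].lower().split())),
        reverse=True,
    )
    return scored[:k]

def generate_answer(query: str, passages: list[str]) -> str:
    """Stand-in for the LLM call: a real system would send this prompt to
    a model. Grounding the prompt in retrieved text reduces, but does not
    eliminate, hallucination; the model can still misread the sources."""
    context = "\n".join(passages)
    return f"QUESTION: {query}\nSOURCES:\n{context}\nANSWER: <model output here>"

corpus = {
    "Smith v. Jones (1999)": "Negligence requires duty, breach, causation, and damages.",
    "Doe v. Roe (2004)": "A landlord owes tenants a duty of reasonable care.",
}
query = "What are the elements of negligence?"
doc_ids = retrieve_cases(query, corpus)
print(generate_answer(query, [f"{d}: {corpus[d]}" for d in doc_ids]))
```

Even with this grounding step, the study finds the commercial tools hallucinate 17% to 33% of the time: retrieval constrains the model to real sources, but the generation step can still misstate or misattribute what those sources say.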
Related papers
- Could AI Trace and Explain the Origins of AI-Generated Images and Text? [53.11173194293537]
AI-generated content is increasingly prevalent in the real world.
Adversaries might exploit large multimodal models to create images that violate ethical or legal standards.
Paper reviewers may misuse large language models to generate reviews without genuine intellectual effort.
arXiv Detail & Related papers (2025-04-05T20:51:54Z) - Tasks and Roles in Legal AI: Data Curation, Annotation, and Verification [4.099848175176399]
The application of AI tools to the legal field feels natural.
However, legal documents differ from the web-based text that underlies most AI systems.
We identify three areas of special relevance to practitioners: data curation, data annotation, and output verification.
arXiv Detail & Related papers (2025-04-02T04:34:58Z) - Guarding against artificial intelligence--hallucinated citations: the case for full-text reference deposit [0.0]
Journals should require authors to submit the full text of each cited source along with their manuscripts.
This solution requires limited additional work on the part of authors or editors while effectively immunizing journals against hallucinated references.
arXiv Detail & Related papers (2025-03-25T17:12:38Z) - AnnoCaseLaw: A Richly-Annotated Dataset For Benchmarking Explainable Legal Judgment Prediction [56.797874973414636]
AnnoCaseLaw is a first-of-its-kind dataset of 471 meticulously annotated U.S. Appeals Court negligence cases.
Our dataset lays the groundwork for more human-aligned, explainable Legal Judgment Prediction models.
Results demonstrate that LJP remains a formidable task, with application of legal precedent proving particularly difficult.
arXiv Detail & Related papers (2025-02-28T19:14:48Z) - Almost AI, Almost Human: The Challenge of Detecting AI-Polished Writing [55.2480439325792]
Misclassification can lead to false plagiarism accusations and misleading claims about AI prevalence in online content.
We systematically evaluate eleven state-of-the-art AI-text detectors using our AI-Polished-Text Evaluation dataset.
Our findings reveal that detectors frequently misclassify even minimally polished text as AI-generated, struggle to differentiate between degrees of AI involvement, and exhibit biases against older and smaller models.
arXiv Detail & Related papers (2025-02-21T18:45:37Z) - Retrieval-augmented systems can be dangerous medical communicators [21.371504193281226]
Patients have long sought health information online, and increasingly, they are turning to generative AI to answer their health-related queries.
Retrieval-augmented generation and citation grounding have been widely promoted as methods to reduce hallucinations and improve the accuracy of AI-generated responses.
This paper argues that even when these methods produce literally accurate content drawn from source documents sans hallucinations, they can still be highly misleading.
arXiv Detail & Related papers (2025-02-18T01:57:02Z) - Using AI Alignment Theory to understand the potential pitfalls of regulatory frameworks [55.2480439325792]
This paper critically examines the European Union's Artificial Intelligence Act (EU AI Act).
It draws on insights from Alignment Theory (AT) research, which focuses on the potential pitfalls of technical alignment in Artificial Intelligence.
As we apply these concepts to the EU AI Act, we uncover potential vulnerabilities and areas for improvement in the regulation.
arXiv Detail & Related papers (2024-10-10T17:38:38Z) - Gaps or Hallucinations? Gazing into Machine-Generated Legal Analysis for Fine-grained Text Evaluations [38.30926471814935]
Large Language Models (LLMs) show promise as a writing aid for professionals performing legal analyses.
LLMs can often hallucinate in this setting, in ways that are difficult for non-professionals and existing text evaluation metrics to recognize.
We introduce the neutral notion of gaps, as opposed to hallucinations in the strictly erroneous sense, to refer to the differences between human-written and machine-generated legal analysis.
arXiv Detail & Related papers (2024-09-16T02:38:38Z) - Consent in Crisis: The Rapid Decline of the AI Data Commons [74.68176012363253]
General-purpose artificial intelligence (AI) systems are built on massive swathes of public web data.
We conduct the first, large-scale, longitudinal audit of the consent protocols for the web domains underlying AI training corpora.
arXiv Detail & Related papers (2024-07-20T16:50:18Z) - It Cannot Be Right If It Was Written by AI: On Lawyers' Preferences of Documents Perceived as Authored by an LLM vs a Human [0.6827423171182154]
Large Language Models (LLMs) enable a future in which certain types of legal documents may be generated automatically.
This study provides a necessary analysis of the ongoing transition towards mature generative AI systems.
Our analysis revealed a clear preference for documents perceived as crafted by a human over those believed to be generated by AI.
arXiv Detail & Related papers (2024-07-09T12:11:25Z) - Promises and pitfalls of artificial intelligence for legal applications [19.8511844390731]
We argue that claims that AI is set to redefine the legal profession are not supported by the current evidence.
We dive into AI's increasingly prevalent roles in three types of legal tasks.
We make recommendations for better evaluation and deployment of AI in legal contexts.
arXiv Detail & Related papers (2024-01-10T19:50:37Z) - Insights into Classifying and Mitigating LLMs' Hallucinations [48.04565928175536]
This paper delves into the underlying causes of AI hallucination and elucidates its significance in artificial intelligence.
We explore potential strategies to mitigate hallucinations, aiming to enhance the overall reliability of large language models.
arXiv Detail & Related papers (2023-11-14T12:30:28Z) - Counter Turing Test CT^2: AI-Generated Text Detection is Not as Easy as You May Think -- Introducing AI Detectability Index [9.348082057533325]
AI-generated text detection (AGTD) has emerged as a research topic attracting immediate attention.
This paper introduces the Counter Turing Test (CT2), a benchmark consisting of techniques aiming to offer a comprehensive evaluation of the fragility of existing AGTD techniques.
arXiv Detail & Related papers (2023-10-08T06:20:36Z) - The Role of AI in Drug Discovery: Challenges, Opportunities, and Strategies [97.5153823429076]
The benefits, challenges and drawbacks of AI in this field are reviewed.
The use of data augmentation, explainable AI, and the integration of AI with traditional experimental methods are also discussed.
arXiv Detail & Related papers (2022-12-08T23:23:39Z) - Compliance Challenges in Forensic Image Analysis Under the Artificial Intelligence Act [8.890638003061605]
We review why the use of machine learning in forensic image analysis is classified as high-risk.
Under the draft AI Act, high-risk AI systems for use in law enforcement are permitted but subject to compliance with mandatory requirements.
arXiv Detail & Related papers (2022-03-01T14:03:23Z) - How Does NLP Benefit Legal System: A Summary of Legal Artificial Intelligence [81.04070052740596]
Legal Artificial Intelligence (LegalAI) focuses on applying the technology of artificial intelligence, especially natural language processing, to benefit tasks in the legal domain.
This paper introduces the history, the current state, and the future directions of research in LegalAI.
arXiv Detail & Related papers (2020-04-25T14:45:15Z)