LegalBench-RAG: A Benchmark for Retrieval-Augmented Generation in the Legal Domain
- URL: http://arxiv.org/abs/2408.10343v1
- Date: Mon, 19 Aug 2024 18:30:18 GMT
- Authors: Nicholas Pipitone, Ghita Houir Alami
- Abstract summary: Retrieval-Augmented Generation (RAG) systems are showing promising potential, and are becoming increasingly relevant in AI-powered legal applications.
Existing benchmarks, such as LegalBench, assess the generative capabilities of Large Language Models (LLMs) in the legal domain.
We introduce LegalBench-RAG, the first benchmark specifically designed to evaluate the retrieval step of RAG pipelines within the legal space.
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Retrieval-Augmented Generation (RAG) systems are showing promising potential, and are becoming increasingly relevant in AI-powered legal applications. Existing benchmarks, such as LegalBench, assess the generative capabilities of Large Language Models (LLMs) in the legal domain, but there is a critical gap in evaluating the retrieval component of RAG systems. To address this, we introduce LegalBench-RAG, the first benchmark specifically designed to evaluate the retrieval step of RAG pipelines within the legal space. LegalBench-RAG emphasizes precise retrieval by focusing on extracting minimal, highly relevant text segments from legal documents. These highly relevant snippets are preferred over retrieving document IDs, or large sequences of imprecise chunks, both of which can exceed context window limitations. Long context windows cost more to process, induce higher latency, and lead LLMs to forget or hallucinate information. Additionally, precise results allow LLMs to generate citations for the end user. The LegalBench-RAG benchmark is constructed by retracing the context used in LegalBench queries back to their original locations within the legal corpus, resulting in a dataset of 6,858 query-answer pairs over a corpus of over 79M characters, entirely human-annotated by legal experts. We also introduce LegalBench-RAG-mini, a lightweight version for rapid iteration and experimentation. By providing a dedicated benchmark for legal retrieval, LegalBench-RAG serves as a critical tool for companies and researchers focused on enhancing the accuracy and performance of RAG systems in the legal domain. The LegalBench-RAG dataset is publicly available at https://github.com/zeroentropy-cc/legalbenchrag.
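Since the benchmark scores retrievers on how closely the returned text segments match human-annotated answer snippets, the evaluation can be pictured as character-span overlap scoring. The sketch below is illustrative only: the span representation, function names, and precision/recall definitions are assumptions for exposition, not the actual schema or metrics of the linked repository.

```python
# Minimal sketch of span-level retrieval scoring in the spirit of
# LegalBench-RAG. The span layout and metric definitions here are
# illustrative assumptions, not the repository's actual schema.

def span_overlap(a: tuple[int, int], b: tuple[int, int]) -> int:
    """Characters shared by two [start, end) spans in the same document."""
    return max(0, min(a[1], b[1]) - max(a[0], b[0]))

def precision_recall(retrieved: list[tuple[int, int]],
                     gold: list[tuple[int, int]]) -> tuple[float, float]:
    """Character-level precision/recall, assuming the spans within each
    list do not overlap one another."""
    shared = sum(span_overlap(r, g) for r in retrieved for g in gold)
    retrieved_chars = sum(end - start for start, end in retrieved)
    gold_chars = sum(end - start for start, end in gold)
    precision = shared / retrieved_chars if retrieved_chars else 0.0
    recall = shared / gold_chars if gold_chars else 0.0
    return precision, recall

# A retriever that returns one 200-character chunk containing the whole
# 50-character annotated clause achieves full recall but low precision:
p, r = precision_recall(retrieved=[(1000, 1200)], gold=[(1100, 1150)])
print(f"precision={p:.2f} recall={r:.2f}")  # precision=0.25 recall=1.00
```

Under a metric of this kind, returning whole documents or oversized chunks drives precision toward zero even when recall is perfect, which is exactly the behavior the benchmark's emphasis on minimal, highly relevant segments is meant to penalize.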
Related papers
- LRAGE: Legal Retrieval Augmented Generation Evaluation Tool [4.799822253865053]
LRAGE is an open-source tool for holistic evaluation of RAG systems focusing on the legal domain.
We validated LRAGE using multilingual legal benchmarks, including Korean (KBL), English (LegalBench), and Chinese (LawBench).
arXiv Detail & Related papers (2025-04-02T15:45:03Z)
- LexRAG: Benchmarking Retrieval-Augmented Generation in Multi-Turn Legal Consultation Conversation [19.633769905100113]
Retrieval-augmented generation (RAG) has proven highly effective in improving large language models (LLMs) across various domains.
There is no benchmark specifically designed to assess the effectiveness of RAG in the legal domain.
We propose LexRAG, the first benchmark to evaluate RAG systems for multi-turn legal consultations.
arXiv Detail & Related papers (2025-02-28T01:46:32Z)
- CaseGen: A Benchmark for Multi-Stage Legal Case Documents Generation [22.98779736851499]
We introduce CaseGen, a benchmark for multi-stage legal case document generation in the Chinese legal domain.
CaseGen is based on 500 real case samples annotated by legal experts and covers seven essential case sections.
It supports four key tasks: drafting defense statements, writing trial facts, composing legal reasoning, and generating judgment results.
arXiv Detail & Related papers (2025-02-25T08:03:32Z)
- LegalBench.PT: A Benchmark for Portuguese Law [17.554201334646056]
We present LegalBench.PT, the first comprehensive legal benchmark covering key areas of Portuguese law.
We first collect long-form questions and answers from real law exams, and then use GPT-4o to convert them into multiple-choice, true/false, and matching formats.
arXiv Detail & Related papers (2025-02-22T21:07:12Z)
- LegalAgentBench: Evaluating LLM Agents in Legal Domain [53.70993264644004]
LegalAgentBench is a benchmark specifically designed to evaluate LLM Agents in the Chinese legal domain.
LegalAgentBench includes 17 corpora from real-world legal scenarios and provides 37 tools for interacting with external knowledge.
arXiv Detail & Related papers (2024-12-23T04:02:46Z)
- JudgeRank: Leveraging Large Language Models for Reasoning-Intensive Reranking [81.88787401178378]
We introduce JudgeRank, a novel agentic reranker that emulates human cognitive processes when assessing document relevance.
We evaluate JudgeRank on the reasoning-intensive BRIGHT benchmark, demonstrating substantial performance improvements over first-stage retrieval methods.
In addition, JudgeRank performs on par with fine-tuned state-of-the-art rerankers on the popular BEIR benchmark, validating its zero-shot generalization capability.
arXiv Detail & Related papers (2024-10-31T18:43:12Z)
- SFR-RAG: Towards Contextually Faithful LLMs [57.666165819196486]
Retrieval Augmented Generation (RAG) is a paradigm that integrates external contextual information with large language models (LLMs) to enhance factual accuracy and relevance.
We introduce SFR-RAG, a small LLM that is instruction-tuned with an emphasis on context-grounded generation and minimizing hallucination.
We also present ContextualBench, a new evaluation framework compiling multiple popular and diverse RAG benchmarks.
arXiv Detail & Related papers (2024-09-16T01:08:18Z)
- CodeRAG-Bench: Can Retrieval Augment Code Generation? [78.37076502395699]
We conduct a systematic, large-scale analysis of code generation using retrieval-augmented generation.
We first curate a comprehensive evaluation benchmark, CodeRAG-Bench, encompassing three categories of code generation tasks.
We examine top-performing models on CodeRAG-Bench by providing contexts retrieved from one or multiple sources.
arXiv Detail & Related papers (2024-06-20T16:59:52Z)
- Bridging Law and Data: Augmenting Reasoning via a Semi-Structured Dataset with IRAC methodology [22.740895683854568]
This paper introduces LEGALSEMI, a benchmark specifically curated for legal scenario analysis.
LEGALSEMI comprises 54 legal scenarios, each rigorously annotated by legal experts, based on the comprehensive IRAC (Issue, Rule, Application, Conclusion) framework.
A series of experiments were conducted to assess the usefulness of LEGALSEMI for IRAC analysis.
arXiv Detail & Related papers (2024-06-19T04:59:09Z)
- CBR-RAG: Case-Based Reasoning for Retrieval Augmented Generation in LLMs for Legal Question Answering [1.0760413363405308]
Retrieval-Augmented Generation (RAG) enhances Large Language Model (LLM) output by providing prior knowledge as context to input.
Case-Based Reasoning (CBR) presents key opportunities to structure retrieval as part of the RAG process in an LLM.
We introduce CBR-RAG, in which the CBR cycle's initial retrieval stage, its indexing vocabulary, and its similarity knowledge containers are used to enhance LLM queries with contextually relevant cases.
arXiv Detail & Related papers (2024-04-04T21:47:43Z)
- MUSER: A Multi-View Similar Case Retrieval Dataset [65.36779942237357]
Similar case retrieval (SCR) is a representative legal AI application that plays a pivotal role in promoting judicial fairness.
Existing SCR datasets only focus on the fact description section when judging the similarity between cases.
We present MUSER, a similar case retrieval dataset based on multi-view similarity measurement and comprehensive legal elements, with sentence-level legal element annotations.
arXiv Detail & Related papers (2023-10-24T08:17:11Z)
- A Comprehensive Evaluation of Large Language Models on Legal Judgment Prediction [60.70089334782383]
Large language models (LLMs) have demonstrated great potential for domain-specific applications.
Recent disputes over GPT-4's law evaluation raise questions concerning LLMs' performance in real-world legal tasks.
We design practical baseline solutions based on LLMs and test on the task of legal judgment prediction.
arXiv Detail & Related papers (2023-10-18T07:38:04Z)
- U-CREAT: Unsupervised Case Retrieval using Events extrAcTion [2.2385755093672044]
We propose a new benchmark (in English) for the Prior Case Retrieval task: IL-PCR (Indian Legal Prior Case Retrieval) corpus.
We explore the role of events in legal case retrieval and propose U-CREAT, an unsupervised retrieval pipeline based on event extraction.
We find that the proposed unsupervised retrieval method significantly increases performance compared to BM25 and makes retrieval faster by a considerable margin.
arXiv Detail & Related papers (2023-07-11T13:51:12Z)
- SAILER: Structure-aware Pre-trained Language Model for Legal Case Retrieval [75.05173891207214]
Legal case retrieval plays a core role in the intelligent legal system.
Most existing language models have difficulty understanding the long-distance dependencies between different structures.
We propose SAILER, a new Structure-Aware pre-traIned language model for LEgal case Retrieval.
arXiv Detail & Related papers (2023-04-22T10:47:01Z)
- Finding the Law: Enhancing Statutory Article Retrieval via Graph Neural Networks [3.5880535198436156]
We propose a novel graph-augmented dense statute retriever (G-DSR) model that incorporates the structure of legislation via a graph neural network to improve dense retrieval performance.
Experimental results show that our approach outperforms strong retrieval baselines on a real-world expert-annotated statutory article retrieval (SAR) dataset.
arXiv Detail & Related papers (2023-01-30T12:59:09Z)