LRAGE: Legal Retrieval Augmented Generation Evaluation Tool
- URL: http://arxiv.org/abs/2504.01840v2
- Date: Fri, 25 Apr 2025 01:57:31 GMT
- Title: LRAGE: Legal Retrieval Augmented Generation Evaluation Tool
- Authors: Minhu Park, Hongseok Oh, Eunkyung Choi, Wonseok Hwang
- Abstract summary: LRAGE is an open-source tool for holistic evaluation of RAG systems focusing on the legal domain. We validated LRAGE using multilingual legal benchmarks including Korean (KBL), English (LegalBench), and Chinese (LawBench).
- Score: 4.799822253865053
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Recently, building retrieval-augmented generation (RAG) systems to enhance the capability of large language models (LLMs) has become a common practice. Especially in the legal domain, previous judicial decisions play a significant role under the doctrine of stare decisis, which emphasizes the importance of making decisions based on (retrieved) prior documents. However, the overall performance of a RAG system depends on many components: (1) retrieval corpora, (2) retrieval algorithms, (3) rerankers, (4) LLM backbones, and (5) evaluation metrics. Here we propose LRAGE, an open-source tool for holistic evaluation of RAG systems focusing on the legal domain. LRAGE provides GUI and CLI interfaces to facilitate seamless experiments and investigate how changes in the aforementioned five components affect the overall accuracy. We validated LRAGE using multilingual legal benchmarks including Korean (KBL), English (LegalBench), and Chinese (LawBench) by demonstrating how the overall accuracy changes when varying the five components mentioned above. The source code is available at https://github.com/hoorangyee/LRAGE.
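The abstract treats end-to-end accuracy as a function of five interchangeable components. As a minimal sketch of that idea (not LRAGE's actual API; every name below is hypothetical), a harness could sweep over component choices and record accuracy for each combination:

```python
from dataclasses import dataclass
from itertools import product
from typing import Callable, Optional

@dataclass(frozen=True)
class RAGConfig:
    corpus: str              # (1) retrieval corpus, e.g. a statute or precedent collection
    retriever: str           # (2) retrieval algorithm, e.g. "bm25" or "dense"
    reranker: Optional[str]  # (3) reranker, or None to skip reranking
    llm: str                 # (4) LLM backbone
    metric: str              # (5) evaluation metric, e.g. "exact_match"

def sweep(run_task: Callable[[RAGConfig], float]) -> list[tuple[RAGConfig, float]]:
    """Enumerate component combinations and score each one on a benchmark task."""
    corpora = ["kbl_corpus", "legalbench_corpus"]
    retrievers = ["bm25", "dense"]
    rerankers = [None, "cross_encoder"]
    llms = ["llm_a", "llm_b"]
    results = []
    for corpus, retriever, reranker, llm in product(corpora, retrievers, rerankers, llms):
        cfg = RAGConfig(corpus, retriever, reranker, llm, metric="exact_match")
        # run_task stands in for executing a benchmark such as KBL, LegalBench, or LawBench.
        results.append((cfg, run_task(cfg)))
    return results
```

LRAGE's GUI and CLI expose this kind of component-by-component comparison; the `run_task` callback above is only a placeholder for running one of the supported benchmarks.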
Related papers
- LegalRAG: A Hybrid RAG System for Multilingual Legal Information Retrieval [7.059964549363294]
We develop an efficient bilingual question-answering framework for regulatory documents, specifically the Bangladesh Police Gazettes.
Our approach employs modern Retrieval Augmented Generation (RAG) pipelines to enhance information retrieval and response generation.
This system enables efficient searching for specific government legal notices, making legal information more accessible.
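The summary does not say how the hybrid pipeline combines its retrievers; one common pattern, shown here only as a hedged sketch under that assumption, is to fuse a lexical ranking and a dense ranking with reciprocal rank fusion:

```python
from collections import defaultdict

def reciprocal_rank_fusion(rankings: list[list[str]], k: int = 60) -> list[str]:
    """Fuse several ranked lists of document ids (e.g. a BM25 list and a dense-retrieval list).

    Each document accumulates 1 / (k + rank) over every list it appears in;
    documents with higher fused scores rank first.
    """
    scores: dict[str, float] = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    return sorted(scores, key=scores.get, reverse=True)

# Toy example with made-up gazette ids.
print(reciprocal_rank_fusion([
    ["gazette_12", "gazette_03", "gazette_44"],   # lexical ranking
    ["gazette_03", "gazette_44", "gazette_12"],   # dense ranking
]))
```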
arXiv Detail & Related papers (2025-04-19T06:09:54Z)
- JuDGE: Benchmarking Judgment Document Generation for Chinese Legal System [12.256518096712334]
JuDGE (Judgment Document Generation Evaluation) is a novel benchmark for evaluating the performance of judgment document generation in the Chinese legal system.
We construct a comprehensive dataset consisting of factual descriptions from real legal cases, paired with their corresponding full judgment documents.
In collaboration with legal professionals, we establish a comprehensive automated evaluation framework to assess the quality of generated judgment documents.
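The automated framework itself is not described in this summary; as a toy placeholder only (not JuDGE's actual metrics), scoring could iterate over (gold judgment, generated judgment) pairs and compute a lexical-overlap score:

```python
def token_f1(reference: str, generated: str) -> float:
    """Toy token-overlap F1 between a gold judgment document and a generated one."""
    ref, gen = set(reference.split()), set(generated.split())
    overlap = len(ref & gen)
    if overlap == 0:
        return 0.0
    precision, recall = overlap / len(gen), overlap / len(ref)
    return 2 * precision * recall / (precision + recall)

def evaluate_corpus(pairs: list[tuple[str, str]]) -> float:
    """Average the per-document score over (gold, generated) pairs."""
    return sum(token_f1(gold, gen) for gold, gen in pairs) / max(len(pairs), 1)
```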
arXiv Detail & Related papers (2025-03-18T13:48:18Z)
- LexRAG: Benchmarking Retrieval-Augmented Generation in Multi-Turn Legal Consultation Conversation [19.633769905100113]
Retrieval-augmented generation (RAG) has proven highly effective in improving large language models (LLMs) across various domains.
However, there is no benchmark specifically designed to assess the effectiveness of RAG in the legal domain.
We propose LexRAG, the first benchmark to evaluate RAG systems for multi-turn legal consultations.
arXiv Detail & Related papers (2025-02-28T01:46:32Z)
- JudgeRank: Leveraging Large Language Models for Reasoning-Intensive Reranking [81.88787401178378]
We introduce JudgeRank, a novel agentic reranker that emulates human cognitive processes when assessing document relevance.
We evaluate JudgeRank on the reasoning-intensive BRIGHT benchmark, demonstrating substantial performance improvements over first-stage retrieval methods.
In addition, JudgeRank performs on par with fine-tuned state-of-the-art rerankers on the popular BEIR benchmark, validating its zero-shot generalization capability.
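As a rough sketch of the LLM-as-reranker idea (the `llm_score` callable and the prompt are hypothetical; JudgeRank's actual agentic procedure is more structured), each first-stage candidate can be scored by the model and re-sorted:

```python
from typing import Callable

def rerank_with_llm(query: str, candidates: list[str], llm_score: Callable[[str], float]) -> list[str]:
    """Rerank first-stage retrieval candidates using an LLM as the relevance judge."""
    scored = []
    for doc in candidates:
        prompt = (
            f"Query:\n{query}\n\n"
            f"Document:\n{doc}\n\n"
            "On a scale from 0 to 10, how relevant is the document to the query? Answer with a number."
        )
        # llm_score wraps whatever LLM backend is available and parses its numeric answer.
        scored.append((llm_score(prompt), doc))
    return [doc for _, doc in sorted(scored, key=lambda pair: pair[0], reverse=True)]
```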
arXiv Detail & Related papers (2024-10-31T18:43:12Z)
- MemoRAG: Boosting Long Context Processing with Global Memory-Enhanced Retrieval Augmentation [60.04380907045708]
Retrieval-Augmented Generation (RAG) is considered a promising strategy for handling long contexts.
We propose MemoRAG, a novel RAG framework empowered by global memory-augmented retrieval.
MemoRAG achieves superior performance across a variety of long-context evaluation tasks.
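Schematically (a sketch under assumptions, not the authors' implementation; all three callables are hypothetical), memory-guided retrieval replaces the raw question with clues drafted by a model that has ingested the whole corpus:

```python
from typing import Callable

def memory_guided_answer(
    question: str,
    memory_llm: Callable[[str], str],           # lightweight model holding a global memory of the corpus
    retrieve: Callable[[str, int], list[str]],  # retriever queried with the clue text
    answer_llm: Callable[[str], str],           # stronger model that writes the final answer
) -> str:
    """Sketch of memory-guided RAG: clues from a global memory steer retrieval."""
    clues = memory_llm(f"Draft short clues that would help answer: {question}")
    evidence = retrieve(clues, 5)
    context = "\n\n".join(evidence)
    return answer_llm(f"Context:\n{context}\n\nQuestion: {question}\nAnswer:")
```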
arXiv Detail & Related papers (2024-09-09T13:20:31Z)
- LegalBench-RAG: A Benchmark for Retrieval-Augmented Generation in the Legal Domain [0.0]
Retrieval-Augmented Generation (RAG) systems show promising potential and are becoming increasingly relevant in AI-powered legal applications.
Existing benchmarks, such as LegalBench, assess the generative capabilities of Large Language Models (LLMs) in the legal domain.
We introduce LegalBench-RAG, the first benchmark specifically designed to evaluate the retrieval step of RAG pipelines within the legal space.
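Evaluating only the retrieval step amounts to comparing what the retriever returns against annotated gold evidence. A toy document-level metric (the benchmark's own protocol is more specific than this sketch) is recall@k:

```python
def recall_at_k(retrieved: list[str], relevant: set[str], k: int) -> float:
    """Fraction of gold evidence ids found among the top-k retrieved ids."""
    if not relevant:
        return 0.0
    hits = sum(1 for doc_id in retrieved[:k] if doc_id in relevant)
    return hits / len(relevant)

# Toy usage with made-up ids: one of two gold passages is retrieved in the top 3.
print(recall_at_k(["c_7", "c_2", "c_9"], relevant={"c_2", "c_5"}, k=3))  # 0.5
```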
arXiv Detail & Related papers (2024-08-19T18:30:18Z)
- Speculative RAG: Enhancing Retrieval Augmented Generation through Drafting [68.90949377014742]
Speculative RAG is a framework that leverages a larger generalist LM to efficiently verify multiple RAG drafts produced in parallel by a smaller, distilled specialist LM.
Our method accelerates RAG by delegating drafting to the smaller specialist LM, with the larger generalist LM performing a single verification pass over the drafts.
It notably enhances accuracy by up to 12.97% while reducing latency by 50.83% compared to conventional RAG systems on PubHealth.
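The drafting/verification split can be sketched as follows (a hedged illustration, not the paper's implementation; `draft_llm` and `verify_llm` are hypothetical callables):

```python
from typing import Callable

def speculative_rag(
    question: str,
    doc_subsets: list[list[str]],         # each subset of retrieved documents yields one draft
    draft_llm: Callable[[str], str],      # smaller, distilled specialist LM
    verify_llm: Callable[[str], float],   # larger generalist LM returning a support score
) -> str:
    """Generate several drafts with a small model, then let a large model pick the best one."""
    drafts = []
    for docs in doc_subsets:
        context = "\n".join(docs)
        drafts.append(draft_llm(f"Context:\n{context}\n\nQuestion: {question}\nDraft answer:"))
    # A single verification pass: the large model only scores drafts, it never generates them.
    scores = [verify_llm(f"Question: {question}\nDraft: {draft}\nHow well is this draft supported?")
              for draft in drafts]
    return drafts[max(range(len(drafts)), key=scores.__getitem__)]
```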
arXiv Detail & Related papers (2024-07-11T06:50:19Z)
- CRUD-RAG: A Comprehensive Chinese Benchmark for Retrieval-Augmented Generation of Large Language Models [49.16989035566899]
Retrieval-Augmented Generation (RAG) is a technique that enhances the capabilities of large language models (LLMs) by incorporating external knowledge sources.
This paper constructs a large-scale and more comprehensive benchmark, and evaluates all the components of RAG systems in various RAG application scenarios.
arXiv Detail & Related papers (2024-01-30T14:25:32Z)
- Factcheck-Bench: Fine-Grained Evaluation Benchmark for Automatic Fact-checkers [121.53749383203792]
We present a holistic end-to-end solution for annotating the factuality of large language models (LLMs)-generated responses.
We construct an open-domain, document-level factuality benchmark with three levels of granularity: claim, sentence, and document.
Preliminary experiments show that FacTool, FactScore, and Perplexity struggle to identify false claims.
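The three levels of granularity compose naturally: claim verdicts roll up to sentence labels, and sentence labels to a document label. A toy aggregation under that assumption (not the benchmark's exact rules):

```python
def aggregate_factuality(sentence_claims: list[list[bool]]) -> dict:
    """Roll claim-level verdicts up to sentence- and document-level labels.

    sentence_claims[i][j] is True when claim j extracted from sentence i was judged factual.
    """
    # A sentence is labeled factual only if every claim extracted from it is factual.
    sentence_labels = [all(claims) for claims in sentence_claims]
    return {
        "claims_factual": sum(sum(claims) for claims in sentence_claims),
        "claims_total": sum(len(claims) for claims in sentence_claims),
        "sentence_labels": sentence_labels,
        "document_factual": all(sentence_labels),
    }
```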
arXiv Detail & Related papers (2023-11-15T14:41:57Z)
- Benchmarking Large Language Models in Retrieval-Augmented Generation [53.504471079548]
We systematically investigate the impact of Retrieval-Augmented Generation on large language models.
We analyze the performance of different large language models on four fundamental abilities required for RAG.
We establish Retrieval-Augmented Generation Benchmark (RGB), a new corpus for RAG evaluation in both English and Chinese.
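Such a corpus probes RAG behaviour by controlling what the model sees at answer time. As a generic sketch of one such control (hypothetical, not RGB's exact construction), gold evidence can be mixed with distractor passages at a chosen noise ratio:

```python
import random

def build_noisy_context(gold: list[str], distractors: list[str], noise_ratio: float) -> list[str]:
    """Mix gold evidence with distractors so roughly `noise_ratio` of the context is noise."""
    if not 0.0 <= noise_ratio < 1.0:
        raise ValueError("noise_ratio must be in [0, 1)")
    # Solve n / (len(gold) + n) = noise_ratio for the number of distractor passages n.
    n_noise = round(len(gold) * noise_ratio / (1.0 - noise_ratio))
    context = gold + random.sample(distractors, min(n_noise, len(distractors)))
    random.shuffle(context)
    return context
```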
arXiv Detail & Related papers (2023-09-04T08:28:44Z)
This list is automatically generated from the titles and abstracts of the papers on this site.