L-MARS: Legal Multi-Agent Workflow with Orchestrated Reasoning and Agentic Search
- URL: http://arxiv.org/abs/2509.00761v2
- Date: Wed, 03 Sep 2025 00:57:14 GMT
- Title: L-MARS: Legal Multi-Agent Workflow with Orchestrated Reasoning and Agentic Search
- Authors: Ziqi Wang, Boqin Yuan
- Abstract summary: L-MARS (Legal Multi-Agent Workflow with Orchestrated Reasoning and Agentic Search) is a system that reduces hallucination and uncertainty in legal question answering. Unlike single-pass retrieval-augmented generation (RAG), L-MARS decomposes queries into subproblems. It employs a Judge Agent to verify sufficiency, jurisdiction, and temporal validity before answer synthesis.
- Score: 3.662162441273026
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: We present L-MARS (Legal Multi-Agent Workflow with Orchestrated Reasoning and Agentic Search), a system that reduces hallucination and uncertainty in legal question answering through coordinated multi-agent reasoning and retrieval. Unlike single-pass retrieval-augmented generation (RAG), L-MARS decomposes queries into subproblems, issues targeted searches across heterogeneous sources (Serper web, local RAG, CourtListener case law), and employs a Judge Agent to verify sufficiency, jurisdiction, and temporal validity before answer synthesis. This iterative reasoning-search-verification loop maintains coherence, filters noisy evidence, and grounds answers in authoritative law. We evaluated L-MARS on LegalSearchQA, a new benchmark of 200 up-to-date multiple-choice legal questions from 2025. Results show that L-MARS substantially improves factual accuracy, reduces uncertainty, and achieves higher preference scores from both human experts and LLM-based judges. Our work demonstrates that multi-agent reasoning with agentic search offers a scalable and reproducible blueprint for deploying LLMs in high-stakes domains requiring precise legal retrieval and deliberation.
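The reasoning-search-verification loop described in the abstract can be sketched as follows. This is a minimal illustration, not the paper's implementation: the agent internals (LLM calls, Serper/CourtListener clients) are stubbed out, and the sufficiency checks in `judge` are hypothetical simplifications of what a Judge Agent might verify.

```python
from dataclasses import dataclass, field

# Hypothetical sketch of an L-MARS-style reasoning-search-verification loop.
# Real agents would be LLM-backed; here they are injected as plain callables.

@dataclass
class Evidence:
    source: str        # e.g. "serper", "local_rag", or "courtlistener"
    text: str
    jurisdiction: str
    year: int

@dataclass
class JudgeVerdict:
    sufficient: bool
    reasons: list = field(default_factory=list)

def judge(evidence, target_jurisdiction, current_year=2025):
    """Judge Agent stand-in: check sufficiency, jurisdiction, temporal validity."""
    verdict = JudgeVerdict(sufficient=True)
    if not evidence:
        return JudgeVerdict(sufficient=False, reasons=["no evidence retrieved"])
    if not any(e.jurisdiction == target_jurisdiction for e in evidence):
        verdict.sufficient = False
        verdict.reasons.append("wrong jurisdiction")
    if all(current_year - e.year > 10 for e in evidence):
        verdict.sufficient = False
        verdict.reasons.append("stale authority")
    return verdict

def l_mars(question, decompose, search, synthesize, jurisdiction, max_rounds=3):
    """Iterative loop: decompose -> search -> judge -> (refine or synthesize)."""
    evidence = []
    for _ in range(max_rounds):
        for sub in decompose(question, evidence):
            evidence.extend(search(sub))
        if judge(evidence, jurisdiction).sufficient:
            return synthesize(question, evidence)
    return synthesize(question, evidence)  # best effort after max_rounds
```

The key design point is that verification gates synthesis: the loop only drafts an answer once the Judge stand-in accepts the accumulated evidence, otherwise it searches again.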
Related papers
- LegalOne: A Family of Foundation Models for Reliable Legal Reasoning [54.57434222018289]
We present LegalOne, a family of foundational models specifically tailored for the Chinese legal domain. LegalOne is developed through a comprehensive three-phase pipeline designed to master legal reasoning. We publicly release the LegalOne weights and the LegalKit evaluation framework to advance the field of Legal AI.
arXiv Detail & Related papers (2026-01-31T10:18:32Z)
- LegalMALR: Multi-Agent Query Understanding and LLM-Based Reranking for Chinese Statute Retrieval [10.997604609194033]
Statute retrieval is essential for legal assistance and judicial decision support. Real-world legal queries are often implicit, multi-issue, and expressed in colloquial or underspecified forms. We present LegalMALR, a retrieval framework that integrates a Multi-Agent Query Understanding System with a zero-shot LLM-based reranking module.
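The two stages named in the summary can be sketched as below. This is an illustrative stand-in, not LegalMALR itself: the rewrite agents, retriever candidates, and scoring function are all toy assumptions.

```python
# Hypothetical sketch of a LegalMALR-style pipeline: multi-agent query
# understanding rewrites a colloquial query into explicit sub-queries,
# then an LLM-based scorer (stubbed here) reranks retrieved statutes.

def understand_query(query, agents):
    """Each agent proposes explicit sub-queries for one aspect of the query."""
    subqueries = []
    for agent in agents:
        subqueries.extend(agent(query))
    return subqueries

def rerank(candidates, score_fn, top_k=3):
    """Zero-shot rerank: sort candidate statutes by a model-assigned score."""
    return sorted(candidates, key=score_fn, reverse=True)[:top_k]

# Toy stand-ins for query-understanding agents:
issue_agent = lambda q: [q + " statutory basis"]
term_agent = lambda q: [q.replace("kicked out", "eviction")]

subs = understand_query("landlord kicked out tenant", [issue_agent, term_agent])
```

The point of the decomposition is that each agent normalizes one failure mode of colloquial queries (missing legal issue, informal terminology) before retrieval runs.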
arXiv Detail & Related papers (2026-01-25T04:44:56Z)
- LRAS: Advanced Legal Reasoning with Agentic Search [48.281150948187786]
Legal Reasoning with Agentic Search (LRAS) is a framework designed to transition legal LLMs from static and parametric "closed-loop thinking" to dynamic and interactive "Active Inquiry". By integrating Introspective Learning and Difficulty-aware Reinforcement Learning, LRAS enables LRMs to identify knowledge boundaries and handle legal reasoning. Empirical results demonstrate that LRAS outperforms state-of-the-art baselines by 8.2-32%.
arXiv Detail & Related papers (2026-01-12T08:07:35Z)
- AppellateGen: A Benchmark for Appellate Legal Judgment Generation [30.9030336647868]
We introduce AppellateGen, a benchmark for second-instance legal judgment generation comprising 7,351 case pairs. The task requires models to draft legally binding judgments by reasoning over the initial verdict and evidentiary updates. We propose a judicial Standard Operating Procedure (SOP)-based Legal Multi-Agent System (SLMAS) to simulate the judicial process, which decomposes the generation process into discrete stages of issue identification, retrieval, and drafting.
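The staged decomposition described above can be sketched as a simple sequential pipeline. This is a hedged illustration: the stage names come from the summary, but the stage implementations here are toy stand-ins, not the paper's agents.

```python
# Hypothetical sketch of an SOP-style staged pipeline in the spirit of SLMAS:
# judgment generation decomposed into issue identification, retrieval,
# and drafting, each stage reading the accumulated state.

def run_sop_pipeline(case, stages):
    """Pass the case through each named stage in order, accumulating state."""
    state = {"case": case}
    for name, stage in stages:
        state[name] = stage(state)
    return state

stages = [
    ("issues", lambda s: [w for w in s["case"].split() if w.isupper()]),   # toy issue spotting
    ("authorities", lambda s: [f"statute for {i}" for i in s["issues"]]),  # toy retrieval
    ("draft", lambda s: "Judgment citing: " + "; ".join(s["authorities"])),
]
```

Keeping every stage's output in the shared state is what lets the drafting stage ground itself in the earlier issue-identification and retrieval results.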
arXiv Detail & Related papers (2026-01-04T02:15:17Z)
- Evaluating Legal Reasoning Traces with Legal Issue Tree Rubrics [49.3262123849242]
We introduce LEGIT (LEGal Issue Trees), a novel large-scale (24K instances) expert-level legal reasoning dataset. We convert court judgments into hierarchical trees of opposing parties' arguments and the court's conclusions, which serve as rubrics for evaluating the issue coverage and correctness of the reasoning traces.
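The tree-as-rubric idea can be sketched with a minimal data structure and coverage score. The node shape and substring-matching rule are illustrative assumptions; LEGIT's actual rubrics and scoring are more involved.

```python
from dataclasses import dataclass, field

# Hypothetical sketch of an issue-tree rubric: a judgment is a hierarchy of
# issues, and a reasoning trace is scored by the fraction of rubric issues
# it covers. Matching by substring is a deliberate simplification.

@dataclass
class IssueNode:
    label: str
    children: list = field(default_factory=list)

def flatten(node):
    """All issue labels in the subtree rooted at node."""
    labels = [node.label]
    for child in node.children:
        labels.extend(flatten(child))
    return labels

def coverage(trace, rubric_root):
    """Fraction of rubric issues mentioned in the reasoning trace."""
    issues = flatten(rubric_root)
    hit = sum(1 for issue in issues if issue in trace)
    return hit / len(issues)

rubric = IssueNode("breach of contract",
                   [IssueNode("offer"), IssueNode("damages")])
```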
arXiv Detail & Related papers (2025-11-30T18:32:43Z)
- Hybrid Retrieval-Augmented Generation Agent for Trustworthy Legal Question Answering in Judicial Forensics [30.232667436008978]
We present a hybrid legal QA agent tailored for judicial settings. It integrates retrieval-augmented generation (RAG) with multi-model ensembling to deliver reliable, auditable, and continuously updatable counsel.
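Multi-model ensembling over RAG outputs can be sketched as majority voting. This is an assumed, minimal interpretation of "multi-model ensembling"; the paper's aggregation scheme may differ.

```python
from collections import Counter

# Hypothetical sketch: several models answer from the same retrieved
# context, and a majority vote picks the final answer. Counter.most_common
# breaks ties by first insertion order.

def ensemble_answer(question, context, models):
    """Majority vote over candidate answers from multiple models."""
    answers = [model(question, context) for model in models]
    return Counter(answers).most_common(1)[0][0]
```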
arXiv Detail & Related papers (2025-11-03T15:30:58Z)
- A Reasoning-Focused Legal Retrieval Benchmark [28.607778538115642]
We introduce two novel legal RAG benchmarks: Bar Exam QA and Housing Statute QA. Our results suggest that legal RAG remains a challenging application, thus motivating future research.
arXiv Detail & Related papers (2025-05-06T20:44:03Z)
- ReMA: Learning to Meta-think for LLMs with Multi-Agent Reinforcement Learning [53.817538122688944]
We introduce Reinforced Meta-thinking Agents (ReMA) to elicit meta-thinking behaviors from Large Language Models (LLMs). ReMA decouples the reasoning process into two hierarchical agents: a high-level meta-thinking agent responsible for generating strategic oversight and plans, and a low-level reasoning agent for detailed executions. Empirical results from single-turn experiments demonstrate that ReMA outperforms single-agent RL baselines on complex reasoning tasks.
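The two-level decomposition can be sketched as a planner-executor pair. Both agents here are toy stand-ins for the LLM policies that ReMA would train with reinforcement learning.

```python
# Hypothetical sketch of a ReMA-style hierarchy: a high-level meta-thinking
# agent emits a plan, and a low-level reasoning agent executes each step.

def meta_agent(task):
    """High-level: produce an ordered plan for the task (toy fixed plan)."""
    return ["restate", "solve", "check"]

def reasoning_agent(task, step):
    """Low-level: execute one plan step (toy string execution)."""
    return f"{step}({task})"

def rema(task):
    plan = meta_agent(task)
    return [reasoning_agent(task, step) for step in plan]
```

Decoupling the roles means the planner can be rewarded for strategy quality and the executor for step correctness, which is the separation the summary describes.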
arXiv Detail & Related papers (2025-03-12T16:05:31Z)
- LegalAgentBench: Evaluating LLM Agents in Legal Domain [53.70993264644004]
LegalAgentBench is a benchmark specifically designed to evaluate LLM Agents in the Chinese legal domain. LegalAgentBench includes 17 corpora from real-world legal scenarios and provides 37 tools for interacting with external knowledge.
arXiv Detail & Related papers (2024-12-23T04:02:46Z)
- RAG-Star: Enhancing Deliberative Reasoning with Retrieval Augmented Verification and Refinement [85.08223786819532]
Existing large language models (LLMs) show exceptional problem-solving capabilities but might struggle with complex reasoning tasks. We propose RAG-Star, a novel RAG approach that integrates retrieved information to guide the tree-based deliberative reasoning process. Our experiments involving Llama-3.1-8B-Instruct and GPT-4o demonstrate that RAG-Star significantly outperforms previous RAG and reasoning methods.
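Retrieval-guided tree reasoning can be sketched as a search where retrieved evidence scores candidate branches. This greedy variant is an assumption for brevity; the actual method uses a more elaborate tree search with verification and refinement.

```python
# Hypothetical sketch in the spirit of RAG-Star: candidate reasoning steps
# are expanded as tree branches and scored against retrieved evidence;
# the best-supported branch is kept at each level.

def best_path(state, expand, score, depth):
    """Greedy tree search: at each level keep the branch with the highest
    retrieval-support score."""
    path = []
    for _ in range(depth):
        candidates = expand(state)
        if not candidates:
            break
        state = max(candidates, key=score)
        path.append(state)
    return path
```

For example, with numeric states, `expand = lambda s: [s + 1, s + 2]` and `score = lambda s: -abs(s - 5)` steer the search toward the target value 5.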
arXiv Detail & Related papers (2024-12-17T13:05:36Z)
- Evaluating LLM-based Approaches to Legal Citation Prediction: Domain-specific Pre-training, Fine-tuning, or RAG? A Benchmark and an Australian Law Case Study [9.30538764385435]
Large Language Models (LLMs) have demonstrated strong potential across legal tasks, yet the problem of legal citation prediction remains under-explored. We introduce the AusLaw Citation Benchmark, a real-world dataset comprising 55k Australian legal instances and 18,677 unique citations. We then conduct a systematic benchmarking across a range of solutions. Results show that neither general nor law-specific LLMs suffice as stand-alone solutions, with performance near zero.
arXiv Detail & Related papers (2024-12-09T07:46:14Z)
- Exploring the Nexus of Large Language Models and Legal Systems: A Short Survey [1.0770079992809338]
The capabilities of Large Language Models (LLMs) are increasingly demonstrating unique roles in the legal sector.
This survey delves into the synergy between LLMs and the legal system, such as their applications in tasks like legal text comprehension, case retrieval, and analysis.
The survey showcases the latest advancements in fine-tuned legal LLMs tailored for various legal systems, along with legal datasets available for fine-tuning LLMs in various languages.
arXiv Detail & Related papers (2024-04-01T08:35:56Z)
- A Comprehensive Evaluation of Large Language Models on Legal Judgment Prediction [60.70089334782383]
Large language models (LLMs) have demonstrated great potential for domain-specific applications.
Recent disputes over GPT-4's law evaluation raise questions concerning their performance in real-world legal tasks.
We design practical baseline solutions based on LLMs and test on the task of legal judgment prediction.
arXiv Detail & Related papers (2023-10-18T07:38:04Z)
- Encouraging Divergent Thinking in Large Language Models through Multi-Agent Debate [85.3444184685235]
We propose a Multi-Agent Debate (MAD) framework, in which multiple agents express their arguments in the state of "tit for tat" and a judge manages the debate process to obtain a final solution.
Our framework encourages divergent thinking in LLMs which would be helpful for tasks that require deep levels of contemplation.
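The debate-plus-judge protocol can be sketched as follows. All three roles are toy stand-ins for LLM agents; the round structure and transcript format are illustrative assumptions.

```python
# Hypothetical sketch of a Multi-Agent Debate (MAD) round: two debaters
# exchange arguments tit-for-tat for a fixed number of rounds, each seeing
# the transcript so far, and a judge derives the final answer from it.

def debate(question, affirmative, negative, judge, rounds=2):
    transcript = []
    for _ in range(rounds):
        transcript.append(("affirmative", affirmative(question, transcript)))
        transcript.append(("negative", negative(question, transcript)))
    return judge(question, transcript)
```

Passing the running transcript to each debater is what makes the exchange adversarial rather than two independent answers, and the separate judge role keeps answer selection out of either debater's hands.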
arXiv Detail & Related papers (2023-05-30T15:25:45Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information presented and is not responsible for any consequences arising from its use.