MASLegalBench: Benchmarking Multi-Agent Systems in Deductive Legal Reasoning
- URL: http://arxiv.org/abs/2509.24922v2
- Date: Tue, 30 Sep 2025 17:09:29 GMT
- Title: MASLegalBench: Benchmarking Multi-Agent Systems in Deductive Legal Reasoning
- Authors: Huihao Jing, Wenbin Hu, Hongyu Luo, Jianhui Yang, Wei Fan, Haoran Li, Yangqiu Song
- Abstract summary: Multi-agent systems (MAS), leveraging the remarkable capabilities of Large Language Models (LLMs), show great potential in addressing complex tasks. Previous studies have developed legal benchmarks for LLM agents, but none are specifically designed to consider the unique advantages of MAS. We propose MASLegalBench, a legal benchmark tailored for MAS and designed with a deductive reasoning approach.
- Score: 45.37095206528033
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Multi-agent systems (MAS), leveraging the remarkable capabilities of Large Language Models (LLMs), show great potential in addressing complex tasks. In this context, integrating MAS with legal tasks is a crucial step. While previous studies have developed legal benchmarks for LLM agents, none are specifically designed to consider the unique advantages of MAS, such as task decomposition, agent specialization, and flexible training. In fact, the lack of evaluation methods limits the potential of MAS in the legal domain. To address this gap, we propose MASLegalBench, a legal benchmark tailored for MAS and designed with a deductive reasoning approach. Our benchmark uses GDPR as the application scenario, encompassing extensive background knowledge and covering complex reasoning processes that effectively reflect the intricacies of real-world legal situations. Furthermore, we manually design various role-based MAS and conduct extensive experiments using different state-of-the-art LLMs. Our results highlight the strengths, limitations, and potential areas for improvement of existing models and MAS architectures.
Related papers
- MAS-ProVe: Understanding the Process Verification of Multi-Agent Systems [59.20800753428596]
We present MAS-ProVe, a systematic empirical study of process verification for multi-agent systems (MAS). Our study spans three verification paradigms (LLM-as-a-Judge, reward models, and process reward models). We find that process-level verification does not consistently improve performance and frequently exhibits high variance.
arXiv Detail & Related papers (2026-02-03T03:30:36Z)
- LegalOne: A Family of Foundation Models for Reliable Legal Reasoning [54.57434222018289]
We present LegalOne, a family of foundational models specifically tailored for the Chinese legal domain. LegalOne is developed through a comprehensive three-phase pipeline designed to master legal reasoning. We publicly release the LegalOne weights and the LegalKit evaluation framework to advance the field of Legal AI.
arXiv Detail & Related papers (2026-01-31T10:18:32Z)
- LLM Agents in Law: Taxonomy, Applications, and Challenges [24.660146939399567]
Large language models (LLMs) have precipitated a dramatic improvement in the legal domain. The deployment of standalone models faces significant limitations regarding hallucination, outdated information, and verifiability. Recently, LLM agents have attracted significant attention as a solution to these challenges.
arXiv Detail & Related papers (2026-01-08T21:04:35Z)
- On the Importance of Task Complexity in Evaluating LLM-Based Multi-Agent Systems [14.75237035960069]
Large language model multi-agent systems (LLM-MAS) offer a promising paradigm for harnessing collective intelligence to achieve more advanced forms of AI behaviour. We argue that a principled understanding of task complexity, such as the degree of sequential reasoning required and the breadth of capabilities involved, is essential for assessing the effectiveness of LLM-MAS in task solving.
arXiv Detail & Related papers (2025-10-05T18:08:48Z)
- General-Reasoner: Advancing LLM Reasoning Across All Domains [64.70599911897595]
Reinforcement learning (RL) has recently demonstrated strong potential in enhancing the reasoning capabilities of large language models (LLMs). We propose General-Reasoner, a novel training paradigm designed to enhance LLM reasoning capabilities across diverse domains. We train a series of models and evaluate them on a wide range of datasets covering domains such as physics, chemistry, finance, and electronics.
arXiv Detail & Related papers (2025-05-20T17:41:33Z)
- On Path to Multimodal Generalist: General-Level and General-Bench [153.9720740167528]
This project introduces General-Level, an evaluation framework that defines 5-scale levels of MLLM performance and generality. At the core of the framework is the concept of Synergy, which measures whether models maintain consistent capabilities across comprehension and generation. The evaluation results that involve over 100 existing state-of-the-art MLLMs uncover the capability rankings of generalists.
arXiv Detail & Related papers (2025-05-07T17:59:32Z)
- LegalAgentBench: Evaluating LLM Agents in Legal Domain [53.70993264644004]
LegalAgentBench is a benchmark specifically designed to evaluate LLM Agents in the Chinese legal domain. LegalAgentBench includes 17 corpora from real-world legal scenarios and provides 37 tools for interacting with external knowledge.
arXiv Detail & Related papers (2024-12-23T04:02:46Z)
- Can Large Language Models Grasp Legal Theories? Enhance Legal Reasoning with Insights from Multi-Agent Collaboration [27.047809869136458]
Large Language Models (LLMs) could struggle to fully understand legal theories and perform legal reasoning tasks. We introduce a challenging task (confusing charge prediction) to better evaluate LLMs' understanding of legal theories and reasoning capabilities. We also propose a novel framework: Multi-Agent framework for improving complex Legal Reasoning capability.
arXiv Detail & Related papers (2024-10-03T14:15:00Z)