Ready Jurist One: Benchmarking Language Agents for Legal Intelligence in Dynamic Environments
- URL: http://arxiv.org/abs/2507.04037v2
- Date: Thu, 17 Jul 2025 13:02:05 GMT
- Title: Ready Jurist One: Benchmarking Language Agents for Legal Intelligence in Dynamic Environments
- Authors: Zheng Jia, Shengbin Yue, Wei Chen, Siyuan Wang, Yidong Liu, Yun Song, Zhongyu Wei
- Abstract summary: We introduce J1-ENVS, the first interactive and dynamic legal environment tailored for LLM-based agents. It comprises six representative scenarios from Chinese legal practices across three levels of environmental complexity. We also introduce J1-EVAL, a fine-grained evaluation framework, to assess both task performance and procedural compliance.
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: The gap between static benchmarks and the dynamic nature of real-world legal practice poses a key barrier to advancing legal intelligence. To this end, we introduce J1-ENVS, the first interactive and dynamic legal environment tailored for LLM-based agents. Guided by legal experts, it comprises six representative scenarios from Chinese legal practices across three levels of environmental complexity. We further introduce J1-EVAL, a fine-grained evaluation framework, designed to assess both task performance and procedural compliance across varying levels of legal proficiency. Extensive experiments on 17 LLM agents reveal that, while many models demonstrate solid legal knowledge, they struggle with procedural execution in dynamic settings. Even the SOTA model, GPT-4o, falls short of 60% overall performance. These findings highlight persistent challenges in achieving dynamic legal intelligence and offer valuable insights to guide future research.
Related papers
- RLJP: Legal Judgment Prediction via First-Order Logic Rule-enhanced with Large Language Models [58.69183479148083]
Legal Judgment Prediction (LJP) is a pivotal task in legal AI. Existing LJP models integrate judicial precedents and legal knowledge for high performance, but they neglect legal reasoning logic, a critical component of legal judgments requiring rigorous logical analysis. This paper proposes a rule-enhanced legal judgment prediction framework based on first-order logic (FOL) formalism and comparative learning (CL).
arXiv Detail & Related papers (2025-05-27T14:50:21Z) - Evaluating Test-Time Scaling LLMs for Legal Reasoning: OpenAI o1, DeepSeek-R1, and Beyond [29.03425022434831]
Test-Time Scaling Large Language Models (LLMs) have demonstrated exceptional capabilities across various domains and tasks, particularly in reasoning. We present a preliminary evaluation of LLMs in various legal scenarios, covering both Chinese and English legal tasks. Our findings indicate that, despite DeepSeek-R1 and OpenAI o1 being among the most powerful models, their legal reasoning capabilities are still lacking.
arXiv Detail & Related papers (2025-03-20T11:14:39Z) - A Law Reasoning Benchmark for LLM with Tree-Organized Structures including Factum Probandum, Evidence and Experiences [76.73731245899454]
We propose a transparent law reasoning schema enriched with hierarchical factum probandum, evidence, and implicit experience. Inspired by this schema, we introduce a challenging task that takes a textual case description and outputs a hierarchical structure justifying the final decision. This benchmark paves the way for transparent and accountable AI-assisted law reasoning in the "Intelligent Court".
arXiv Detail & Related papers (2025-03-02T10:26:54Z) - Multi-Agent Simulator Drives Language Models for Legal Intensive Interaction [37.856194200684364]
This paper introduces a Multi-agent Legal Simulation Driver (MASER) to scalably generate synthetic data by simulating interactive legal scenarios. MASER ensures the consistency of legal attributes between participants and introduces a supervisory mechanism to align participants' characters and behaviors.
arXiv Detail & Related papers (2025-02-08T15:05:24Z) - LegalAgentBench: Evaluating LLM Agents in Legal Domain [53.70993264644004]
LegalAgentBench is a benchmark specifically designed to evaluate LLM Agents in the Chinese legal domain. LegalAgentBench includes 17 corpora from real-world legal scenarios and provides 37 tools for interacting with external knowledge.
arXiv Detail & Related papers (2024-12-23T04:02:46Z) - AgentCourt: Simulating Court with Adversarial Evolvable Lawyer Agents [25.509677234774056]
AgentCourt is a comprehensive legal simulation framework that addresses challenges through adversarial evolution of LLM-based agents. By simulating 1,000 civil cases, we construct an evolving knowledge base that enhances the agents' legal reasoning abilities. Our findings emphasize the importance of adversarial learning in legal AI and suggest promising directions for extending simulation-based legal reasoning to broader judicial and regulatory contexts.
arXiv Detail & Related papers (2024-08-15T11:33:20Z) - InternLM-Law: An Open Source Chinese Legal Large Language Model [72.2589401309848]
InternLM-Law is a specialized LLM tailored for addressing diverse legal queries related to Chinese laws.
We meticulously construct a dataset in the Chinese legal domain, encompassing over 1 million queries.
InternLM-Law achieves the highest average performance on LawBench, outperforming state-of-the-art models, including GPT-4, on 13 out of 20 subtasks.
arXiv Detail & Related papers (2024-06-21T06:19:03Z) - AgentsCourt: Building Judicial Decision-Making Agents with Court Debate Simulation and Legal Knowledge Augmentation [19.733007669738008]
We propose a novel multi-agent framework, AgentsCourt, for judicial decision-making.
Our framework follows the classic court trial process, consisting of court debate simulation, legal resources retrieval and decision-making refinement.
To support this task, we construct a large-scale legal knowledge base, Legal-KB, with multi-resource legal knowledge.
arXiv Detail & Related papers (2024-03-05T13:30:02Z) - LAiW: A Chinese Legal Large Language Models Benchmark [17.66376880475554]
General-purpose and legal-domain LLMs have demonstrated strong performance on various LegalAI tasks.
We are the first to build LAiW, a Chinese legal LLM benchmark based on the logic of legal practice.
arXiv Detail & Related papers (2023-10-09T11:19:55Z) - SAILER: Structure-aware Pre-trained Language Model for Legal Case Retrieval [75.05173891207214]
Legal case retrieval plays a core role in the intelligent legal system.
Most existing language models have difficulty understanding the long-distance dependencies between different structures.
We propose a new Structure-Aware pre-traIned language model for LEgal case Retrieval.
arXiv Detail & Related papers (2023-04-22T10:47:01Z) - Lawformer: A Pre-trained Language Model for Chinese Legal Long Documents [56.40163943394202]
We release a Longformer-based pre-trained language model, named Lawformer, for understanding long Chinese legal documents.
We evaluate Lawformer on a variety of LegalAI tasks, including judgment prediction, similar case retrieval, legal reading comprehension, and legal question answering.
arXiv Detail & Related papers (2021-05-09T09:39:25Z)