LegalOne: A Family of Foundation Models for Reliable Legal Reasoning
- URL: http://arxiv.org/abs/2602.00642v2
- Date: Tue, 03 Feb 2026 14:54:35 GMT
- Title: LegalOne: A Family of Foundation Models for Reliable Legal Reasoning
- Authors: Haitao Li, Yifan Chen, Shuo Miao, Qian Dong, Jia Chen, Yiran Hu, Junjie Chen, Minghao Qin, Yueyue Wu, Yujia Zhou, Qingyao Ai, Yiqun Liu, Cheng Luo, Quan Zhou, Ya Zhang, Jikun Hu
- Abstract summary: We present LegalOne, a family of foundational models specifically tailored for the Chinese legal domain. LegalOne is developed through a comprehensive three-phase pipeline designed to master legal reasoning. We publicly release the LegalOne weights and the LegalKit evaluation framework to advance the field of Legal AI.
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: While Large Language Models (LLMs) have demonstrated impressive general capabilities, their direct application in the legal domain is often hindered by a lack of precise domain knowledge and the complexity of performing rigorous multi-step judicial reasoning. To address this gap, we present LegalOne, a family of foundational models specifically tailored for the Chinese legal domain. LegalOne is developed through a comprehensive three-phase pipeline designed to master legal reasoning. First, during the mid-training phase, we propose Plasticity-Adjusted Sampling (PAS) to address the challenge of domain adaptation. This perplexity-based scheduler strikes a balance between the acquisition of new knowledge and the retention of original capabilities, effectively establishing a robust legal foundation. Second, during supervised fine-tuning, we employ Legal Agentic CoT Distillation (LEAD) to distill explicit reasoning from raw legal texts. Unlike naive distillation, LEAD uses an agentic workflow to convert complex judicial processes into structured reasoning trajectories, thereby enforcing factual grounding and logical rigor. Finally, we implement a Curriculum Reinforcement Learning (RL) strategy. Through a progressive reinforcement process spanning memorization, understanding, and reasoning, LegalOne evolves from simple pattern matching to autonomous and reliable legal reasoning. Experimental results demonstrate that LegalOne achieves state-of-the-art performance across a wide range of legal tasks, surpassing general-purpose LLMs with vastly larger parameter counts through enhanced knowledge density and efficiency. We publicly release the LegalOne weights and the LegalKit evaluation framework to advance the field of Legal AI, paving the way for deploying trustworthy and interpretable foundation models in high-stakes judicial applications.
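The abstract describes PAS only at a high level, as a perplexity-based scheduler that trades off new-knowledge acquisition against retention. The exact formulation is not given here; the following is a minimal illustrative sketch, assuming (hypothetically) that per-document perplexity under the base model is used to up-weight unfamiliar legal documents via a tempered softmax. The function name, the softmax form, and the `temperature` parameter are all assumptions for illustration, not the paper's method.

```python
import math

def plasticity_adjusted_weights(perplexities, temperature=1.0):
    """Hypothetical sketch of a perplexity-based sampling scheduler.

    Assumption: documents with higher perplexity under the base model
    carry more new domain knowledge and are up-weighted, while
    low-perplexity documents (already well modeled) are down-weighted
    to help preserve original capabilities. Weights are a softmax over
    log-perplexities; `temperature` controls how aggressive the skew is.
    """
    logs = [math.log(p) / temperature for p in perplexities]
    m = max(logs)                      # subtract max for numerical stability
    exps = [math.exp(x - m) for x in logs]
    z = sum(exps)
    return [e / z for e in exps]

# Example: three documents scored by the base model, from "familiar"
# (perplexity 2.0) to "novel legal text" (perplexity 32.0).
weights = plasticity_adjusted_weights([2.0, 8.0, 32.0])
```

At `temperature=1.0` this reduces to sampling proportional to raw perplexity; raising the temperature flattens the distribution toward uniform, which is one simple way such a scheduler could dial retention against adaptation over training.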
Related papers
- PLawBench: A Rubric-Based Benchmark for Evaluating LLMs in Real-World Legal Practice [67.71760070255425]
We introduce PLawBench, a practical benchmark for evaluating large language models (LLMs) in legal practice scenarios. PLawBench comprises 850 questions across 13 practical legal scenarios, with each question accompanied by expert-designed evaluation rubrics. Using an LLM-based evaluator aligned with human expert judgments, we evaluate 10 state-of-the-art LLMs.
arXiv Detail & Related papers (2026-01-23T11:36:10Z) - ReGal: A First Look at PPO-based Legal AI for Judgment Prediction and Summarization in India [10.522785783474857]
We introduce Reinforcement Learning-based Legal Reasoning (ReGal), a framework that integrates Multi-Task Instruction Tuning with Reinforcement Learning from AI Feedback. Our approach is evaluated across two critical legal tasks: (i) Court Judgment Prediction and Explanation (CJPE), and (ii) Legal Document Summarization.
arXiv Detail & Related papers (2025-12-19T19:13:41Z) - Judicial Requirements for Generative AI in Legal Reasoning [0.0]
Large Language Models (LLMs) are being integrated into professional domains, yet their limitations in high-stakes fields like law remain poorly understood. This paper defines the core capabilities that an AI system must possess to function as a reliable reasoning tool in judicial decision-making.
arXiv Detail & Related papers (2025-08-26T09:56:26Z) - ClaimGen-CN: A Large-scale Chinese Dataset for Legal Claim Generation [56.79698529022327]
Legal claims refer to the plaintiff's demands in a case and are essential to guiding judicial reasoning and case resolution. This paper explores the problem of legal claim generation based on a given case's facts. We construct ClaimGen-CN, the first dataset for the Chinese legal claim generation task.
arXiv Detail & Related papers (2025-08-24T07:19:25Z) - GLARE: Agentic Reasoning for Legal Judgment Prediction [60.13483016810707]
Legal judgment prediction (LJP) has become increasingly important in the legal field. Existing large language models (LLMs) suffer from insufficient reasoning due to a lack of legal knowledge. We introduce GLARE, an agentic legal reasoning framework that dynamically acquires key legal knowledge by invoking different modules.
arXiv Detail & Related papers (2025-08-22T13:38:12Z) - RLJP: Legal Judgment Prediction via First-Order Logic Rule-enhanced with Large Language Models [58.69183479148083]
Legal Judgment Prediction (LJP) is a pivotal task in legal AI. Existing LJP models integrate judicial precedents and legal knowledge for high performance, but they neglect legal reasoning logic, a critical component of legal judgments requiring rigorous logical analysis. This paper proposes a rule-enhanced legal judgment prediction framework based on first-order logic (FOL) formalism and comparative learning (CL).
arXiv Detail & Related papers (2025-05-27T14:50:21Z) - Continual Pre-Training is (not) What You Need in Domain Adaption [5.64352123192466]
This paper examines the efficacy of Domain-Adaptive Continual Pre-Training (DACP) in improving the legal reasoning capabilities of legal Large Language Models (LLMs). We demonstrate that while DACP enhances domain-specific knowledge, it does not uniformly improve performance across all legal tasks. We discuss the trade-offs involved in DACP, particularly its impact on model generalization and performance in prompt-based tasks, and propose directions for future research to optimize domain adaptation strategies in legal AI.
arXiv Detail & Related papers (2025-04-18T10:14:51Z) - Optimizing Numerical Estimation and Operational Efficiency in the Legal Domain through Large Language Models [13.067312163677933]
We propose a novel approach integrating Large Language Models with specially designed prompts to address precision requirements in legal Artificial Intelligence (LegalAI) applications.
To validate this method, we introduce a curated dataset tailored to precision-oriented LegalAI tasks.
arXiv Detail & Related papers (2024-07-26T18:46:39Z) - Lawformer: A Pre-trained Language Model for Chinese Legal Long Documents [56.40163943394202]
We release a Longformer-based pre-trained language model, named Lawformer, for Chinese legal long-document understanding.
We evaluate Lawformer on a variety of LegalAI tasks, including judgment prediction, similar case retrieval, legal reading comprehension, and legal question answering.
arXiv Detail & Related papers (2021-05-09T09:39:25Z)