Unilaw-R1: A Large Language Model for Legal Reasoning with Reinforcement Learning and Iterative Inference
- URL: http://arxiv.org/abs/2510.10072v1
- Date: Sat, 11 Oct 2025 07:17:22 GMT
- Title: Unilaw-R1: A Large Language Model for Legal Reasoning with Reinforcement Learning and Iterative Inference
- Authors: Hua Cai, Shuang Zhao, Liang Zhang, Xuli Shen, Qing Xu, Weilin Shen, Zihao Wen, Tianke Ban
- Abstract summary: We introduce Unilaw-R1, a large language model tailored for legal reasoning. With a lightweight 7-billion-parameter scale, Unilaw-R1 significantly reduces deployment cost. It tackles three core challenges in the legal domain: insufficient legal knowledge, unreliable reasoning logic, and weak business generalization.
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Reasoning-focused large language models (LLMs) are rapidly evolving across various domains, yet their capabilities in handling complex legal problems remain underexplored. In this paper, we introduce Unilaw-R1, a large language model tailored for legal reasoning. With a lightweight 7-billion-parameter scale, Unilaw-R1 significantly reduces deployment cost while effectively tackling three core challenges in the legal domain: insufficient legal knowledge, unreliable reasoning logic, and weak business generalization. To address these issues, we first construct Unilaw-R1-Data, a high-quality dataset containing 17K distilled and screened chain-of-thought (CoT) samples. Based on this, we adopt a two-stage training strategy combining Supervised Fine-Tuning (SFT) and Reinforcement Learning (RL), which significantly boosts performance on complex legal reasoning tasks and supports interpretable decision-making in legal AI applications. To assess legal reasoning ability, we also introduce Unilaw-R1-Eval, a dedicated benchmark designed to evaluate models across single- and multi-choice legal tasks. Unilaw-R1 demonstrates strong results on authoritative benchmarks, outperforming all models of similar scale and achieving performance on par with the much larger DeepSeek-R1-Distill-Qwen-32B (54.9%). Following domain-specific training, it also shows significant gains on LawBench and LexEval, exceeding Qwen-2.5-7B-Instruct (46.6%) by an average margin of 6.6%.
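The abstract describes RL on single- and multi-choice legal tasks, where correctness is directly verifiable. The paper does not publish its reward function; the sketch below shows one common rule-based scheme for such tasks. The `Answer:` line convention and the `choice_reward` helper are illustrative assumptions, not Unilaw-R1's actual implementation.

```python
import re


def extract_choices(response: str) -> set[str]:
    """Pull the final answer letters (A-D) from a model response.

    Assumes the model ends its chain of thought with a line such as
    'Answer: A' or 'Answer: A, C' -- a hypothetical output convention.
    """
    match = re.search(r"Answer:\s*([A-D](?:\s*,\s*[A-D])*)", response)
    if not match:
        return set()
    return {c.strip() for c in match.group(1).split(",")}


def choice_reward(response: str, gold: set[str]) -> float:
    """Binary verifiable reward: 1.0 only for an exact match on the
    chosen option set, 0.0 otherwise (no partial credit on multi-choice)."""
    predicted = extract_choices(response)
    return 1.0 if predicted and predicted == gold else 0.0
```

A reward like this is cheap and unambiguous, which is why choice-style benchmarks are a natural fit for RL with verifiable rewards.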
Related papers
- LegalOne: A Family of Foundation Models for Reliable Legal Reasoning [54.57434222018289]
We present LegalOne, a family of foundation models specifically tailored for the Chinese legal domain. LegalOne is developed through a comprehensive three-phase pipeline designed to master legal reasoning. We publicly release the LegalOne weights and the LegalKit evaluation framework to advance the field of legal AI.
arXiv Detail & Related papers (2026-01-31T10:18:32Z) - PLawBench: A Rubric-Based Benchmark for Evaluating LLMs in Real-World Legal Practice [67.71760070255425]
We introduce PLawBench, a practical benchmark for evaluating large language models (LLMs) in legal practice scenarios. PLawBench comprises 850 questions across 13 practical legal scenarios, with each question accompanied by expert-designed evaluation rubrics. Using an LLM-based evaluator aligned with human expert judgments, we evaluate 10 state-of-the-art LLMs.
arXiv Detail & Related papers (2026-01-23T11:36:10Z) - Can Group Relative Policy Optimization Improve Thai Legal Reasoning and Question Answering? [7.42457277619017]
We introduce an approach that aligns Thai legal question answering systems toward improved law-citation accuracy and better response quality. Our approach leverages BGE-M3 embeddings as a cost-efficient semantic-similarity reward. Experiments on the NitiBench benchmark demonstrate substantial improvements.
arXiv Detail & Related papers (2025-07-13T14:05:48Z) - GRPO-CARE: Consistency-Aware Reinforcement Learning for Multimodal Reasoning [53.894789613838654]
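The Thai legal QA entry above uses BGE-M3 embeddings as a semantic-similarity reward. A minimal sketch of such a reward, assuming cosine similarity rescaled to [0, 1]; the vectors here are plain NumPy arrays rather than actual BGE-M3 outputs, and the rescaling is an illustrative choice, not the paper's published formula.

```python
import numpy as np


def semantic_similarity_reward(resp_emb: np.ndarray, ref_emb: np.ndarray) -> float:
    """Cosine similarity between a response embedding and a reference
    embedding, rescaled from [-1, 1] to a reward in [0, 1].

    In the paper's setting both vectors would come from BGE-M3; plain
    vectors are used here so the sketch stays self-contained."""
    cos = float(resp_emb @ ref_emb / (np.linalg.norm(resp_emb) * np.linalg.norm(ref_emb)))
    return (cos + 1.0) / 2.0
```

Embedding-based rewards like this avoid paying for an LLM judge on every rollout, at the cost of only measuring surface semantic closeness to a reference answer.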
We introduce SEED-Bench-R1, a benchmark with complex real-world videos requiring balanced perception and reasoning. Using SEED-Bench-R1, we find that standard GRPO, while improving answer accuracy, often reduces logical coherence between reasoning steps and answers, with only a 57.9% consistency rate. We propose GRPO-CARE, a consistency-aware RL framework optimizing both answer correctness and reasoning coherence without explicit supervision.
arXiv Detail & Related papers (2025-06-19T08:49:13Z) - LegalSearchLM: Rethinking Legal Case Retrieval as Legal Elements Generation [9.894351313663874]
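Several entries above build on Group Relative Policy Optimization (GRPO). Its core step, normalizing each sampled response's reward against the other rollouts for the same prompt, can be sketched as follows. This is a minimal version of the standard formulation; the epsilon stabilizer is a common implementation detail, not taken from any of these papers.

```python
import numpy as np


def group_relative_advantages(rewards: list[float], eps: float = 1e-8) -> np.ndarray:
    """GRPO-style advantage: z-score each rollout's reward against the
    mean and standard deviation of its own group, so no learned value
    model (critic) is needed."""
    r = np.asarray(rewards, dtype=float)
    return (r - r.mean()) / (r.std() + eps)
```

Because the baseline is the group mean, a response is only reinforced for being better than the model's other attempts on the same prompt, which is what makes cheap scalar rewards (rule-based or similarity-based) workable.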
Legal Case Retrieval (LCR) is a fundamental task for legal professionals. Existing studies on LCR face two major limitations. First, they are evaluated on relatively small-scale retrieval corpora. Second, their reliance on embedding-based or lexical matching methods often results in limited representations and legally irrelevant matches.
arXiv Detail & Related papers (2025-05-28T09:02:41Z) - SEAL: Steerable Reasoning Calibration of Large Language Models for Free [58.190800043449336]
Large Language Models (LLMs) have demonstrated compelling capabilities for complex reasoning tasks via the extended chain-of-thought (CoT) reasoning mechanism. Recent studies reveal substantial redundancy in the CoT reasoning traces, which negatively impacts model performance. We introduce SEAL, a training-free approach that seamlessly calibrates the CoT process, improving accuracy while demonstrating significant efficiency gains.
arXiv Detail & Related papers (2025-04-07T02:42:07Z) - Crossing the Reward Bridge: Expanding RL with Verifiable Rewards Across Diverse Domains [92.36624674516553]
Reinforcement learning with verifiable rewards (RLVR) has demonstrated significant success in enhancing mathematical reasoning and coding performance of large language models (LLMs). We investigate the effectiveness and scalability of RLVR across diverse real-world domains including medicine, chemistry, psychology, economics, and education. We utilize a generative scoring technique that yields soft, model-based reward signals to overcome limitations posed by binary verifications.
arXiv Detail & Related papers (2025-03-31T08:22:49Z) - Evaluating Test-Time Scaling LLMs for Legal Reasoning: OpenAI o1, DeepSeek-R1, and Beyond [29.03425022434831]
Test-Time Scaling Large Language Models (LLMs) have demonstrated exceptional capabilities across various domains and tasks, particularly in reasoning. We present a preliminary evaluation of LLMs in various legal scenarios, covering both Chinese and English legal tasks. Our findings indicate that, despite DeepSeek-R1 and OpenAI o1 being among the most powerful models, their legal reasoning capabilities are still lacking.
arXiv Detail & Related papers (2025-03-20T11:14:39Z) - LexPro-1.0 Technical Report [19.83460019437367]
We introduce our first-generation reasoning model, LexPro-1.0, a large language model designed for the highly specialized Chinese legal domain. We first compile millions of legal documents covering over 20 types of crimes from 31 provinces in China for model training. The model further undergoes large-scale reinforcement learning without additional supervision, emphasizing the enhancement of its reasoning capabilities and explainability.
arXiv Detail & Related papers (2025-03-10T05:54:23Z) - T1: Advancing Language Model Reasoning through Reinforcement Learning and Inference Scaling [52.34735382627312]
Large language models (LLMs) have demonstrated remarkable capabilities in complex reasoning tasks. Existing approaches mainly rely on imitation learning and struggle to achieve effective test-time scaling. We present T1, which scales reinforcement learning by encouraging exploration, and we study inference scaling.
arXiv Detail & Related papers (2025-01-20T18:33:33Z) - InternLM-Law: An Open Source Chinese Legal Large Language Model [72.2589401309848]
InternLM-Law is a specialized LLM tailored for addressing diverse legal queries related to Chinese laws.
We meticulously construct a dataset in the Chinese legal domain, encompassing over 1 million queries.
InternLM-Law achieves the highest average performance on LawBench, outperforming state-of-the-art models, including GPT-4, on 13 out of 20 subtasks.
arXiv Detail & Related papers (2024-06-21T06:19:03Z) - PARAMANU-AYN: Pretrain from scratch or Continual Pretraining of LLMs for Legal Domain Adaptation? [3.9018931027384056]
Paramanu-Ayn is a collection of legal language models trained exclusively on Indian legal case documents.
Paramanu-Ayn was pretrained from scratch with a context size of 8192 on a single GPU for just 185 hours.
arXiv Detail & Related papers (2024-03-20T15:39:54Z)