Unilaw-R1: A Large Language Model for Legal Reasoning with Reinforcement Learning and Iterative Inference
- URL: http://arxiv.org/abs/2510.10072v1
- Date: Sat, 11 Oct 2025 07:17:22 GMT
- Title: Unilaw-R1: A Large Language Model for Legal Reasoning with Reinforcement Learning and Iterative Inference
- Authors: Hua Cai, Shuang Zhao, Liang Zhang, Xuli Shen, Qing Xu, Weilin Shen, Zihao Wen, Tianke Ban
- Abstract summary: We introduce Unilaw-R1, a large language model tailored for legal reasoning. With a lightweight 7-billion-parameter scale, Unilaw-R1 significantly reduces deployment cost. It tackles three core challenges in the legal domain: insufficient legal knowledge, unreliable reasoning logic, and weak business generalization.
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Reasoning-focused large language models (LLMs) are rapidly evolving across various domains, yet their capabilities in handling complex legal problems remain underexplored. In this paper, we introduce Unilaw-R1, a large language model tailored for legal reasoning. With a lightweight 7-billion-parameter scale, Unilaw-R1 significantly reduces deployment cost while effectively tackling three core challenges in the legal domain: insufficient legal knowledge, unreliable reasoning logic, and weak business generalization. To address these issues, we first construct Unilaw-R1-Data, a high-quality dataset containing 17K distilled and screened chain-of-thought (CoT) samples. Based on this, we adopt a two-stage training strategy combining Supervised Fine-Tuning (SFT) and Reinforcement Learning (RL), which significantly boosts performance on complex legal reasoning tasks and supports interpretable decision-making in legal AI applications. To assess legal reasoning ability, we also introduce Unilaw-R1-Eval, a dedicated benchmark designed to evaluate models across single- and multi-choice legal tasks. Unilaw-R1 demonstrates strong results on authoritative benchmarks, outperforming all models of similar scale and achieving performance on par with the much larger DeepSeek-R1-Distill-Qwen-32B (54.9%). Following domain-specific training, it also shows significant gains on LawBench and LexEval, exceeding Qwen-2.5-7B-Instruct (46.6%) by an average margin of 6.6%.
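The abstract describes RL on single- and multi-choice legal tasks, where correctness is directly verifiable. The paper does not publish its reward function; the sketch below shows one common rule-based scheme for such tasks. The `Answer:` line convention and the `choice_reward` helper are illustrative assumptions, not Unilaw-R1's actual implementation.

```python
import re


def extract_choices(response: str) -> set[str]:
    """Pull the final answer letters (A-D) from a model response.

    Assumes the model ends its chain of thought with a line such as
    'Answer: A' or 'Answer: A, C' -- a hypothetical output convention.
    """
    match = re.search(r"Answer:\s*([A-D](?:\s*,\s*[A-D])*)", response)
    if not match:
        return set()
    return {c.strip() for c in match.group(1).split(",")}


def choice_reward(response: str, gold: set[str]) -> float:
    """Binary verifiable reward: 1.0 only for an exact match on the
    chosen option set, 0.0 otherwise (no partial credit on multi-choice)."""
    predicted = extract_choices(response)
    return 1.0 if predicted and predicted == gold else 0.0
```

A reward like this is cheap and unambiguous, which is why choice-style benchmarks are a natural fit for RL with verifiable rewards.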
Related papers
- LegalOne: A Family of Foundation Models for Reliable Legal Reasoning [54.57434222018289]
We present LegalOne, a family of foundation models specifically tailored for the Chinese legal domain. LegalOne is developed through a comprehensive three-phase pipeline designed to master legal reasoning. We publicly release the LegalOne weights and the LegalKit evaluation framework to advance the field of legal AI.
arXiv Detail & Related papers (2026-01-31T10:18:32Z) - PLawBench: A Rubric-Based Benchmark for Evaluating LLMs in Real-World Legal Practice [67.71760070255425]
We introduce PLawBench, a practical benchmark for evaluating large language models (LLMs) in legal practice scenarios. PLawBench comprises 850 questions across 13 practical legal scenarios, with each question accompanied by expert-designed evaluation rubrics. Using an LLM-based evaluator aligned with human expert judgments, we evaluate 10 state-of-the-art LLMs.
arXiv Detail & Related papers (2026-01-23T11:36:10Z) - Can Group Relative Policy Optimization Improve Thai Legal Reasoning and Question Answering? [7.42457277619017]
We introduce an approach that aligns Thai legal question answering systems toward improved law-citation accuracy and better response quality. Our approach leverages BGE-M3 embeddings as a cost-efficient semantic-similarity reward. Experiments on the NitiBench benchmark demonstrate substantial improvements.
arXiv Detail & Related papers (2025-07-13T14:05:48Z) - GRPO-CARE: Consistency-Aware Reinforcement Learning for Multimodal Reasoning [53.894789613838654]
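The Thai legal QA entry above uses BGE-M3 embeddings as a semantic-similarity reward. A minimal sketch of such a reward, assuming cosine similarity rescaled to [0, 1]; the vectors here are plain NumPy arrays rather than actual BGE-M3 outputs, and the rescaling is an illustrative choice, not the paper's published formula.

```python
import numpy as np


def semantic_similarity_reward(resp_emb: np.ndarray, ref_emb: np.ndarray) -> float:
    """Cosine similarity between a response embedding and a reference
    embedding, rescaled from [-1, 1] to a reward in [0, 1].

    In the paper's setting both vectors would come from BGE-M3; plain
    vectors are used here so the sketch stays self-contained."""
    cos = float(resp_emb @ ref_emb / (np.linalg.norm(resp_emb) * np.linalg.norm(ref_emb)))
    return (cos + 1.0) / 2.0
```

Embedding-based rewards like this avoid paying for an LLM judge on every rollout, at the cost of only measuring surface semantic closeness to a reference answer.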
We introduce SEED-Bench-R1, a benchmark with complex real-world videos requiring balanced perception and reasoning. Using SEED-Bench-R1, we find that standard GRPO, while improving answer accuracy, often reduces logical coherence between reasoning steps and answers, with only a 57.9% consistency rate. We propose GRPO-CARE, a consistency-aware RL framework optimizing both answer correctness and reasoning coherence without explicit supervision.
arXiv Detail & Related papers (2025-06-19T08:49:13Z) - LegalSearchLM: Rethinking Legal Case Retrieval as Legal Elements Generation [9.894351313663874]
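Several entries above build on Group Relative Policy Optimization (GRPO). Its core step, normalizing each sampled response's reward against the other rollouts for the same prompt, can be sketched as follows. This is a minimal version of the standard formulation; the epsilon stabilizer is a common implementation detail, not taken from any of these papers.

```python
import numpy as np


def group_relative_advantages(rewards: list[float], eps: float = 1e-8) -> np.ndarray:
    """GRPO-style advantage: z-score each rollout's reward against the
    mean and standard deviation of its own group, so no learned value
    model (critic) is needed."""
    r = np.asarray(rewards, dtype=float)
    return (r - r.mean()) / (r.std() + eps)
```

Because the baseline is the group mean, a response is only reinforced for being better than the model's other attempts on the same prompt, which is what makes cheap scalar rewards (rule-based or similarity-based) workable.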
Legal Case Retrieval (LCR) is a fundamental task for legal professionals. Existing studies on LCR face two major limitations. First, they are evaluated on relatively small-scale retrieval corpora. Second, their reliance on embedding-based or lexical matching methods often results in limited representations and legally irrelevant matches.
arXiv Detail & Related papers (2025-05-28T09:02:41Z) - SEAL: Steerable Reasoning Calibration of Large Language Models for Free [58.190800043449336]
Large Language Models (LLMs) have demonstrated compelling capabilities for complex reasoning tasks via the extended chain-of-thought (CoT) reasoning mechanism. Recent studies reveal substantial redundancy in the CoT reasoning traces, which negatively impacts model performance. We introduce SEAL, a training-free approach that seamlessly calibrates the CoT process, improving accuracy while demonstrating significant efficiency gains.
arXiv Detail & Related papers (2025-04-07T02:42:07Z) - Crossing the Reward Bridge: Expanding RL with Verifiable Rewards Across Diverse Domains [92.36624674516553]
Reinforcement learning with verifiable rewards (RLVR) has demonstrated significant success in enhancing mathematical reasoning and coding performance of large language models (LLMs). We investigate the effectiveness and scalability of RLVR across diverse real-world domains including medicine, chemistry, psychology, economics, and education. We utilize a generative scoring technique that yields soft, model-based reward signals to overcome limitations posed by binary verifications.
arXiv Detail & Related papers (2025-03-31T08:22:49Z) - Evaluating Test-Time Scaling LLMs for Legal Reasoning: OpenAI o1, DeepSeek-R1, and Beyond [29.03425022434831]
Test-Time Scaling Large Language Models (LLMs) have demonstrated exceptional capabilities across various domains and tasks, particularly in reasoning. We present a preliminary evaluation of LLMs in various legal scenarios, covering both Chinese and English legal tasks. Our findings indicate that, despite DeepSeek-R1 and OpenAI o1 being among the most powerful models, their legal reasoning capabilities are still lacking.
arXiv Detail & Related papers (2025-03-20T11:14:39Z) - LexPro-1.0 Technical Report [19.83460019437367]
We introduce our first-generation reasoning model, LexPro-1.0, a large language model designed for the highly specialized Chinese legal domain. We first compile millions of legal documents covering over 20 types of crimes from 31 provinces in China for model training. The model further undergoes large-scale reinforcement learning without additional supervision, emphasizing the enhancement of its reasoning capabilities and explainability.
arXiv Detail & Related papers (2025-03-10T05:54:23Z) - T1: Advancing Language Model Reasoning through Reinforcement Learning and Inference Scaling [52.34735382627312]
Large language models (LLMs) have demonstrated remarkable capabilities in complex reasoning tasks. Existing approaches mainly rely on imitation learning and struggle to achieve effective test-time scaling. We present T1, which scales reinforcement learning by encouraging exploration, and we study inference scaling.
arXiv Detail & Related papers (2025-01-20T18:33:33Z) - InternLM-Law: An Open Source Chinese Legal Large Language Model [72.2589401309848]
InternLM-Law is a specialized LLM tailored for addressing diverse legal queries related to Chinese laws.
We meticulously construct a dataset in the Chinese legal domain, encompassing over 1 million queries.
InternLM-Law achieves the highest average performance on LawBench, outperforming state-of-the-art models, including GPT-4, on 13 out of 20 subtasks.
arXiv Detail & Related papers (2024-06-21T06:19:03Z) - PARAMANU-AYN: Pretrain from scratch or Continual Pretraining of LLMs for Legal Domain Adaptation? [3.9018931027384056]
Paramanu-Ayn is a collection of legal language models trained exclusively on Indian legal case documents.
Paramanu-Ayn was pretrained from scratch with a context size of 8192 on a single GPU for just 185 hours.
arXiv Detail & Related papers (2024-03-20T15:39:54Z)