Multi-Source Retrieval and Reasoning for Legal Sentencing Prediction
- URL: http://arxiv.org/abs/2602.04690v1
- Date: Wed, 04 Feb 2026 15:55:55 GMT
- Title: Multi-Source Retrieval and Reasoning for Legal Sentencing Prediction
- Authors: Junjie Chen, Haitao Li, Qilei Zhang, Zhenghua Li, Ya Zhang, Quan Zhou, Cheng Luo, Yiqun Liu, Dongsheng Guo, Qingyao Ai
- Abstract summary: Legal sentencing prediction (LSP) remains difficult due to its need for fine-grained objective knowledge and flexible subjective reasoning. We propose $MSR^2$, a framework that integrates multi-source retrieval and reasoning in LLMs with reinforcement learning. Experiments on two real-world datasets show that $MSR^2$ improves both accuracy and interpretability in LSP.
- Score: 50.6851250608938
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Legal judgment prediction (LJP) aims to predict judicial outcomes from case facts and typically includes law article, charge, and sentencing prediction. While recent methods perform well on the first two subtasks, legal sentencing prediction (LSP) remains difficult due to its need for fine-grained objective knowledge and flexible subjective reasoning. To address these limitations, we propose $MSR^2$, a framework that integrates multi-source retrieval and reasoning in LLMs with reinforcement learning. $MSR^2$ enables LLMs to perform multi-source retrieval based on reasoning needs and applies a process-level reward to guide intermediate subjective reasoning steps. Experiments on two real-world datasets show that $MSR^2$ improves both accuracy and interpretability in LSP, providing a promising step toward practical legal AI. Our code is available at https://anonymous.4open.science/r/MSR2-FC3B.
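The abstract describes two mechanisms: retrieval from multiple knowledge sources driven by the model's reasoning needs, and a process-level reward that scores intermediate reasoning steps rather than only the final prediction. A minimal, purely illustrative sketch of such a loop is below; all function names, the toy reward, and the stubbed reasoning step are assumptions for illustration, not the paper's actual implementation.

```python
# Hypothetical sketch of a retrieve-then-reason loop with a process-level
# reward, as described at a high level in the abstract. The source names,
# reward heuristic, and stubbed "reasoning" step are illustrative only.

def process_reward(step: str) -> float:
    """Toy process-level reward: favors steps grounded in retrieved evidence."""
    return 1.0 if "according to" in step.lower() else 0.0

def multi_source_retrieve(query: str, sources: dict) -> list:
    """Pull candidate evidence from each source (e.g. articles, precedents)."""
    return [f"[{name}] {store.get(query, '')}" for name, store in sources.items()]

def msr2_loop(case_facts: str, sources: dict, max_steps: int = 3):
    """Alternate retrieval and reasoning, scoring each intermediate step."""
    trajectory, rewards = [], []
    context = case_facts
    for _ in range(max_steps):
        # 1) decide what knowledge is still needed (stubbed as a template)
        query = f"sentencing factors for: {context[:40]}"
        evidence = multi_source_retrieve(query, sources)
        # 2) one intermediate reasoning step conditioned on the evidence
        step = f"According to {evidence[0]}, the cited factors apply."
        trajectory.append(step)
        # 3) the process-level reward scores this intermediate step,
        #    not just the final sentencing prediction
        rewards.append(process_reward(step))
        context = step
    return trajectory, sum(rewards) / len(rewards)
```

In an actual RL setup the per-step rewards would feed a policy-gradient update of the LLM; here they are simply averaged to keep the sketch self-contained.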
Related papers
- $\nabla$-Reasoner: LLM Reasoning via Test-Time Gradient Descent in Latent Space [71.23672814629448]
$\nabla$-Reasoner is an iterative generation framework that integrates differentiable optimization over token logits into the decoding loop. $\nabla$-Reasoner achieves over 20% accuracy improvement on a challenging mathematical reasoning benchmark.
arXiv Detail & Related papers (2026-03-05T08:42:54Z) - Benchmarking Multi-Step Legal Reasoning and Analyzing Chain-of-Thought Effects in Large Language Models [8.769542756426786]
We introduce MSLR, the first Chinese multi-step legal reasoning dataset grounded in real-world judicial decision making. MSLR adopts the IRAC framework (Issue, Rule, Application, Conclusion) to model structured expert reasoning from official legal documents. We design a scalable Human-LLM collaborative annotation pipeline that efficiently produces fine-grained step-level reasoning annotations. Further experiments demonstrate that Self-Initiated Chain-of-Thought prompts generated autonomously by models improve reasoning coherence and quality, outperforming human-designed prompts.
arXiv Detail & Related papers (2025-11-11T08:45:29Z) - Legal$Δ$: Enhancing Legal Reasoning in LLMs via Reinforcement Learning with Chain-of-Thought Guided Information Gain [21.20249684727035]
We propose Legal$\Delta$ to enhance legal reasoning through chain-of-thought guided information gain. Legal$\Delta$ employs a dual-mode input setup comprising direct-answer and reasoning-augmented modes. It consistently produces more robust and trustworthy legal judgments without relying on labeled preference data.
arXiv Detail & Related papers (2025-08-17T08:10:08Z) - Reason-to-Recommend: Using Interaction-of-Thought Reasoning to Enhance LLM Recommendation [9.282278040339138]
$\textbf{R2Rec}$ is a reasoning-enhanced recommendation framework. It samples interaction chains from the user-item graph and converts them into structured interaction-of-thoughts.
arXiv Detail & Related papers (2025-06-05T14:16:44Z) - RLJP: Legal Judgment Prediction via First-Order Logic Rule-enhanced with Large Language Models [58.69183479148083]
Legal Judgment Prediction (LJP) is a pivotal task in legal AI. Existing LJP models integrate judicial precedents and legal knowledge for high performance, but they neglect legal reasoning logic, a critical component of legal judgments that requires rigorous logical analysis. This paper proposes a rule-enhanced legal judgment prediction framework based on first-order logic (FOL) formalism and comparative learning (CL).
arXiv Detail & Related papers (2025-05-27T14:50:21Z) - R3-RAG: Learning Step-by-Step Reasoning and Retrieval for LLMs via Reinforcement Learning [60.17074283370798]
Retrieval-Augmented Generation (RAG) integrates external knowledge with Large Language Models (LLMs) to enhance factual correctness and mitigate hallucination. We propose $\textbf{R3-RAG}$, which uses $\textbf{R}$einforcement learning to make the LLM learn how to $\textbf{R}$eason and $\textbf{R}$etrieve step by step, thus retrieving comprehensive external knowledge and leading to correct answers.
arXiv Detail & Related papers (2025-05-26T12:25:37Z) - JudgeLRM: Large Reasoning Models as a Judge [80.07261839142548]
We introduce JudgeLRM, a family of judgment-oriented Large Language Models (LLMs). We find a negative correlation between SFT performance gains and the proportion of reasoning-demanding samples, revealing the limits of SFT in such scenarios. We show that JudgeLRM models consistently outperform SFT-tuned baselines of the same size, as well as other RL and SFT variants, and even surpass state-of-the-art reasoning models.
arXiv Detail & Related papers (2025-03-31T02:18:51Z) - Evaluating LLM-based Approaches to Legal Citation Prediction: Domain-specific Pre-training, Fine-tuning, or RAG? A Benchmark and an Australian Law Case Study [9.30538764385435]
Large Language Models (LLMs) have demonstrated strong potential across legal tasks, yet the problem of legal citation prediction remains under-explored. We introduce the AusLaw Citation Benchmark, a real-world dataset comprising 55k Australian legal instances and 18,677 unique citations. We then conduct a systematic benchmarking across a range of solutions. Results show that neither general nor law-specific LLMs suffice as stand-alone solutions, with performance near zero.
arXiv Detail & Related papers (2024-12-09T07:46:14Z) - InSaAF: Incorporating Safety through Accuracy and Fairness | Are LLMs ready for the Indian Legal Domain? [18.80574151464707]
We examine whether Large Language Models (LLMs) can perform legal tasks in the Indian landscape when social factors are involved.
We present a novel metric, the $\beta$-weighted $\textit{Legal Safety Score}$ ($LSS_{\beta}$), which encapsulates both the fairness and accuracy aspects of the LLM.
We propose finetuning pipelines, utilising specialised legal datasets, as a potential method to mitigate bias and improve model safety.
arXiv Detail & Related papers (2024-02-16T10:54:10Z) - A Comprehensive Evaluation of Large Language Models on Legal Judgment Prediction [60.70089334782383]
Large language models (LLMs) have demonstrated great potential for domain-specific applications.
Recent disputes over GPT-4's law evaluation raise questions concerning their performance in real-world legal tasks.
We design practical baseline solutions based on LLMs and test them on the task of legal judgment prediction.
arXiv Detail & Related papers (2023-10-18T07:38:04Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information above and is not responsible for any consequences of its use.