Related papers: Methods for Legal Citation Prediction in the Age of LLMs: An Australian Law Case Study

URL: http://arxiv.org/abs/2412.06272v1
Date: Mon, 09 Dec 2024 07:46:14 GMT
Title: Methods for Legal Citation Prediction in the Age of LLMs: An Australian Law Case Study
Authors: Ehsan Shareghi, Jiuzhou Han, Paul Burgess
Abstract summary: We focus on the problem of legal citation prediction within the Australian law context, where correctly identifying and citing relevant legislations or precedents is critical. Our findings indicate that domain-specific pre-training alone is insufficient for achieving satisfactory citation accuracy even after law-specialised pre-training. In contrast, instruction tuning on our task-specific dataset dramatically boosts performance reaching the best results across all settings.
Score: 9.30538764385435
License: http://creativecommons.org/licenses/by/4.0/
Abstract: In recent years, Large Language Models (LLMs) have shown great potential across a wide range of legal tasks. Despite these advances, mitigating hallucination remains a significant challenge, with state-of-the-art LLMs still frequently generating incorrect legal references. In this paper, we focus on the problem of legal citation prediction within the Australian law context, where correctly identifying and citing relevant legislations or precedents is critical. We compare several approaches: prompting general-purpose and law-specialised LLMs, retrieval-only pipelines with both generic and domain-specific embeddings, task-specific instruction-tuning of LLMs, and hybrid strategies that combine LLMs with retrieval augmentation, query expansion, or voting ensembles. Our findings indicate that domain-specific pre-training alone is insufficient for achieving satisfactory citation accuracy even after law-specialised pre-training. In contrast, instruction tuning on our task-specific dataset dramatically boosts performance, reaching the best results across all settings. We also highlight that database granularity, along with the type of embeddings, plays a critical role in the performance of retrieval systems. Among retrieval-based approaches, hybrid methods consistently outperform retrieval-only setups, and among these, ensemble voting delivers the best result by combining the predictive quality of instruction-tuned LLMs with the retrieval system.
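Illustrative sketch: the abstract does not specify how the voting ensemble is built, so the following Python is only one plausible reading, not the authors' method. It assumes the instruction-tuned LLM yields several candidate citations (for example from repeated sampling) and the retrieval system yields a candidate list, and it picks the citation with the most weighted votes. The function name `vote_citation`, the `retrieval_weight` parameter, and the placeholder citation strings are all hypothetical.

```python
from collections import Counter


def vote_citation(llm_predictions, retrieved_candidates, retrieval_weight=1.0):
    """Pick one citation by weighted voting (hypothetical scheme, not from the paper).

    llm_predictions: citation strings produced by an LLM, repeats allowed.
    retrieved_candidates: citation strings returned by a retrieval system.
    retrieval_weight: vote weight given to each retrieved candidate.
    """
    votes = Counter()
    # Each LLM prediction counts as one vote.
    for citation in llm_predictions:
        votes[citation] += 1.0
    # Each retrieved candidate adds a weighted vote.
    for citation in retrieved_candidates:
        votes[citation] += retrieval_weight
    if not votes:
        return None
    # Ties resolve to the citation that received a vote first.
    return max(votes, key=votes.get)


if __name__ == "__main__":
    llm = ["Case A", "Case A", "Act B"]   # placeholder labels, not real citations
    retrieved = ["Act B", "Case A"]
    print(vote_citation(llm, retrieved))
```

Raising `retrieval_weight` above 1.0 shifts trust toward the retriever; setting it to 0 reduces the scheme to a majority vote over LLM outputs alone.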

Related papers

LegalOne: A Family of Foundation Models for Reliable Legal Reasoning [54.57434222018289]
We present LegalOne, a family of foundational models specifically tailored for the Chinese legal domain. LegalOne is developed through a comprehensive three-phase pipeline designed to master legal reasoning. We publicly release the LegalOne weights and the LegalKit evaluation framework to advance the field of Legal AI.
arXiv Detail & Related papers (2026-01-31T10:18:32Z)
LegalMALR: Multi-Agent Query Understanding and LLM-Based Reranking for Chinese Statute Retrieval [10.997604609194033]
Statute retrieval is essential for legal assistance and judicial decision support. Real-world legal queries are often implicit, multi-issue, and expressed in colloquial or underspecified forms. We present LegalMALR, a retrieval framework that integrates a Multi-Agent Query Understanding System with a zero-shot large-language-generated reranking module.
arXiv Detail & Related papers (2026-01-25T04:44:56Z)
PLawBench: A Rubric-Based Benchmark for Evaluating LLMs in Real-World Legal Practice [67.71760070255425]
We introduce PLawBench, a practical benchmark for evaluating large language models (LLMs) in legal practice scenarios. PLawBench comprises 850 questions across 13 practical legal scenarios, with each question accompanied by expert-designed evaluation rubrics. Using an LLM-based evaluator aligned with human expert judgments, we evaluate 10 state-of-the-art LLMs.
arXiv Detail & Related papers (2026-01-23T11:36:10Z)
LeMAJ (Legal LLM-as-a-Judge): Bridging Legal Reasoning and LLM Evaluation [6.783926395409993]
This paper introduces a novel, reference-free evaluation methodology that reflects how lawyers evaluate legal answers. We show how our method correlates more closely with human expert evaluations and helps improve inter-annotator agreement.
arXiv Detail & Related papers (2025-10-08T17:10:47Z)
Universal Legal Article Prediction via Tight Collaboration between Supervised Classification Model and LLM [42.11889345473452]
Legal Article Prediction (LAP) is a critical task in legal text classification. We propose Uni-LAP, a universal framework for legal article prediction.
arXiv Detail & Related papers (2025-09-26T09:42:20Z)
Augmented Question-guided Retrieval (AQgR) of Indian Case Law with LLM, RAG, and Structured Summaries [0.0]
This paper proposes the use of Large Language Models (LLMs) to facilitate the retrieval of relevant cases. Our approach combines Retrieval Augmented Generation (RAG) with structured summaries optimized for Indian case law. The system generates targeted legal questions based on factual scenarios to identify relevant case law more effectively.
arXiv Detail & Related papers (2025-07-23T05:24:44Z)
LegalSearchLM: Rethinking Legal Case Retrieval as Legal Elements Generation [5.243460995467895]
We present LEGAR BENCH, the first large-scale Korean Legal Case Retrieval benchmark, covering 411 diverse crime types in queries over 1.2M legal cases. We also present LegalSearchLM, a retrieval model that performs legal element reasoning over the query case and directly generates content grounded in the target cases.
arXiv Detail & Related papers (2025-05-28T09:02:41Z)
General-Reasoner: Advancing LLM Reasoning Across All Domains [64.70599911897595]
Reinforcement learning (RL) has recently demonstrated strong potential in enhancing the reasoning capabilities of large language models (LLMs). We propose General-Reasoner, a novel training paradigm designed to enhance LLM reasoning capabilities across diverse domains. We train a series of models and evaluate them on a wide range of datasets covering wide domains like physics, chemistry, finance, electronics, etc.
arXiv Detail & Related papers (2025-05-20T17:41:33Z)
NitiBench: A Comprehensive Study of LLM Framework Capabilities for Thai Legal Question Answering [4.61348190872483]
This paper introduces NitiBench, a benchmark comprising two datasets: the NitiBench-CCL, covering general Thai financial law, and the NitiBench-Tax, which includes real-world tax law cases. We evaluate retrieval-augmented generation (RAG) and long-context LLM-based approaches to address three key research questions.
arXiv Detail & Related papers (2025-02-15T17:52:14Z)
LegalAgentBench: Evaluating LLM Agents in Legal Domain [53.70993264644004]
LegalAgentBench is a benchmark specifically designed to evaluate LLM Agents in the Chinese legal domain. LegalAgentBench includes 17 corpora from real-world legal scenarios and provides 37 tools for interacting with external knowledge.
arXiv Detail & Related papers (2024-12-23T04:02:46Z)
On the Suitability of pre-trained foundational LLMs for Analysis in German Legal Education [1.7977968161686195]
We show that current open-source foundational LLMs possess instruction capability and German legal background knowledge that is sufficient for some legal analysis in an educational context. However, model capability breaks down in very specific tasks, such as the classification of "Gutachtenstil" appraisal style components. We introduce a Retrieval Augmented Generation based prompt example selection method that substantially improves predictions in high data availability scenarios.
arXiv Detail & Related papers (2024-12-20T13:54:57Z)
Self-Calibrated Listwise Reranking with Large Language Models [137.6557607279876]
Large language models (LLMs) have been employed in reranking tasks through a sequence-to-sequence approach. This reranking paradigm requires a sliding window strategy to iteratively handle larger candidate sets. We propose a novel self-calibrated listwise reranking method, which aims to leverage LLMs to produce global relevance scores for ranking.
arXiv Detail & Related papers (2024-11-07T10:31:31Z)
Exploiting LLMs' Reasoning Capability to Infer Implicit Concepts in Legal Information Retrieval [6.952344923975001]
This work focuses on utilizing the logical reasoning capabilities of large language models (LLMs) to identify relevant legal terms. The proposed retrieval system integrates additional information from the term-based expansion and query reformulation to improve the retrieval accuracy. Experiments on COLIEE 2022 and COLIEE 2023 datasets show that extra knowledge from LLMs helps to improve the retrieval result of both lexical and semantic ranking models.
arXiv Detail & Related papers (2024-10-16T01:34:14Z)
Exploring Language Model Generalization in Low-Resource Extractive QA [57.14068405860034]
We investigate Extractive Question Answering (EQA) with Large Language Models (LLMs) under domain drift. We devise a series of experiments to explain the performance gap empirically.
arXiv Detail & Related papers (2024-09-27T05:06:43Z)
The Factuality of Large Language Models in the Legal Domain [8.111302195052641]
This paper investigates the factuality of large language models (LLMs) as knowledge bases in the legal domain. We design a dataset of diverse factual questions about case law and legislation. We then use the dataset to evaluate several LLMs under different evaluation methods, including exact, alias, and fuzzy matching.
arXiv Detail & Related papers (2024-09-18T08:30:20Z)
LawLLM: Law Large Language Model for the US Legal System [43.13850456765944]
We introduce the Law Large Language Model (LawLLM), a multi-task model specifically designed for the US legal domain. LawLLM excels at Similar Case Retrieval (SCR), Precedent Case Recommendation (PCR), and Legal Judgment Prediction (LJP). We propose customized data preprocessing techniques for each task that transform raw legal data into a trainable format.
arXiv Detail & Related papers (2024-07-27T21:51:30Z)
Ground Every Sentence: Improving Retrieval-Augmented LLMs with Interleaved Reference-Claim Generation [51.8188846284153]
RAG has been widely adopted to enhance Large Language Models (LLMs). Attributed Text Generation (ATG), which provides citations to support the model's responses in RAG, has attracted growing attention. This paper proposes a fine-grained ATG method called ReClaim (Refer & Claim), which alternates the generation of references and answers step by step.
arXiv Detail & Related papers (2024-07-01T20:47:47Z)
FIRST: Faster Improved Listwise Reranking with Single Token Decoding [56.727761901751194]
First, we introduce FIRST, a novel listwise LLM reranking approach leveraging the output logits of the first generated identifier to directly obtain a ranked ordering of the candidates. Empirical results demonstrate that FIRST accelerates inference by 50% while maintaining a robust ranking performance with gains across the BEIR benchmark. Our results show that LLM rerankers can provide a stronger distillation signal compared to cross-encoders, yielding substantial improvements in retriever recall after relevance feedback.
arXiv Detail & Related papers (2024-06-21T21:27:50Z)
InternLM-Law: An Open Source Chinese Legal Large Language Model [72.2589401309848]
InternLM-Law is a specialized LLM tailored for addressing diverse legal queries related to Chinese laws. We meticulously construct a dataset in the Chinese legal domain, encompassing over 1 million queries. InternLM-Law achieves the highest average performance on LawBench, outperforming state-of-the-art models, including GPT-4, on 13 out of 20 subtasks.
arXiv Detail & Related papers (2024-06-21T06:19:03Z)
Effective Large Language Model Adaptation for Improved Grounding and Citation Generation [48.07830615309543]
This paper focuses on improving large language models (LLMs) by grounding their responses in retrieved passages and by providing citations. We propose a new framework, AGREE, that improves the grounding from a holistic perspective. Our framework tunes LLMs to self-ground the claims in their responses and provide accurate citations to retrieved documents.
arXiv Detail & Related papers (2023-11-16T03:22:25Z)
A Comprehensive Evaluation of Large Language Models on Legal Judgment Prediction [60.70089334782383]
Large language models (LLMs) have demonstrated great potential for domain-specific applications. Recent disputes over GPT-4's law evaluation raise questions concerning their performance in real-world legal tasks. We design practical baseline solutions based on LLMs and test on the task of legal judgment prediction.
arXiv Detail & Related papers (2023-10-18T07:38:04Z)
Precedent-Enhanced Legal Judgment Prediction with LLM and Domain-Model Collaboration [52.57055162778548]
Legal Judgment Prediction (LJP) has become an increasingly crucial task in Legal AI. Precedents are the previous legal cases with similar facts, which are the basis for the judgment of the subsequent case in national legal systems. Recent advances in deep learning have enabled a variety of techniques to be used to solve the LJP task.
arXiv Detail & Related papers (2023-10-13T16:47:20Z)

This list is automatically generated from the titles and abstracts of the papers on this site.