LLMs for LLMs: A Structured Prompting Methodology for Long Legal Documents
- URL: http://arxiv.org/abs/2509.02241v1
- Date: Tue, 02 Sep 2025 12:09:49 GMT
- Title: LLMs for LLMs: A Structured Prompting Methodology for Long Legal Documents
- Authors: Strahinja Klem, Noura Al Moubayed
- Abstract summary: We present a structured prompting methodology as a viable alternative to the often expensive fine-tuning. We tackle long legal documents from the CUAD dataset on the task of information retrieval, and address the resulting candidate selection problem by introducing the Distribution-based Localisation and Inverse Cardinality Weighting heuristics.
- Score: 3.887688898850802
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: The rise of Large Language Models (LLMs) has had a profoundly transformative effect on a number of fields and domains. However, their uptake in Law has proven more challenging due to the important issues of reliability and transparency. In this study, we present a structured prompting methodology as a viable alternative to the often expensive fine-tuning, capable of tackling long legal documents from the CUAD dataset on the task of information retrieval. Each document is first split into chunks via a system of chunking and augmentation, addressing the long-document problem. Then, alongside an engineered prompt, the input is fed into QWEN-2 to produce a set of answers for each question. Finally, we tackle the resulting candidate selection problem by introducing the Distribution-based Localisation and Inverse Cardinality Weighting heuristics. This approach leverages a general-purpose model to promote long-term scalability, prompt engineering to increase reliability, and the two heuristic strategies to reduce the impact of the black-box effect. Whilst our model performs up to 9% better than the previously presented method, reaching state-of-the-art performance, it also highlights the limiting factor of current automatic evaluation metrics for question answering, serving as a call to action for future research. However, the chief aim of this work is to underscore the potential of structured prompt engineering as a useful, yet under-explored, tool in ensuring accountability and responsibility of AI in the legal domain, and beyond.
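The pipeline described above (chunk the contract, query the model per chunk with an engineered prompt, then reduce the per-chunk candidates to a single answer) can be sketched roughly as follows. This is a minimal, hypothetical illustration: the chunk size, the prompt wording, and the simple frequency-times-inverse-cardinality score used for candidate selection are assumptions made for exposition, not the paper's actual prompt or heuristics.

```python
# Minimal sketch of a chunk-then-select QA pipeline in the spirit of the one
# described above. Chunk size, prompt template, and the selection score are
# illustrative assumptions, not the paper's exact heuristics.
from collections import Counter
from typing import Callable


def chunk_document(text: str, chunk_size: int = 2000, overlap: int = 200) -> list[str]:
    """Split a long contract into overlapping character-level chunks."""
    chunks, start = [], 0
    while start < len(text):
        chunks.append(text[start:start + chunk_size])
        start += chunk_size - overlap
    return chunks


def answer_per_chunk(chunks: list[str], question: str,
                     ask_llm: Callable[[str], str]) -> list[str]:
    """Query the model once per chunk with a fixed, engineered prompt."""
    template = ("You are reviewing a legal contract. Quote the clause that "
                "answers the question, or reply NOT FOUND.\n\n"
                "Question: {q}\n\nContext:\n{c}")
    return [ask_llm(template.format(q=question, c=chunk)) for chunk in chunks]


def select_candidate(answers: list[str]) -> str:
    """Reduce the per-chunk answers to a single candidate.

    Each distinct answer is scored by its frequency scaled by the inverse of
    the number of distinct candidates -- a stand-in for the candidate-selection
    weighting, which the abstract does not specify in detail.
    """
    candidates = [a.strip() for a in answers if a.strip() and a.strip() != "NOT FOUND"]
    if not candidates:
        return "NOT FOUND"
    counts = Counter(candidates)
    inverse_cardinality = 1.0 / len(counts)
    return max(counts, key=lambda a: counts[a] * inverse_cardinality)
```

Replacing `ask_llm` with a call to an actual model (e.g. a locally hosted QWEN-2) would complete the loop; the point of the sketch is the overall structure: split a long document, query per chunk, then aggregate many noisy candidates into one answer.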
Related papers
- Resolving Evidence Sparsity: Agentic Context Engineering for Long-Document Understanding [49.26132236798123]
Vision Language Models (VLMs) have gradually become a primary approach in document understanding. We propose SLEUTH, a multi-agent framework that orchestrates a retriever and four collaborative agents in a coarse-to-fine process. The framework identifies key textual and visual clues within the retrieved pages, filters for salient visual evidence such as tables and charts, and analyzes the query to devise a reasoning strategy.
arXiv Detail & Related papers (2025-11-28T03:09:40Z) - URaG: Unified Retrieval and Generation in Multimodal LLMs for Efficient Long Document Understanding [55.45331924836242]
We present URaG, a framework that Unifies Retrieval and Generation within a single MLLM. We show that URaG achieves state-of-the-art performance while reducing computational overhead by 44-56%.
arXiv Detail & Related papers (2025-11-13T17:54:09Z) - Retro*: Optimizing LLMs for Reasoning-Intensive Document Retrieval [44.680580989270965]
Retro* is a novel approach for reasoning-intensive document retrieval. We introduce a rubric-based relevance scoring mechanism, enabling the model to reason about the relationship between a task and a document. Our experiments show that Retro* outperforms existing document retrieval methods with notable advantages.
arXiv Detail & Related papers (2025-09-29T14:53:05Z) - MDBench: A Synthetic Multi-Document Reasoning Benchmark Generated with Knowledge Guidance [5.192956837901584]
We introduce MDBench, a new dataset for evaluating large language models (LLMs) on the task of multi-document reasoning. We use a novel synthetic generation process, allowing us to controllably and efficiently generate challenging document sets. We analyze the behavior of popular LLMs and prompting techniques, finding that MDBench poses significant challenges for all methods.
arXiv Detail & Related papers (2025-06-17T19:14:30Z) - AutoRev: Automatic Peer Review System for Academic Research Papers [9.269282930029856]
AutoRev is an Automatic Peer Review System for Academic Research Papers. Our framework represents an academic document as a graph, enabling the extraction of the most critical passages. When applied to review generation, our method outperforms SOTA baselines by an average of 58.72%.
arXiv Detail & Related papers (2025-05-20T13:59:58Z) - Efficient Inference for Large Reasoning Models: A Survey [74.17203483365171]
Large Reasoning Models (LRMs) significantly improve the reasoning ability of Large Language Models (LLMs) by learning to reason. However, their deliberative reasoning process leads to inefficiencies in token usage, memory consumption, and inference time. This survey provides a review of efficient inference methods designed specifically for LRMs, focusing on mitigating token inefficiency while preserving the reasoning quality.
arXiv Detail & Related papers (2025-03-29T13:27:46Z) - Vietnamese Legal Information Retrieval in Question-Answering System [0.0]
Retrieval Augmented Generation (RAG) has gained significant recognition for enhancing the capabilities of large language models (LLMs).
However, RAG often falls short when applied to the Vietnamese language due to several challenges.
This report introduces the three main modifications we made to address these challenges.
arXiv Detail & Related papers (2024-09-05T02:34:05Z) - RelevAI-Reviewer: A Benchmark on AI Reviewers for Survey Paper Relevance [0.8089605035945486]
We propose RelevAI-Reviewer, an automatic system that conceptualizes the task of survey paper review as a classification problem.
We introduce a novel dataset comprising 25,164 instances. Each instance contains one prompt and four candidate papers, each varying in relevance to the prompt.
We develop a machine learning (ML) model capable of determining the relevance of each paper and identifying the most pertinent one.
arXiv Detail & Related papers (2024-06-13T06:42:32Z) - Empowering Prior to Court Legal Analysis: A Transparent and Accessible Dataset for Defensive Statement Classification and Interpretation [5.646219481667151]
This paper introduces a novel dataset tailored for classification of statements made during police interviews, prior to court proceedings.
We introduce a fine-tuned DistilBERT model that achieves state-of-the-art performance in distinguishing truthful from deceptive statements.
We also present an XAI interface that empowers both legal professionals and non-specialists to interact with and benefit from our system.
arXiv Detail & Related papers (2024-05-17T11:22:27Z) - LLM Inference Unveiled: Survey and Roofline Model Insights [62.92811060490876]
Large Language Model (LLM) inference is rapidly evolving, presenting a unique blend of opportunities and challenges.
Our survey stands out from traditional literature reviews by not only summarizing the current state of research but also by introducing a framework based on the roofline model.
This framework identifies the bottlenecks when deploying LLMs on hardware devices and provides a clear understanding of practical problems.
arXiv Detail & Related papers (2024-02-26T07:33:05Z) - Automatically Correcting Large Language Models: Surveying the landscape
of diverse self-correction strategies [104.32199881187607]
Large language models (LLMs) have demonstrated remarkable performance across a wide array of NLP tasks.
However, their outputs can still contain errors; a promising approach to rectify these flaws is self-correction, where the LLM itself is prompted or guided to fix problems in its own output.
This paper presents a comprehensive review of this emerging class of techniques.
arXiv Detail & Related papers (2023-08-06T18:38:52Z) - Editing Large Language Models: Problems, Methods, and Opportunities [51.903537096207]
This paper embarks on a deep exploration of the problems, methods, and opportunities related to model editing for LLMs.
We provide an exhaustive overview of the task definition and challenges associated with model editing, along with an in-depth empirical analysis of the most progressive methods currently at our disposal.
Our objective is to provide valuable insights into the effectiveness and feasibility of each editing technique, thereby assisting the community in making informed decisions on the selection of the most appropriate method for a specific task or context.
arXiv Detail & Related papers (2023-05-22T16:00:00Z)