Enhancing Legal Document Retrieval: A Multi-Phase Approach with Large Language Models
- URL: http://arxiv.org/abs/2403.18093v1
- Date: Tue, 26 Mar 2024 20:25:53 GMT
- Title: Enhancing Legal Document Retrieval: A Multi-Phase Approach with Large Language Models
- Authors: Hai-Long Nguyen, Duc-Minh Nguyen, Tan-Minh Nguyen, Ha-Thanh Nguyen, Thi-Hai-Yen Vuong, Ken Satoh
- Abstract summary: This research focuses on maximizing the potential of prompting by placing it as the final phase of the retrieval system.
Experiments on the COLIEE 2023 dataset demonstrate that integrating LLM prompting techniques into the retrieval system significantly improves retrieval accuracy.
However, error analysis reveals several remaining issues in the retrieval system that still need resolution.
- Score: 7.299483088092052
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Large language models with billions of parameters, such as GPT-3.5, GPT-4, and LLaMA, are increasingly prevalent. Numerous studies have explored effective prompting techniques to harness the power of these LLMs for various research problems. Retrieval, specifically in the legal data domain, poses a challenging task for the direct application of prompting techniques due to the large number and substantial length of legal articles. This research focuses on maximizing the potential of prompting by placing it as the final phase of the retrieval system, preceded by two supporting phases: BM25 pre-ranking and BERT-based re-ranking. Experiments on the COLIEE 2023 dataset demonstrate that integrating LLM prompting techniques into the retrieval system significantly improves retrieval accuracy. However, error analysis reveals several remaining issues in the retrieval system that still need resolution.
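To make the pipeline concrete, here is a minimal sketch of the three phases described above, assuming the rank_bm25 and sentence-transformers packages; the candidate cut-offs, the cross-encoder checkpoint, the prompt wording, and the `llm_complete` callback are illustrative placeholders, not the authors' actual configuration.
```python
# Minimal sketch of the three-phase pipeline: BM25 pre-ranking ->
# BERT-based re-ranking -> LLM prompting as the final relevance filter.
from rank_bm25 import BM25Okapi
from sentence_transformers import CrossEncoder

def retrieve(query: str, articles: list[str], llm_complete) -> list[str]:
    # Phase 1: BM25 pre-ranking over the full article collection.
    bm25 = BM25Okapi([a.split() for a in articles])
    scores = bm25.get_scores(query.split())
    top100 = sorted(range(len(articles)), key=lambda i: -scores[i])[:100]

    # Phase 2: cross-encoder (BERT-style) re-ranking of the BM25 candidates.
    reranker = CrossEncoder("cross-encoder/ms-marco-MiniLM-L-6-v2")
    rerank_scores = reranker.predict([(query, articles[i]) for i in top100])
    top10 = [i for _, i in sorted(zip(rerank_scores, top100), reverse=True)[:10]]

    # Phase 3: prompt the LLM once per surviving candidate (placeholder prompt).
    relevant = []
    for i in top10:
        prompt = (f"Query: {query}\nArticle: {articles[i]}\n"
                  "Does this article answer the query? Answer Yes or No.")
        if llm_complete(prompt).strip().lower().startswith("yes"):
            relevant.append(articles[i])
    return relevant
```
The ordering matters for cost as much as accuracy: the cheap lexical and cross-encoder stages bound the number of long legal articles that ever reach a prompt.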
Related papers
- Ontology Matching with Large Language Models and Prioritized Depth-First Search [0.2454454561635539]
We introduce MILA, a novel approach that embeds a retrieve-identify-prompt pipeline within a prioritized depth-first search (PDFS) strategy.
This approach efficiently identifies a large number of semantic correspondences with high accuracy, limiting LLM requests to only the most borderline cases.
Our method achieved the highest F-Measure in four of the five unsupervised tasks, outperforming state-of-the-art OM systems by up to 17%.
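As a rough illustration of the idea, the sketch below runs a best-first variant of the prioritized search, consulting the LLM only for borderline pairs; the thresholds and the `retrieve_candidates`/`llm_confirm` callbacks are assumptions, not MILA's actual interface.
```python
import heapq

HIGH, LOW = 0.90, 0.60  # illustrative confidence thresholds

def match_concepts(source_concepts, retrieve_candidates, llm_confirm):
    matches = {}
    # Prioritized search: expand the most promising (source, target) pair first.
    heap = [(-sim, s, t) for s in source_concepts
            for sim, t in retrieve_candidates(s)]
    heapq.heapify(heap)
    while heap:
        neg_sim, s, t = heapq.heappop(heap)
        if s in matches:
            continue                       # already matched via a better branch
        if -neg_sim >= HIGH:
            matches[s] = t                 # confident: accept without the LLM
        elif -neg_sim >= LOW and llm_confirm(s, t):
            matches[s] = t                 # borderline: one LLM request decides
        # below LOW: discard without spending an LLM call
    return matches
```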
arXiv Detail & Related papers (2025-01-20T12:29:09Z)
- Methods for Legal Citation Prediction in the Age of LLMs: An Australian Law Case Study [9.30538764385435]
We focus on the problem of legal citation prediction within the Australian law context, where correctly identifying and citing relevant legislations or precedents is critical.
Our findings indicate that pre-training alone is insufficient for satisfactory citation accuracy, even with law-specialised models.
In contrast, instruction tuning on our task-specific dataset dramatically boosts performance, reaching the best results across all settings.
arXiv Detail & Related papers (2024-12-09T07:46:14Z)
- LamRA: Large Multimodal Model as Your Advanced Retrieval Assistant [63.28378110792787]
We introduce LamRA, a versatile framework designed to empower Large Multimodal Models with sophisticated retrieval and reranking capabilities.
For retrieval, we adopt a two-stage training strategy comprising language-only pre-training and multimodal instruction tuning.
For reranking, we employ joint training for both pointwise and listwise reranking, offering two distinct ways to further boost the retrieval performance.
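The pointwise/listwise distinction can be shown in a few lines; `score_pair` and `order_list` below are hypothetical stand-ins for the model's two reranking heads, not LamRA's actual API.
```python
def pointwise_rerank(query, docs, score_pair):
    # Score each (query, doc) pair independently, then sort by score.
    return sorted(docs, key=lambda d: score_pair(query, d), reverse=True)

def listwise_rerank(query, docs, order_list):
    # Show the model the whole candidate list at once and let it emit
    # a permutation (indices of docs in decreasing relevance).
    return [docs[i] for i in order_list(query, docs)]
```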
arXiv Detail & Related papers (2024-12-02T17:10:16Z)
- RA-BLIP: Multimodal Adaptive Retrieval-Augmented Bootstrapping Language-Image Pre-training [55.54020926284334]
Multimodal Large Language Models (MLLMs) have recently received substantial interest, showing emerging potential as general-purpose models for various vision-language tasks.
Retrieval augmentation techniques have proven to be effective plugins for both LLMs and MLLMs.
In this study, we propose multimodal adaptive Retrieval-Augmented Bootstrapping Language-Image Pre-training (RA-BLIP), a novel retrieval-augmented framework for various MLLMs.
arXiv Detail & Related papers (2024-10-18T03:45:19Z)
- Exploiting LLMs' Reasoning Capability to Infer Implicit Concepts in Legal Information Retrieval [6.952344923975001]
This work focuses on utilizing the logical reasoning capabilities of large language models (LLMs) to identify relevant legal terms.
The proposed retrieval system integrates additional information from the term-based expansion and query reformulation to improve the retrieval accuracy.
Experiments on COLIEE 2022 and COLIEE 2023 datasets show that extra knowledge from LLMs helps to improve the retrieval result of both lexical and semantic ranking models.
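A hedged sketch of the general recipe, in which an LLM proposes the implicit legal terms and the expanded query feeds a lexical ranker; the prompt and `llm_complete` callback are placeholders rather than the paper's setup.
```python
from rank_bm25 import BM25Okapi

def expanded_search(query, articles, llm_complete, k=10):
    # Ask the LLM to surface legal concepts only implied by the query.
    prompt = ("List the legal terms and concepts implied by this query, "
              f"separated by commas:\n{query}")
    extra = [t.strip() for t in llm_complete(prompt).split(",")]
    expanded = query + " " + " ".join(extra)

    # Rank with BM25 over the expanded query.
    bm25 = BM25Okapi([a.split() for a in articles])
    scores = bm25.get_scores(expanded.split())
    return sorted(range(len(articles)), key=lambda i: -scores[i])[:k]
```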
arXiv Detail & Related papers (2024-10-16T01:34:14Z)
- Large Language Models as Foundations for Next-Gen Dense Retrieval: A Comprehensive Empirical Assessment [16.39696580487218]
Pretrained language models like BERT and T5 serve as crucial backbone encoders for dense retrieval.
Recent research has explored using large language models (LLMs) as retrievers, achieving SOTA performance across various tasks.
arXiv Detail & Related papers (2024-08-22T08:16:07Z)
- SeRTS: Self-Rewarding Tree Search for Biomedical Retrieval-Augmented Generation [50.26966969163348]
Large Language Models (LLMs) have shown great potential in the biomedical domain with the advancement of retrieval-augmented generation (RAG).
Existing retrieval-augmented approaches face challenges in addressing diverse queries and documents, particularly for medical knowledge queries.
We propose Self-Rewarding Tree Search (SeRTS) based on Monte Carlo Tree Search (MCTS) and a self-rewarding paradigm.
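For orientation, the UCT rule that drives node selection in MCTS-style methods looks like the snippet below; in a self-rewarding setup the accumulated reward would come from the model's own feedback rather than an external judge (the constant and bookkeeping are illustrative).
```python
import math

def uct_score(total_reward, visits, parent_visits, c=1.41):
    if visits == 0:
        return float("inf")        # always expand unvisited children first
    exploit = total_reward / visits
    explore = c * math.sqrt(math.log(parent_visits) / visits)
    return exploit + explore

def select_child(children):
    # children: list of (total_reward, visits) for each child node.
    parent_visits = max(1, sum(v for _, v in children))
    return max(range(len(children)),
               key=lambda i: uct_score(*children[i], parent_visits))
```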
arXiv Detail & Related papers (2024-06-17T06:48:31Z)
- Large Language Model for Vulnerability Detection and Repair: Literature Review and the Road Ahead [12.324949480085424]
No existing survey focuses on the utilization of Large Language Models for vulnerability detection and repair.
This paper offers a systematic literature review of approaches aimed at improving vulnerability detection and repair through the utilization of LLMs.
arXiv Detail & Related papers (2024-04-03T07:27:33Z)
- Reliable, Adaptable, and Attributable Language Models with Retrieval [144.26890121729514]
Parametric language models (LMs) are trained on vast amounts of web data.
They face practical challenges such as hallucinations, difficulty in adapting to new data distributions, and a lack of verifiability.
We advocate for retrieval-augmented LMs to replace parametric LMs as the next generation of LMs.
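A minimal retrieval-augmented loop of the kind the paper advocates; `embed` and `generate` are hypothetical callbacks, and the brute-force similarity scan stands in for a real index.
```python
import numpy as np

def rag_answer(question, corpus, embed, generate, k=5):
    # Retrieve: rank passages by embedding similarity to the question.
    q = embed(question)
    sims = [float(np.dot(q, embed(p))) for p in corpus]
    top = sorted(range(len(corpus)), key=lambda i: -sims[i])[:k]

    # Generate: condition the LM on retrieved evidence so answers are
    # attributable to concrete passages, not just parametric memory.
    context = "\n".join(corpus[i] for i in top)
    return generate(f"Context:\n{context}\n\nQuestion: {question}\nAnswer:")
```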
arXiv Detail & Related papers (2024-03-05T18:22:33Z)
- Large Multimodal Agents: A Survey [78.81459893884737]
Large language models (LLMs) have achieved superior performance in powering text-based AI agents.
There is an emerging research trend focused on extending these LLM-powered AI agents into the multimodal domain.
This review aims to provide valuable insights and guidelines for future research in this rapidly evolving field.
arXiv Detail & Related papers (2024-02-23T06:04:23Z)
- The Efficiency Spectrum of Large Language Models: An Algorithmic Survey [54.19942426544731]
The rapid growth of Large Language Models (LLMs) has been a driving force in transforming various domains.
This paper examines the multi-faceted dimensions of efficiency essential for the end-to-end algorithmic development of LLMs.
arXiv Detail & Related papers (2023-12-01T16:00:25Z)