Enhancing Legal Document Retrieval: A Multi-Phase Approach with Large Language Models
- URL: http://arxiv.org/abs/2403.18093v1
- Date: Tue, 26 Mar 2024 20:25:53 GMT
- Title: Enhancing Legal Document Retrieval: A Multi-Phase Approach with Large Language Models
- Authors: Hai-Long Nguyen, Duc-Minh Nguyen, Tan-Minh Nguyen, Ha-Thanh Nguyen, Thi-Hai-Yen Vuong, Ken Satoh
- Abstract summary: This research focuses on maximizing the potential of prompting by placing it as the final phase of the retrieval system.
Experiments on the COLIEE 2023 dataset demonstrate that integrating LLM prompting techniques into the retrieval system significantly improves retrieval accuracy.
However, error analysis reveals several issues in the retrieval system that still require resolution.
- Score: 7.299483088092052
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Large language models with billions of parameters, such as GPT-3.5, GPT-4, and LLaMA, are increasingly prevalent. Numerous studies have explored effective prompting techniques to harness the power of these LLMs for various research problems. Retrieval, specifically in the legal data domain, poses a challenging task for the direct application of prompting techniques due to the large number and substantial length of legal articles. This research focuses on maximizing the potential of prompting by placing it as the final phase of the retrieval system, preceded by the support of two phases: BM25 Pre-ranking and BERT-based Re-ranking. Experiments on the COLIEE 2023 dataset demonstrate that integrating LLM prompting techniques into the retrieval system significantly improves retrieval accuracy. However, error analysis reveals several issues in the retrieval system that still require resolution.
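To make the described architecture concrete, here is a minimal sketch of the three-phase design, assuming the rank_bm25 and sentence-transformers libraries plus the OpenAI client; the specific models, corpus, prompt wording, and cutoffs are illustrative placeholders, not the authors' exact configuration.

```python
# Minimal sketch of the three-phase pipeline described in the abstract:
# BM25 pre-ranking -> BERT-based re-ranking -> LLM prompting.
from rank_bm25 import BM25Okapi                 # pip install rank-bm25
from sentence_transformers import CrossEncoder  # pip install sentence-transformers
from openai import OpenAI                       # pip install openai

articles = [
    "Article 1: A contract is formed when an offer is accepted.",
    "Article 2: A minor may rescind a contract under consent rules.",
    "Article 3: Damages are owed for non-performance of obligations.",
]
query = "When is a contract considered formed?"

# Phase 1: BM25 pre-ranking over whitespace-tokenized articles.
bm25 = BM25Okapi([a.lower().split() for a in articles])
scores = bm25.get_scores(query.lower().split())
top_k = sorted(range(len(articles)), key=lambda i: scores[i], reverse=True)[:2]

# Phase 2: BERT-based re-ranking with a cross-encoder.
ce = CrossEncoder("cross-encoder/ms-marco-MiniLM-L-6-v2")
pairs = [(query, articles[i]) for i in top_k]
reranked = [i for _, i in sorted(zip(ce.predict(pairs), top_k), reverse=True)]

# Phase 3: LLM prompting as a final relevance filter over the short list.
client = OpenAI()  # requires OPENAI_API_KEY in the environment
kept = []
for i in reranked:
    prompt = (f"Query: {query}\nArticle: {articles[i]}\n"
              "Answer Yes or No: is this article relevant to the query?")
    out = client.chat.completions.create(
        model="gpt-3.5-turbo",
        messages=[{"role": "user", "content": prompt}],
        temperature=0,
    )
    if out.choices[0].message.content.strip().lower().startswith("yes"):
        kept.append(articles[i])
print(kept)
```

Placing the LLM last means it only sees an already-shortened candidate list, which keeps prompt lengths manageable despite the number and length of legal articles, the very obstacle the abstract identifies for direct prompting.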
Related papers
- MM-Embed: Universal Multimodal Retrieval with Multimodal LLMs [78.5013630951288]
This paper introduces techniques for advancing information retrieval with multimodal large language models (MLLMs).
We first study fine-tuning an MLLM as a bi-encoder retriever on 10 datasets with 16 retrieval tasks.
We propose modality-aware hard negative mining to mitigate the modality bias exhibited by MLLM retrievers.
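The summary does not spell out the mining procedure, so the following is a purely hypothetical sketch of one plausible reading: when mining hard negatives, restrict the negative pool to candidates sharing the positive's modality, so the retriever cannot use modality itself as a shortcut signal. The function name and data layout are assumptions.

```python
# Hypothetical sketch of modality-aware hard negative mining (the paper's
# exact procedure may differ): hard negatives are drawn only from candidates
# that share the positive's modality.
import numpy as np

def mine_hard_negatives(sim_row, cand_modalities, pos_idx, n_neg=4):
    """sim_row: similarity scores of one query against all candidates.
    cand_modalities: modality label per candidate, e.g. "text" or "image"."""
    pos_mod = cand_modalities[pos_idx]
    # Keep only same-modality candidates, excluding the positive itself.
    mask = np.array([m == pos_mod for m in cand_modalities])
    mask[pos_idx] = False
    pool = np.where(mask)[0]
    # Hardest negatives = highest-scoring wrong candidates in that pool.
    return pool[np.argsort(sim_row[pool])[::-1][:n_neg]].tolist()

sims = np.array([0.9, 0.7, 0.8, 0.2, 0.6])
mods = ["image", "text", "image", "image", "text"]
print(mine_hard_negatives(sims, mods, pos_idx=0))  # -> [2, 3]
```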
arXiv Detail & Related papers (2024-11-04T20:06:34Z)
- RA-BLIP: Multimodal Adaptive Retrieval-Augmented Bootstrapping Language-Image Pre-training [55.54020926284334]
Multimodal Large Language Models (MLLMs) have recently received substantial interest, showing their emerging potential as general-purpose models for various vision-language tasks.
Retrieval augmentation techniques have proven to be effective plugins for both LLMs and MLLMs.
In this study, we propose multimodal adaptive Retrieval-Augmented Bootstrapping Language-Image Pre-training (RA-BLIP), a novel retrieval-augmented framework for various MLLMs.
arXiv Detail & Related papers (2024-10-18T03:45:19Z)
- Exploiting LLMs' Reasoning Capability to Infer Implicit Concepts in Legal Information Retrieval [6.952344923975001]
This work focuses on utilizing the logical reasoning capabilities of large language models (LLMs) to identify relevant legal terms.
The proposed retrieval system integrates additional information from term-based expansion and query reformulation to improve retrieval accuracy.
Experiments on COLIEE 2022 and COLIEE 2023 datasets show that extra knowledge from LLMs helps to improve the retrieval result of both lexical and semantic ranking models.
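As a hedged sketch of the idea (the prompt wording, model choice, and corpus below are assumptions, not the paper's setup), an LLM can be asked for the legal terms a query implies, and the terms appended before lexical ranking:

```python
# Sketch: LLM-inferred implicit legal terms as query expansion before BM25.
from openai import OpenAI
from rank_bm25 import BM25Okapi

client = OpenAI()  # requires OPENAI_API_KEY

def expand_query(query):
    out = client.chat.completions.create(
        model="gpt-3.5-turbo",
        messages=[{"role": "user", "content":
                   "List key legal terms implied by this query, "
                   f"comma-separated, no explanation: {query}"}],
        temperature=0,
    )
    # Append inferred terms so lexical matching can reach implicit concepts.
    return query + " " + out.choices[0].message.content

articles = ["A lease may be terminated with one month's notice.",
            "Tenants owe rent until the lease term ends."]
bm25 = BM25Okapi([a.lower().split() for a in articles])
expanded = expand_query("Can I leave my apartment early?")
print(bm25.get_scores(expanded.lower().split()))
```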
arXiv Detail & Related papers (2024-10-16T01:34:14Z)
- A Systematic Survey on Large Language Models for Algorithm Design [25.556342145274613]
Algorithm Design (AD) is crucial for effective problem-solving across various domains.
The advent of Large Language Models (LLMs) has notably enhanced the automation and innovation within this field.
arXiv Detail & Related papers (2024-10-11T13:17:19Z)
- Large Language Models as Foundations for Next-Gen Dense Retrieval: A Comprehensive Empirical Assessment [16.39696580487218]
Pretrained language models like BERT and T5 serve as crucial backbone encoders for dense retrieval.
Recent research has explored using large language models (LLMs) as retrievers, achieving SOTA performance across various tasks.
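A common recipe in this line of work, sketched here under stated assumptions (last-token pooling, with tiny "gpt2" as a stand-in rather than a recommended retriever), is to read an embedding off a decoder-only LM's final hidden state:

```python
# Sketch of using a decoder-only LM as a dense-retrieval encoder via
# last-token pooling; details vary across papers in this line of work.
import torch
from transformers import AutoModel, AutoTokenizer

tok = AutoTokenizer.from_pretrained("gpt2")
tok.pad_token = tok.eos_token  # gpt2 has no pad token by default
model = AutoModel.from_pretrained("gpt2").eval()

@torch.no_grad()
def embed(texts):
    batch = tok(texts, padding=True, return_tensors="pt")
    hidden = model(**batch).last_hidden_state        # (B, T, H)
    last = batch["attention_mask"].sum(1) - 1        # index of final real token
    vecs = hidden[torch.arange(hidden.size(0)), last]
    return torch.nn.functional.normalize(vecs, dim=-1)

q = embed(["what is dense retrieval?"])
d = embed(["Dense retrieval encodes queries and documents as vectors.",
           "BM25 is a lexical ranking function."])
print(q @ d.T)  # cosine similarities; higher = more relevant
```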
arXiv Detail & Related papers (2024-08-22T08:16:07Z)
- SeRTS: Self-Rewarding Tree Search for Biomedical Retrieval-Augmented Generation [50.26966969163348]
Large Language Models (LLMs) have shown great potential in the biomedical domain with the advancement of retrieval-augmented generation (RAG).
Existing retrieval-augmented approaches face challenges in addressing diverse queries and documents, particularly for medical knowledge queries.
We propose Self-Rewarding Tree Search (SeRTS) based on Monte Carlo Tree Search (MCTS) and a self-rewarding paradigm.
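The full SeRTS procedure is more involved than its abstract suggests; the sketch below only illustrates the generic MCTS loop it builds on (UCT selection, expansion, a self-reward signal, backpropagation), with the query-refinement actions and LLM-based reward stubbed out as labeled assumptions:

```python
# Generic MCTS skeleton of the kind SeRTS builds on. The real method's
# states, actions, and LLM self-reward differ; stubs are marked below.
import math, random

class Node:
    def __init__(self, state, parent=None):
        self.state, self.parent = state, parent
        self.children, self.visits, self.value = [], 0, 0.0

def uct(child, parent, c=1.4):
    if child.visits == 0:
        return float("inf")  # always try unvisited children first
    return child.value / child.visits + c * math.sqrt(
        math.log(parent.visits) / child.visits)

def expand(node):
    # Stub: in SeRTS this would generate refined retrieval queries.
    node.children = [Node(node.state + [a], node) for a in ("rewrite", "narrow")]

def self_reward(state):
    # Stub: in SeRTS an LLM scores the evidence retrieved for this state.
    return random.random()

def search(root, iters=100):
    for _ in range(iters):
        node = root
        while node.children:                       # selection
            node = max(node.children, key=lambda ch: uct(ch, node))
        expand(node)                               # expansion
        leaf = random.choice(node.children)
        reward = self_reward(leaf.state)           # simulation / self-reward
        while leaf:                                # backpropagation
            leaf.visits += 1
            leaf.value += reward
            leaf = leaf.parent
    return max(root.children, key=lambda ch: ch.visits).state

print(search(Node([])))
```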
arXiv Detail & Related papers (2024-06-17T06:48:31Z)
- Large Language Model for Vulnerability Detection and Repair: Literature Review and the Road Ahead [12.324949480085424]
No existing survey focuses on the utilization of Large Language Models for vulnerability detection and repair.
This paper offers a systematic literature review of approaches aimed at improving vulnerability detection and repair through the utilization of LLMs.
arXiv Detail & Related papers (2024-04-03T07:27:33Z)
- Reliable, Adaptable, and Attributable Language Models with Retrieval [144.26890121729514]
Parametric language models (LMs) are trained on vast amounts of web data.
They face practical challenges such as hallucinations, difficulty in adapting to new data distributions, and a lack of verifiability.
We advocate for retrieval-augmented LMs to replace parametric LMs as the next generation of LMs.
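As a minimal illustration of what a retrieval-augmented LM looks like (the retriever, model, and corpus here are assumptions, not the paper's), generation is conditioned on retrieved passages, which supports attribution and reduces reliance on parametric memory:

```python
# Minimal retrieval-augmented LM sketch: retrieve, then generate from context.
from rank_bm25 import BM25Okapi
from openai import OpenAI

docs = ["The COLIEE competition evaluates legal text retrieval and entailment.",
        "BM25 ranks documents by term frequency and inverse document frequency."]
bm25 = BM25Okapi([d.lower().split() for d in docs])

def answer(question, k=1):
    scores = bm25.get_scores(question.lower().split())
    top = sorted(range(len(docs)), key=scores.__getitem__, reverse=True)[:k]
    ctx = "\n".join(docs[i] for i in top)
    out = OpenAI().chat.completions.create(
        model="gpt-3.5-turbo",
        messages=[{"role": "user", "content":
                   f"Context:\n{ctx}\n\nUsing only the context, answer: {question}"}],
        temperature=0,
    )
    return out.choices[0].message.content

print(answer("What does BM25 rank documents by?"))
```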
arXiv Detail & Related papers (2024-03-05T18:22:33Z)
- Large Multimodal Agents: A Survey [78.81459893884737]
Large language models (LLMs) have achieved superior performance in powering text-based AI agents.
There is an emerging research trend focused on extending these LLM-powered AI agents into the multimodal domain.
This review aims to provide valuable insights and guidelines for future research in this rapidly evolving field.
arXiv Detail & Related papers (2024-02-23T06:04:23Z)
- The Efficiency Spectrum of Large Language Models: An Algorithmic Survey [54.19942426544731]
The rapid growth of Large Language Models (LLMs) has been a driving force in transforming various domains.
This paper examines the multi-faceted dimensions of efficiency essential for the end-to-end algorithmic development of LLMs.
arXiv Detail & Related papers (2023-12-01T16:00:25Z)
- Never Lost in the Middle: Mastering Long-Context Question Answering with Position-Agnostic Decompositional Training [9.128501882000315]
Large language models (LLMs) struggle to locate correct information in long contexts.
This paper proposes to enhance the information searching and reflection ability of LLMs in long contexts via specially designed tasks.
Experimental results show substantial improvement on Multi-doc QA and other benchmarks, surpassing state-of-the-art models by a 13.7% absolute gain in shuffled settings.
arXiv Detail & Related papers (2023-11-15T18:42:44Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information presented and is not responsible for any consequences arising from its use.