A Hybrid Approach to Information Retrieval and Answer Generation for Regulatory Texts
- URL: http://arxiv.org/abs/2502.16767v1
- Date: Mon, 24 Feb 2025 01:16:16 GMT
- Title: A Hybrid Approach to Information Retrieval and Answer Generation for Regulatory Texts
- Authors: Jhon Rayo, Raul de la Rosa, Mario Garrido
- Abstract summary: This paper introduces a hybrid information retrieval system that combines lexical and semantic search techniques. The system integrates a fine-tuned sentence transformer model with the traditional BM25 algorithm to achieve both semantic precision and lexical coverage.
- Score: 0.0
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Regulatory texts are inherently long and complex, presenting significant challenges for information retrieval systems in supporting regulatory officers with compliance tasks. This paper introduces a hybrid information retrieval system that combines lexical and semantic search techniques to extract relevant information from large regulatory corpora. The system integrates a fine-tuned sentence transformer model with the traditional BM25 algorithm to achieve both semantic precision and lexical coverage. To generate accurate and comprehensive responses, retrieved passages are synthesized using Large Language Models (LLMs) within a Retrieval Augmented Generation (RAG) framework. Experimental results demonstrate that the hybrid system significantly outperforms standalone lexical and semantic approaches, with notable improvements in Recall@10 and MAP@10. By openly sharing our fine-tuned model and methodology, we aim to advance the development of robust natural language processing tools for compliance-driven applications in regulatory domains.
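The hybrid retrieval idea described in the abstract can be illustrated with a minimal sketch. The source does not say how the lexical and semantic scores are combined, so everything specific here is an assumption for illustration rather than the authors' implementation: the weighted sum of min-max normalized scores, the `alpha` weight, the function names, and the hand-made two-dimensional vectors that stand in for embeddings from a fine-tuned sentence transformer.

```python
import math
from collections import Counter


def tokenize(text):
    # Naive whitespace tokenizer; a real system would use a proper analyzer.
    return text.lower().split()


def bm25_scores(query, docs, k1=1.5, b=0.75):
    """Okapi BM25 score of every document for the query (non-negative IDF variant)."""
    tokenized = [tokenize(d) for d in docs]
    n_docs = len(tokenized)
    avg_len = sum(len(t) for t in tokenized) / n_docs
    # Document frequency: number of documents containing each term.
    doc_freq = Counter(term for t in tokenized for term in set(t))
    scores = []
    for tokens in tokenized:
        tf = Counter(tokens)
        score = 0.0
        for term in tokenize(query):
            if term not in tf:
                continue
            idf = math.log(1 + (n_docs - doc_freq[term] + 0.5) / (doc_freq[term] + 0.5))
            length_norm = k1 * (1 - b + b * len(tokens) / avg_len)
            score += idf * tf[term] * (k1 + 1) / (tf[term] + length_norm)
        scores.append(score)
    return scores


def cosine(u, v):
    # Cosine similarity between two dense vectors; 0.0 if either is all zeros.
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def min_max(values):
    # Rescale scores to [0, 1] so lexical and semantic scores are comparable.
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]


def hybrid_rank(query, docs, query_vec, doc_vecs, alpha=0.5):
    """Rank documents by alpha * semantic + (1 - alpha) * lexical (assumed fusion rule)."""
    lexical = min_max(bm25_scores(query, docs))
    semantic = min_max([cosine(query_vec, v) for v in doc_vecs])
    fused = [alpha * s + (1 - alpha) * l for s, l in zip(semantic, lexical)]
    order = sorted(range(len(docs)), key=lambda i: fused[i], reverse=True)
    return order, fused


# Toy corpus and hypothetical embeddings (hand-made, not model output).
docs = [
    "firms must report suspicious transactions to the regulator",
    "the regulator may impose penalties for late filing",
    "annual leave policy for staff",
]
query = "report suspicious transactions"
query_vec = [1.0, 0.0]
doc_vecs = [[0.9, 0.1], [0.2, 0.8], [0.0, 1.0]]

order, fused = hybrid_rank(query, docs, query_vec, doc_vecs, alpha=0.5)
for i in order:
    print(f"{fused[i]:.3f}  {docs[i]}")
```

In this sketch, `alpha=0` reduces to pure BM25 ranking and `alpha=1` to pure embedding similarity; intermediate values trade lexical coverage against semantic matching. In practice the toy vectors would be replaced by real sentence embeddings, and the weight would be tuned on held-out queries.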
Related papers
- Lightweight and Direct Document Relevance Optimization for Generative Information Retrieval [49.669503570350166]
Generative information retrieval (GenIR) is a promising neural retrieval paradigm that formulates document retrieval as a document identifier (docid) generation task.
Existing GenIR models suffer from token-level misalignment, where models trained to predict the next token often fail to capture document-level relevance effectively.
We propose direct document relevance optimization (DDRO), which aligns token-level docid generation with document-level relevance estimation through direct optimization via pairwise ranking.
arXiv Detail & Related papers (2025-04-07T15:27:37Z) - Exploring Training and Inference Scaling Laws in Generative Retrieval [50.82554729023865]
We investigate how model size, training data scale, and inference-time compute jointly influence generative retrieval performance.
Our experiments show that n-gram-based methods demonstrate strong alignment with both training and inference scaling laws.
We find that LLaMA models consistently outperform T5 models, suggesting a particular advantage for larger decoder-only models in generative retrieval.
arXiv Detail & Related papers (2025-03-24T17:59:03Z) - EAGER-LLM: Enhancing Large Language Models as Recommenders through Exogenous Behavior-Semantic Integration [60.47645731801866]
Large language models (LLMs) are increasingly leveraged as foundational backbones in advanced recommender systems. LLMs are pre-trained on linguistic semantics but learn collaborative semantics from scratch via the LLM backbone. We propose EAGER-LLM, a decoder-only generative recommendation framework that integrates endogenous and exogenous behavioral and semantic information in a non-intrusive manner.
arXiv Detail & Related papers (2025-02-20T17:01:57Z) - Transparent NLP: Using RAG and LLM Alignment for Privacy Q&A [15.86510147965235]
The General Data Protection Regulation requires precise processing information to be clear and accessible. This paper examines state-of-the-art Retrieval-Augmented Generation (RAG) systems enhanced with alignment techniques to fulfill these obligations.
arXiv Detail & Related papers (2025-02-10T16:42:00Z) - Concept Navigation and Classification via Open Source Large Language Model Processing [0.0]
This paper presents a novel methodological framework for detecting and classifying latent constructs from textual data using Open-Source Large Language Models (LLMs). The proposed hybrid approach combines automated summarization with human-in-the-loop validation to enhance the accuracy and interpretability of construct identification.
arXiv Detail & Related papers (2025-02-07T08:42:34Z) - A Proposed Large Language Model-Based Smart Search for Archive System [0.0]
This study presents a novel framework for smart search in digital archival systems.
By employing a Retrieval-Augmented Generation (RAG) approach, the framework enables the processing of natural language queries.
We present the architecture and implementation of the system and evaluate its performance in four experiments.
arXiv Detail & Related papers (2025-01-13T02:53:07Z) - 1-800-SHARED-TASKS at RegNLP: Lexical Reranking of Semantic Retrieval (LeSeR) for Regulatory Question Answering [0.0]
This paper presents our entry for the COLING 2025 RegNLP RIRAG (Regulatory Information Retrieval and Answer Generation) challenge. We leverage advanced information retrieval and answer generation techniques in regulatory domains. We utilize a novel approach, LeSeR, which achieved competitive results with a recall@10 of 0.8201 and map@10 of 0.6655 for retrievals.
arXiv Detail & Related papers (2024-12-08T17:53:43Z) - A Combined Encoder and Transformer Approach for Coherent and High-Quality Text Generation [5.930799903736776]
This research introduces a novel text generation model that combines BERT's semantic interpretation strengths with GPT-4's generative capabilities.
The model enhances semantic depth and maintains smooth, human-like text flow, overcoming limitations seen in prior models.
arXiv Detail & Related papers (2024-11-19T01:41:56Z) - An Adaptive Framework for Generating Systematic Explanatory Answer in Online Q&A Platforms [62.878616839799776]
We propose SynthRAG, an innovative framework designed to enhance Question Answering (QA) performance.
SynthRAG improves on conventional models by employing adaptive outlines for dynamic content structuring.
An online deployment on the Zhihu platform revealed that SynthRAG's answers achieved notable user engagement.
arXiv Detail & Related papers (2024-10-23T09:14:57Z) - HyPA-RAG: A Hybrid Parameter Adaptive Retrieval-Augmented Generation System for AI Legal and Policy Applications [2.527078412319764]
This paper introduces a Hybrid Parameter-Adaptive RAG (HyPA-RAG) system tailored for AI legal and policy applications.
By dynamically adjusting parameters, HyPA-RAG significantly improves retrieval accuracy and response fidelity.
arXiv Detail & Related papers (2024-08-29T16:11:20Z) - Agent-driven Generative Semantic Communication with Cross-Modality and Prediction [57.335922373309074]
We propose a novel agent-driven generative semantic communication framework based on reinforcement learning.
In this work, we develop an agent-assisted semantic encoder with cross-modality capability, which can track semantic changes and channel conditions to perform adaptive semantic extraction and sampling.
The effectiveness of the designed models has been verified using the UA-DETRAC dataset, demonstrating the performance gains of the overall A-GSC framework.
arXiv Detail & Related papers (2024-04-10T13:24:27Z) - RegaVAE: A Retrieval-Augmented Gaussian Mixture Variational Auto-Encoder
for Language Modeling [79.56442336234221]
We introduce RegaVAE, a retrieval-augmented language model built upon the variational auto-encoder (VAE).
It encodes the text corpus into a latent space, capturing current and future information from both source and target text.
Experimental results on various datasets demonstrate significant improvements in text generation quality and hallucination removal.
arXiv Detail & Related papers (2023-10-16T16:42:01Z) - SDA: Improving Text Generation with Self Data Augmentation [88.24594090105899]
We propose to improve the standard maximum likelihood estimation (MLE) paradigm by incorporating a self-imitation-learning phase for automatic data augmentation.
Unlike most existing sentence-level augmentation strategies, our method is more general and could be easily adapted to any MLE-based training procedure.
arXiv Detail & Related papers (2021-01-02T01:15:57Z)
This list is automatically generated from the titles and abstracts of the papers on this site.