Domain-specific Question Answering with Hybrid Search
- URL: http://arxiv.org/abs/2412.03736v2
- Date: Sat, 21 Dec 2024 20:28:23 GMT
- Title: Domain-specific Question Answering with Hybrid Search
- Authors: Dewang Sultania, Zhaoyu Lu, Twisha Naik, Franck Dernoncourt, David Seunghyun Yoon, Sanat Sharma, Trung Bui, Ashok Gupta, Tushar Vatsa, Suhas Suresha, Ishita Verma, Vibha Belavadi, Cheng Chen, Michael Friedrich,
- Abstract summary: We show that a hybrid approach combining a fine-tuned dense retriever with keyword based sparse search methods significantly enhances performance.
Experimental results indicate that this hybrid method outperforms our single-retriever system.
- Score: 39.85176264551715
- License:
- Abstract: Domain specific question answering is an evolving field that requires specialized solutions to address unique challenges. In this paper, we show that a hybrid approach combining a fine-tuned dense retriever with keyword based sparse search methods significantly enhances performance. Our system leverages a linear combination of relevance signals, including cosine similarity from dense retrieval, BM25 scores, and URL host matching, each with tunable boost parameters. Experimental results indicate that this hybrid method outperforms our single-retriever system, achieving improved accuracy while maintaining robust contextual grounding. These findings suggest that integrating multiple retrieval methodologies with weighted scoring effectively addresses the complexities of domain specific question answering in enterprise settings.
Related papers
- A diversity-enhanced genetic algorithm for efficient exploration of parameter spaces [0.0]
We present a Python package together with a practical guide for the implementation of a lightweight diversity-enhanced genetic algorithm (GA) approach for the exploration of multi-dimensional parameter spaces.
arXiv Detail & Related papers (2024-12-22T17:32:38Z) - MBA-RAG: a Bandit Approach for Adaptive Retrieval-Augmented Generation through Question Complexity [30.346398341996476]
We propose a reinforcement learning-based framework that dynamically selects the most suitable retrieval strategy based on query complexity.
Our method achieves new state of the art results on multiple single-hop and multi-hop datasets while reducing retrieval costs.
arXiv Detail & Related papers (2024-12-02T14:55:02Z) - Optimizing Retrieval-Augmented Generation with Elasticsearch for Enhanced Question-Answering Systems [2.4299671488193497]
This study aims to improve the accuracy and quality of large-scale language models (LLMs) in answering questions by integrating into the Retrieval Augmented Generation (RAG) framework.
The experiment uses the Stanford Question Answering dataset (SQuAD) version 2.0 as the test dataset.
arXiv Detail & Related papers (2024-10-18T04:17:49Z) - Retrieval with Learned Similarities [2.729516456192901]
State-of-the-art retrieval algorithms have migrated to learned similarities.
We show that Mixture-of-Logits (MoL) can be realized empirically to achieve superior performance on diverse retrieval scenarios.
arXiv Detail & Related papers (2024-07-22T08:19:34Z) - Training Greedy Policy for Proposal Batch Selection in Expensive Multi-Objective Combinatorial Optimization [52.80408805368928]
We introduce a novel greedy-style subset selection algorithm for batch acquisition.
Our experiments on the red fluorescent proteins show that our proposed method achieves the baseline performance in 1.69x fewer queries.
arXiv Detail & Related papers (2024-06-21T05:57:08Z) - COS-Mix: Cosine Similarity and Distance Fusion for Improved Information Retrieval [0.0]
This study proposes a novel hybrid retrieval strategy for Retrieval-Augmented Generation (RAG)
Traditional cosine similarity measure is widely used to capture the similarity between vectors in high-dimensional spaces.
We incorporate cosine distance measures to provide a complementary perspective by quantifying the dissimilarity between vectors.
arXiv Detail & Related papers (2024-06-02T06:48:43Z) - Hybrid and Collaborative Passage Reranking [144.83902343298112]
We propose a Hybrid and Collaborative Passage Reranking (HybRank) method.
It incorporates the lexical and semantic properties of sparse and dense retrievers for reranking.
Built on off-the-shelf retriever features, HybRank is a plug-in reranker capable of enhancing arbitrary passage lists.
arXiv Detail & Related papers (2023-05-16T09:38:52Z) - ECO-TR: Efficient Correspondences Finding Via Coarse-to-Fine Refinement [80.94378602238432]
We propose an efficient structure named Correspondence Efficient Transformer (ECO-TR) by finding correspondences in a coarse-to-fine manner.
To achieve this, multiple transformer blocks are stage-wisely connected to gradually refine the predicted coordinates.
Experiments on various sparse and dense matching tasks demonstrate the superiority of our method in both efficiency and effectiveness against existing state-of-the-arts.
arXiv Detail & Related papers (2022-09-25T13:05:33Z) - Multidimensional Assignment Problem for multipartite entity resolution [69.48568967931608]
Multipartite entity resolution aims at integrating records from multiple datasets into one entity.
We apply two procedures, a Greedy algorithm and a large scale neighborhood search, to solve the assignment problem.
We find evidence that design-based multi-start can be more efficient as the size of databases grow large.
arXiv Detail & Related papers (2021-12-06T20:34:55Z) - Answering Complex Open-Domain Questions with Multi-Hop Dense Retrieval [117.07047313964773]
We propose a simple and efficient multi-hop dense retrieval approach for answering complex open-domain questions.
Our method does not require access to any corpus-specific information, such as inter-document hyperlinks or human-annotated entity markers.
Our system also yields a much better efficiency-accuracy trade-off, matching the best published accuracy on HotpotQA while being 10 times faster at inference time.
arXiv Detail & Related papers (2020-09-27T06:12:29Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.