MFBE: Leveraging Multi-Field Information of FAQs for Efficient Dense
Retrieval
- URL: http://arxiv.org/abs/2302.11953v2
- Date: Tue, 21 Mar 2023 18:38:10 GMT
- Title: MFBE: Leveraging Multi-Field Information of FAQs for Efficient Dense
Retrieval
- Authors: Debopriyo Banerjee, Mausam Jain and Ashish Kulkarni
- Abstract summary: We propose a bi-encoder-based query-FAQ matching model that leverages multiple combinations of FAQ fields.
Our model achieves around 27% and 20% better top-1 accuracy for the FAQ retrieval task on internal and open datasets, respectively, over the best-performing baseline.
- Score: 1.7403133838762446
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: In the domain of question-answering in NLP, the retrieval of Frequently Asked
Questions (FAQ) is an important, well-researched sub-area that has been studied
for many languages. Here, in response to a user query, a retrieval
system typically returns the relevant FAQs from a knowledge-base. The efficacy
of such a system depends on its ability to establish semantic match between the
query and the FAQs in real-time. The task becomes challenging due to the
inherent lexical gap between queries and FAQs, lack of sufficient context in
FAQ titles, scarcity of labeled data and high retrieval latency. In this work,
we propose a bi-encoder-based query-FAQ matching model that leverages multiple
combinations of FAQ fields (like question, answer, and category) both during
model training and inference. Our proposed Multi-Field Bi-Encoder (MFBE) model
benefits from the additional context resulting from multiple FAQ fields and
performs well even with minimal labeled data. We empirically support this claim
through experiments on proprietary as well as open-source public datasets in
both unsupervised and supervised settings. Our model achieves around 27% and
20% better top-1 accuracy for the FAQ retrieval task on internal and open
datasets, respectively, over the best-performing baseline.
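As a concrete illustration of the retrieval setup the abstract describes, here is a minimal sketch in which each FAQ is indexed by the concatenation of its fields (question, answer, category) and matched against the query by cosine similarity. The hashed bag-of-words `embed` function is a deliberately crude stand-in for the trained bi-encoder, and all field names and example data are hypothetical, not taken from the paper.

```python
import zlib
import numpy as np

DIM = 64  # embedding dimensionality of the toy encoder


def embed(text: str) -> np.ndarray:
    """Hashed bag-of-words embedding -- a stand-in for a trained bi-encoder."""
    vec = np.zeros(DIM)
    for tok in text.lower().split():
        tok = tok.strip(".,?!")
        vec[zlib.crc32(tok.encode()) % DIM] += 1.0
    norm = np.linalg.norm(vec)
    return vec / norm if norm else vec


def index_faqs(faqs):
    """Index each FAQ by the concatenation of its fields, mirroring the idea
    of combining question, answer, and category for extra context."""
    texts = [" ".join((f["question"], f["answer"], f["category"])) for f in faqs]
    return np.stack([embed(t) for t in texts])


def retrieve(query, faqs, matrix):
    """Dot product equals cosine similarity because all vectors are unit-norm."""
    scores = matrix @ embed(query)
    best = int(np.argmax(scores))
    return faqs[best], float(scores[best])


faqs = [
    {"question": "How do I reset my password?",
     "answer": "Use the forgot-password link on the login page.",
     "category": "account"},
    {"question": "What payment methods are accepted?",
     "answer": "We accept cards and bank transfer.",
     "category": "billing"},
]
matrix = index_faqs(faqs)
hit, score = retrieve("how to reset my password", faqs, matrix)
```

Because indexing and querying both reduce to a single matrix-vector product, the same structure supports the low-latency, real-time matching the abstract emphasizes.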
Related papers
- Benchmarking Multimodal Retrieval Augmented Generation with Dynamic VQA Dataset and Self-adaptive Planning Agent [102.31558123570437]
Multimodal Retrieval Augmented Generation (mRAG) plays an important role in mitigating the "hallucination" issue inherent in multimodal large language models (MLLMs)
We propose the first self-adaptive planning agent for multimodal retrieval, OmniSearch.
arXiv Detail & Related papers (2024-11-05T09:27:21Z)
- Beyond-RAG: Question Identification and Answer Generation in Real-Time Conversations [0.0]
In customer contact centers, human agents often struggle with long average handling times (AHT)
We propose a decision support system that can look beyond RAG by first identifying customer questions in real time.
If the query matches an FAQ, the system retrieves the answer directly from the FAQ database; otherwise, it generates answers via RAG.
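The routing logic this entry describes can be sketched as follows. `SequenceMatcher` is a crude lexical stand-in for the semantic matcher a production system would use, the FAQ store and threshold are hypothetical, and the RAG pipeline is stubbed out.

```python
from difflib import SequenceMatcher

# Hypothetical FAQ database: normalized question -> canned answer.
FAQS = {
    "how do i reset my password": "Use the forgot-password link on the login page.",
    "what are your opening hours": "We are open 9am-5pm, Monday to Friday.",
}


def best_faq(query: str):
    """Return the closest stored question and its similarity score."""
    scored = [(SequenceMatcher(None, query.lower(), q).ratio(), q) for q in FAQS]
    score, q = max(scored)
    return q, score


def answer(query: str, threshold: float = 0.8,
           rag=lambda q: f"[RAG-generated answer for: {q}]"):
    """Serve directly from the FAQ database on a confident match,
    otherwise fall back to the (stubbed) RAG pipeline."""
    q, score = best_faq(query)
    if score >= threshold:
        return "faq", FAQS[q]
    return "rag", rag(query)
```

The threshold trades precision for coverage: a higher value sends more queries to the slower RAG path but reduces the chance of serving a wrong canned answer.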
arXiv Detail & Related papers (2024-10-14T04:06:22Z)
- Selecting Query-bag as Pseudo Relevance Feedback for Information-seeking Conversations [76.70349332096693]
Information-seeking dialogue systems are widely used in e-commerce systems.
We propose a Query-bag based Pseudo Relevance Feedback framework (QB-PRF)
It constructs a query-bag with related queries to serve as pseudo signals to guide information-seeking conversations.
arXiv Detail & Related papers (2024-03-22T08:10:32Z) - Building Interpretable and Reliable Open Information Retriever for New
Domains Overnight [67.03842581848299]
Information retrieval is a critical component for many downstream tasks such as open-domain question answering (QA)
We propose an information retrieval pipeline that uses an entity/event linking model and a query decomposition model to focus more accurately on different information units of the query.
We show that, while being more interpretable and reliable, our proposed pipeline significantly improves passage coverages and denotation accuracies across five IR and QA benchmarks.
arXiv Detail & Related papers (2023-08-09T07:47:17Z)
- Federated Prompting and Chain-of-Thought Reasoning for Improving LLMs Answering [13.735277588793997]
We investigate how to enhance answer precision in frequently asked questions posed by distributed users using cloud-based Large Language Models (LLMs)
Our study focuses on typical situations where users ask similar queries that involve identical mathematical reasoning steps and problem-solving procedures.
We propose to improve the distributed synonymous questions using Self-Consistency (SC) and Chain-of-Thought (CoT) techniques.
arXiv Detail & Related papers (2023-04-27T01:48:03Z)
- Multi-Tenant Optimization For Few-Shot Task-Oriented FAQ Retrieval [0.0]
Business-specific Frequently Asked Questions (FAQ) retrieval in task-oriented dialog systems poses unique challenges.
We evaluate performance for such business FAQ retrieval using query-Question (q-Q) similarity and few-shot intent detection techniques.
We propose a novel approach to scale multi-tenant FAQ applications in real-world context by contrastive fine-tuning of the last layer in sentence Bi-Encoders along with tenant-specific weight switching.
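The tenant-specific weight switching mentioned above might look roughly like the following sketch, in which a shared frozen encoder body is combined with one small per-tenant output head, so adding a tenant costs only a single extra weight matrix. The dimensions, weights, and tenant names are illustrative, not the paper's actual configuration.

```python
import numpy as np

rng = np.random.default_rng(0)
ENC_DIM, OUT_DIM = 8, 4

# Shared frozen encoder body (stand-in for a pretrained sentence bi-encoder).
shared = rng.standard_normal((ENC_DIM, ENC_DIM))

# Only the last projection layer is fine-tuned, one copy per tenant.
tenant_heads = {
    "tenant_a": rng.standard_normal((ENC_DIM, OUT_DIM)),
    "tenant_b": rng.standard_normal((ENC_DIM, OUT_DIM)),
}


def encode(x: np.ndarray, tenant: str) -> np.ndarray:
    """Run the shared body, then switch in the tenant-specific head."""
    h = np.tanh(x @ shared)
    return h @ tenant_heads[tenant]


x = rng.standard_normal(ENC_DIM)
a = encode(x, "tenant_a")
b = encode(x, "tenant_b")
```

The same input yields different embeddings per tenant while the expensive shared body is computed (and stored) only once, which is what makes the scheme attractive for multi-tenant deployment.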
arXiv Detail & Related papers (2023-01-25T10:55:45Z)
- UniKGQA: Unified Retrieval and Reasoning for Solving Multi-hop Question Answering Over Knowledge Graph [89.98762327725112]
Multi-hop Question Answering over Knowledge Graph (KGQA) aims to find the answer entities that are multiple hops away from the topic entities mentioned in a natural language question.
We propose UniKGQA, a novel approach for multi-hop KGQA task, by unifying retrieval and reasoning in both model architecture and parameter learning.
arXiv Detail & Related papers (2022-12-02T04:08:09Z)
- SYGMA: System for Generalizable Modular Question Answering Over Knowledge Bases [57.89642289610301]
We present SYGMA, a modular approach facilitating generalizability across multiple knowledge bases and multiple reasoning types.
We demonstrate the effectiveness of our system by evaluating on datasets belonging to two distinct knowledge bases, DBpedia and Wikidata.
arXiv Detail & Related papers (2021-09-28T01:57:56Z)
- Effective FAQ Retrieval and Question Matching With Unsupervised Knowledge Injection [10.82418428209551]
We propose a contextual language model for retrieving appropriate answers to frequently asked questions.
We also explore capitalizing on domain-specific, topically-relevant relations between words in an unsupervised manner.
We evaluate variants of our approach on a publicly-available Chinese FAQ dataset, and further apply and contextualize it to a large-scale question-matching task.
arXiv Detail & Related papers (2020-10-27T05:03:34Z)
- Open Question Answering over Tables and Text [55.8412170633547]
In open question answering (QA), the answer to a question is produced by retrieving and then analyzing documents that might contain answers to the question.
Most open QA systems have considered only retrieving information from unstructured text.
We present a new large-scale dataset Open Table-and-Text Question Answering (OTT-QA) to evaluate performance on this task.
arXiv Detail & Related papers (2020-10-20T16:48:14Z)
- Guided Transformer: Leveraging Multiple External Sources for Representation Learning in Conversational Search [36.64582291809485]
Asking clarifying questions in response to ambiguous or faceted queries has been recognized as a useful technique for various information retrieval systems.
In this paper, we enrich the representations learned by Transformer networks using a novel attention mechanism from external information sources.
Our experiments use a public dataset for search clarification and demonstrate significant improvements compared to competitive baselines.
arXiv Detail & Related papers (2020-06-13T03:24:53Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information presented and is not responsible for any consequences of its use.