QUILL: Query Intent with Large Language Models using Retrieval
Augmentation and Multi-stage Distillation
- URL: http://arxiv.org/abs/2210.15718v1
- Date: Thu, 27 Oct 2022 18:44:58 GMT
- Title: QUILL: Query Intent with Large Language Models using Retrieval
Augmentation and Multi-stage Distillation
- Authors: Krishna Srinivasan, Karthik Raman, Anupam Samanta, Lingrui Liao, Luca
Bertelli and Mike Bendersky
- Abstract summary: We show that Retrieval Augmentation of queries provides LLMs with valuable additional context enabling improved understanding.
We use a novel two-stage distillation approach that allows us to carry over the gains of retrieval augmentation, without suffering the increased compute typically associated with it.
- Score: 1.516937009186805
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Large Language Models (LLMs) have shown impressive results on a variety of
text understanding tasks. Search queries, though, pose a unique challenge given
their short length and lack of nuance or context. Complicated feature
engineering efforts do not always lead to downstream improvements as their
performance benefits may be offset by increased complexity of knowledge
distillation. Thus, in this paper we make the following contributions: (1) We
demonstrate that Retrieval Augmentation of queries provides LLMs with valuable
additional context enabling improved understanding. While Retrieval
Augmentation typically increases latency of LMs (thus hurting distillation
efficacy), (2) we provide a practical and effective way of distilling Retrieval
Augmentation LLMs. Specifically, we use a novel two-stage distillation approach
that allows us to carry over the gains of retrieval augmentation, without
suffering the increased compute typically associated with it. (3) We
demonstrate the benefits of the proposed approach (QUILL) on a billion-scale,
real-world query understanding system, resulting in large gains. Through
extensive experiments, including on public benchmarks, we show that this work
offers a practical recipe for retrieval-augmented query understanding.
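
Since the abstract describes the pipeline only at a high level, here is a minimal, runnable sketch of one plausible reading of the two-stage idea: stage 1 distills the retrieval-augmented teacher into an intermediate model that still sees retrieved context; stage 2 distills that model into a query-only student, so serving pays no retrieval latency. All names (`retrieve_context`, the toy corpus, the `[CTX]` input format) are illustrative assumptions, not QUILL's actual implementation.

```python
class ToyModel:
    """Stand-in for a student model with train_step and predict."""
    def __init__(self):
        self.memory: dict[str, str] = {}
    def train_step(self, text: str, label: str) -> None:
        self.memory[text] = label          # "training" = memorization here
    def predict(self, text: str) -> str:
        return self.memory.get(text, "unknown")

DOC_INDEX = {                              # toy retrieval corpus
    "cheap flights nyc": "airline tickets booking new york travel",
    "python list sort": "sorting lists in the python language docs",
}

def retrieve_context(query: str) -> str:
    return DOC_INDEX.get(query, "")        # stands in for a real retriever

def teacher_predict(augmented: str) -> str:
    """Stand-in for a large retrieval-augmented LLM's intent prediction."""
    return "travel" if "airline" in augmented else "tech"

queries = list(DOC_INDEX)
intermediate, student = ToyModel(), ToyModel()

# Stage 1: distill the retrieval-augmented teacher into an intermediate
# model that still consumes query + retrieved context.
for q in queries:
    aug = f"{q} [CTX] {retrieve_context(q)}"
    intermediate.train_step(aug, teacher_predict(aug))

# Stage 2: distill the intermediate model into a query-only student, so
# serving needs no retrieval call and avoids the added latency.
for q in queries:
    aug = f"{q} [CTX] {retrieve_context(q)}"
    student.train_step(q, intermediate.predict(aug))

print(student.predict("cheap flights nyc"))   # -> travel
```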
Related papers
- mR$^2$AG: Multimodal Retrieval-Reflection-Augmented Generation for Knowledge-Based VQA [78.45521005703958]
Multimodal Retrieval-Augmented Generation (mRAG) is naturally introduced to provide MLLMs with comprehensive and up-to-date knowledge.
We propose a novel framework called Multimodal Retrieval-Reflection-Augmented Generation (mR$^2$AG), which achieves adaptive retrieval and useful information localization.
mR$^2$AG significantly outperforms state-of-the-art MLLMs on INFOSEEK and Encyclopedic-VQA.
arXiv Detail & Related papers (2024-11-22T16:15:50Z)
- Exploring Knowledge Boundaries in Large Language Models for Retrieval Judgment [56.87031484108484]
Large Language Models (LLMs) are increasingly recognized for their practical applications.
Retrieval-Augmented Generation (RAG) tackles the challenge of knowledge that lies beyond an LLM's internal boundaries and has shown a significant impact on LLMs.
By minimizing retrieval requests that yield neutral or harmful results, we can effectively reduce both time and computational costs.
arXiv Detail & Related papers (2024-11-09T15:12:28Z)
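
Below is a minimal, runnable sketch of the retrieval-judgment idea in this entry: retrieve only when the model appears unsure of its own answer. The `sample_answers` stub and the self-consistency threshold are illustrative assumptions, not the paper's exact criterion.

```python
from collections import Counter

def sample_answers(question: str, n: int = 5) -> list[str]:
    """Stand-in for sampling an LLM n times on the same question."""
    canned = {"capital of france": ["Paris"] * 5}
    return canned.get(question, ["A", "B", "A", "C", "B"])  # low agreement

def needs_retrieval(question: str, threshold: float = 0.8) -> bool:
    answers = sample_answers(question)
    _, top_count = Counter(answers).most_common(1)[0]
    # Low self-consistency suggests the question falls outside the model's
    # knowledge boundary, so a retrieval request is likely worth its cost.
    return top_count / len(answers) < threshold

print(needs_retrieval("capital of france"))    # False: skip retrieval
print(needs_retrieval("an obscure question"))  # True: retrieve
```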
- SEER: Self-Aligned Evidence Extraction for Retrieval-Augmented Generation [21.823931225182115]
We propose a model-based evidence extraction learning framework, SEER, to optimize a vanilla model as an evidence extractor.
Our method substantially improves final RAG performance, enhances the faithfulness, helpfulness, and conciseness of the extracted evidence, and reduces evidence length by a factor of 9.25.
arXiv Detail & Related papers (2024-10-15T06:26:24Z)
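
As a rough illustration of evidence extraction, the sketch below keeps only the retrieved sentences most relevant to the question before generation; the term-overlap scorer is a toy stand-in for SEER's learned, self-aligned extractor.

```python
import string

def tokens(text: str) -> set[str]:
    """Lowercase, punctuation-stripped word set."""
    return {w.strip(string.punctuation) for w in text.lower().split()}

def extract_evidence(question: str, passages: list[str],
                     top_k: int = 1) -> list[str]:
    # Rank passages by term overlap with the question and keep the top k.
    q_terms = tokens(question)
    return sorted(passages, key=lambda p: len(q_terms & tokens(p)),
                  reverse=True)[:top_k]

passages = [
    "The Nile flows through eleven countries in Africa.",
    "Python was created by Guido van Rossum in 1991.",
]
# Shorter, focused evidence means a cheaper and more faithful generation step.
print(extract_evidence("Who created Python?", passages))
```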
- Unleashing the Power of Large Language Models in Zero-shot Relation Extraction via Self-Prompting [21.04933334040135]
We introduce the Self-Prompting framework, a novel method designed to fully harness the relation extraction (RE) knowledge embedded within Large Language Models.
Our framework employs a three-stage diversity approach to prompt LLMs, generating multiple synthetic samples that encapsulate specific relations from scratch.
Experimental evaluations on benchmark datasets show our approach outperforms existing LLM-based zero-shot RE methods.
arXiv Detail & Related papers (2024-10-02T01:12:54Z)
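
A hedged sketch of the self-prompting pattern: the LLM first synthesizes its own labeled RE examples, which are then reused as in-context demonstrations. The `llm` callable and the prompt wording are hypothetical stand-ins, and the paper's three-stage diversification is compressed into a single generation step here.

```python
def llm(prompt: str) -> str:
    """Stand-in for an LLM API call; returns a canned completion."""
    return "Marie Curie was born in Warsaw. -> (Marie Curie, born_in, Warsaw)"

def synthesize_demos(relation: str, n: int = 3) -> list[str]:
    # Stages 1-2 (compressed): generate diverse synthetic samples per relation.
    return [llm(f"Write example #{i} of a sentence expressing '{relation}', "
                f"then its (subject, {relation}, object) triple.")
            for i in range(n)]

def zero_shot_re(sentence: str, relation: str) -> str:
    # Stage 3: prepend the synthetic samples as extraction demonstrations.
    demos = "\n".join(synthesize_demos(relation))
    return llm(f"{demos}\nNow extract '{relation}' from: {sentence}")

print(zero_shot_re("Ada Lovelace was born in London.", "born_in"))
```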
- Speculative RAG: Enhancing Retrieval Augmented Generation through Drafting [68.90949377014742]
Speculative RAG is a framework that leverages a larger generalist LM to efficiently verify multiple RAG drafts produced in parallel by a smaller, distilled specialist LM.
Our method accelerates RAG by delegating drafting to the smaller specialist LM, with the larger generalist LM performing a single verification pass over the drafts.
It notably enhances accuracy by up to 12.97% while reducing latency by 51% compared to conventional RAG systems on PubHealth.
arXiv Detail & Related papers (2024-07-11T06:50:19Z)
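
The draft-then-verify control flow can be sketched as below; the drafter and verifier are toy stand-ins (the real system uses a distilled specialist LM for drafting and a single verification pass by the generalist LM), and the length-based score is purely illustrative.

```python
def draft_answer(question: str, doc_subset: list[str]) -> str:
    """Small specialist LM: cheaply draft an answer from one evidence subset."""
    return f"draft grounded in: {', '.join(doc_subset)}"

def verifier_score(question: str, draft: str) -> float:
    """Large generalist LM: score a draft in one pass (toy heuristic here)."""
    return float(len(draft))

def speculative_rag(question: str, docs: list[str], n_drafts: int = 2) -> str:
    subsets = [docs[i::n_drafts] for i in range(n_drafts)]  # partition evidence
    drafts = [draft_answer(question, s) for s in subsets]   # parallel, cheap
    return max(drafts, key=lambda d: verifier_score(question, d))

print(speculative_rag("q?", ["doc A", "doc B", "doc C", "doc D"]))
```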
- MrRank: Improving Question Answering Retrieval System through Multi-Result Ranking Model [4.173772253427094]
Large Language Models (LLMs) often struggle with hallucinations and outdated information.
To address this, Information Retrieval (IR) systems can be employed to augment LLMs with up-to-date knowledge.
We propose an approach that leverages learning-to-rank techniques to combine heterogeneous IR systems.
arXiv Detail & Related papers (2024-06-09T11:00:01Z)
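
MrRank learns a ranking model over heterogeneous retrievers; as a simple non-learned stand-in, the sketch below fuses two result lists with reciprocal rank fusion (RRF). The retriever outputs are toy data.

```python
def rrf(rankings: list[list[str]], k: int = 60) -> list[str]:
    """Reciprocal rank fusion: each document scores sum_i 1 / (k + rank_i)."""
    scores: dict[str, float] = {}
    for ranking in rankings:
        for rank, doc in enumerate(ranking, start=1):
            scores[doc] = scores.get(doc, 0.0) + 1.0 / (k + rank)
    return sorted(scores, key=scores.get, reverse=True)

bm25_results = ["d1", "d2", "d3"]      # lexical retriever output
dense_results = ["d3", "d1", "d4"]     # neural retriever output
print(rrf([bm25_results, dense_results]))  # fused list augments the LLM
```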
- SuRe: Summarizing Retrievals using Answer Candidates for Open-domain QA of LLMs [85.54906813106683]
We propose a simple yet effective framework to enhance open-domain question answering (ODQA) with large language models (LLMs).
SuRe helps LLMs predict more accurate answers for a given question, answers that are well supported by summaries of the retrieved passages.
Experimental results on diverse ODQA benchmarks demonstrate the superiority of SuRe, with improvements of up to 4.6% in exact match (EM) and 4.0% in F1 score over standard prompting approaches.
arXiv Detail & Related papers (2024-04-17T01:15:54Z)
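
One way to read the SuRe flow is sketched below: propose answer candidates, summarize the retrievals conditioned on each candidate, and keep the candidate its summary supports best. The `llm` stub, the fixed candidate list, and the substring support check are toy assumptions.

```python
def llm(prompt: str) -> str:
    """Stand-in for an LLM call; returns a canned summary."""
    return "The passages state that France's capital is Paris."

def sure_answer(question: str, passages: list[str]) -> str:
    candidates = ["Paris", "Lyon"]   # in practice, sampled from the LLM
    best, best_score = candidates[0], -1.0
    for cand in candidates:
        # Candidate-conditioned summary of the retrieved evidence.
        summary = llm(f"Summarize evidence that '{cand}' answers "
                      f"'{question}', given: {passages}")
        score = float(cand in summary)   # toy support check
        if score > best_score:
            best, best_score = cand, score
    return best

print(sure_answer("What is the capital of France?",
                  ["France's capital is Paris."]))
```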
- Evidence-Focused Fact Summarization for Knowledge-Augmented Zero-Shot Question Answering [14.389264346634507]
We propose EFSum, an Evidence-focused Fact Summarization framework for enhanced Question Answering (QA) performance.
Our experiments show that EFSum improves LLMs' zero-shot QA performance.
arXiv Detail & Related papers (2024-03-05T13:43:58Z)
- Efficient Exploration for LLMs [27.59380499111532]
We present evidence of substantial benefit from efficient exploration in gathering human feedback to improve large language models.
In our experiments, an agent sequentially generates queries while fitting a reward model to the feedback received.
Our results demonstrate that efficient exploration enables high levels of performance with far fewer queries.
arXiv Detail & Related papers (2024-02-01T07:32:24Z)
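
The query-selection loop described above can be illustrated with a toy exploration scheme; the sketch below uses Thompson sampling over a small set of candidate queries, with Beta posteriors standing in for the fitted reward model. This is an illustrative exploration strategy, not the paper's exact algorithm.

```python
import random

random.seed(0)
queries = ["q_a", "q_b", "q_c"]
true_rate = {"q_a": 0.2, "q_b": 0.8, "q_c": 0.5}   # hidden feedback rates
posterior = {q: [1, 1] for q in queries}           # Beta(1, 1) priors

for step in range(50):
    # Thompson sampling: draw a plausible reward rate for each query, then
    # send the query whose draw is highest (explore/exploit trade-off).
    sampled = {q: random.betavariate(*posterior[q]) for q in queries}
    chosen = max(sampled, key=sampled.get)
    reward = random.random() < true_rate[chosen]   # simulated human feedback
    posterior[chosen][0 if reward else 1] += 1     # update the reward model

print(posterior)  # counts should concentrate on q_b with few total queries
```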
- Mitigating Large Language Model Hallucinations via Autonomous Knowledge Graph-based Retrofitting [51.7049140329611]
This paper proposes Knowledge Graph-based Retrofitting (KGR) to mitigate factual hallucination during the reasoning process.
Experiments show that KGR can significantly improve the performance of LLMs on factual QA benchmarks.
arXiv Detail & Related papers (2023-11-22T11:08:38Z)
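
A minimal sketch of the retrofitting idea: extract factual triples from a draft answer, check them against a knowledge graph, and revise any conflicts. The dictionary KG and the claim extractor are toy stand-ins for KGR's autonomous, LLM-driven pipeline.

```python
KG = {("Eiffel Tower", "located_in"): "Paris"}   # toy knowledge graph

def extract_claims(answer: str) -> list[tuple[str, str, str]]:
    """Stand-in for LLM-based claim extraction from a draft answer."""
    return [("Eiffel Tower", "located_in", "Lyon")]   # hallucinated object

def retrofit(answer: str) -> str:
    for subj, rel, obj in extract_claims(answer):
        truth = KG.get((subj, rel))
        if truth is not None and truth != obj:
            answer = answer.replace(obj, truth)   # repair against the KG
    return answer

print(retrofit("The Eiffel Tower is located in Lyon."))
# -> The Eiffel Tower is located in Paris.
```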
- Re-Reading Improves Reasoning in Large Language Models [87.46256176508376]
We introduce a simple, yet general and effective prompting method, Re2, to enhance the reasoning capabilities of off-the-shelf Large Language Models (LLMs).
Unlike most thought-eliciting prompting methods, such as Chain-of-Thought (CoT), Re2 shifts the focus to the input by processing questions twice, thereby enhancing the understanding process.
We evaluate Re2 on extensive reasoning benchmarks across 14 datasets, spanning 112 experiments, to validate its effectiveness and generality.
arXiv Detail & Related papers (2023-09-12T14:36:23Z)
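
Since Re2 is purely a prompting pattern, it is easy to sketch: the question is presented twice before the model answers. The `call_llm` stub is a hypothetical stand-in for any completion API; only the prompt construction matters here.

```python
def build_re2_prompt(question: str) -> str:
    # Re2 layout: state the question, then read it again before answering.
    return (f"Q: {question}\n"
            f"Read the question again: {question}\n"
            f"A:")

def call_llm(prompt: str) -> str:
    return "(model answer)"   # stand-in for a real LLM call

question = "If a train travels 60 km in 1.5 hours, what is its average speed?"
print(call_llm(build_re2_prompt(question)))
```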
This list is automatically generated from the titles and abstracts of the papers on this site.