It's All Relative! -- A Synthetic Query Generation Approach for
Improving Zero-Shot Relevance Prediction
- URL: http://arxiv.org/abs/2311.07930v1
- Date: Tue, 14 Nov 2023 06:16:49 GMT
- Authors: Aditi Chaudhary, Karthik Raman, Michael Bendersky
- Abstract summary: Large language models (LLMs) have shown promise in their ability to generate synthetic query-document pairs by prompting with as few as 8 demonstrations.
We propose to reduce this burden by generating queries simultaneously for different labels.
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Recent developments in large language models (LLMs) have shown promise in
their ability to generate synthetic query-document pairs by prompting with as
few as 8 demonstrations. This has enabled building better IR models, especially
for tasks with no training data readily available. Typically, such synthetic
query generation (QGen) approaches condition on an input context (e.g. a text
document) and generate a query relevant to that context, or condition the QGen
model additionally on the relevance label (e.g. relevant vs irrelevant) to
generate queries across relevance buckets. However, we find that such QGen
approaches are sub-optimal as they require the model to reason about the
desired label and the input from a handful of examples. In this work, we
propose to reduce this burden on LLMs by generating queries simultaneously for
different labels. We hypothesize that instead of asking the model to generate,
say, an irrelevant query given an input context, asking the model to generate
an irrelevant query relative to a relevant query is a much simpler task setup
for the model to reason about. Extensive experimentation across seven IR
datasets shows that synthetic queries generated in this fashion translate to
better downstream performance, suggesting that the generated queries are
indeed of higher quality.
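The abstract's core idea, generating an irrelevant query relative to a relevant one rather than from the document alone, can be illustrated with a small prompt-construction sketch. This is a hypothetical illustration only: the function name, demonstration format, and field names are assumptions, not taken from the paper, and a real pipeline would pass the resulting prompt to an LLM.

```python
# Hypothetical sketch of "relative" few-shot query generation: each
# demonstration pairs a document with a relevant query AND an irrelevant
# query framed relative to that relevant query, so the model produces
# queries for both labels in one generation.

def build_relative_qgen_prompt(document: str, demonstrations: list[dict]) -> str:
    """Assemble a few-shot prompt asking for queries at multiple
    relevance levels for the same document, jointly."""
    lines = []
    for demo in demonstrations:
        lines.append(f"Document: {demo['document']}")
        lines.append(f"Relevant query: {demo['relevant']}")
        # The irrelevant query is defined relative to the relevant one,
        # which the paper hypothesizes is an easier task for the model.
        lines.append(f"Irrelevant query (relative to the above): {demo['irrelevant']}")
        lines.append("")
    lines.append(f"Document: {document}")
    lines.append("Relevant query:")
    return "\n".join(lines)

demos = [{
    "document": "Guide to growing tomatoes in containers.",
    "relevant": "how to grow tomatoes in pots",
    "irrelevant": "how to grow roses in pots",
}]
prompt = build_relative_qgen_prompt("Tips for training a puppy to sit.", demos)
```

The LLM's completion would then contain both a relevant and an irrelevant query for the new document, yielding labeled synthetic pairs for training a relevance model.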
Related papers
- Effective Instruction Parsing Plugin for Complex Logical Query Answering on Knowledge Graphs [51.33342412699939]
Knowledge Graph Query Embedding (KGQE) aims to embed First-Order Logic (FOL) queries in a low-dimensional KG space for complex reasoning over incomplete KGs.
Recent studies integrate various external information (such as entity types and relation context) to better capture the logical semantics of FOL queries.
We propose an effective Query Instruction Parsing plugin (QIPP) that captures latent query patterns from code-like query instructions.
arXiv Detail & Related papers (2024-10-27T03:18:52Z) - Is Complex Query Answering Really Complex? [28.8459899849641]
We show that the current benchmarks for CQA are not really complex, and the way they are built distorts our perception of progress in this field.
We propose a set of more challenging benchmarks, composed of queries that require models to reason over multiple hops and better reflect the construction of real-world KGs.
arXiv Detail & Related papers (2024-10-16T13:19:03Z) - Less is More: Making Smaller Language Models Competent Subgraph Retrievers for Multi-hop KGQA [51.3033125256716]
We model the subgraph retrieval task as a conditional generation task handled by small language models.
Our base generative subgraph retrieval model, consisting of only 220M parameters, achieves competitive retrieval performance compared to state-of-the-art models.
Our largest 3B model, when plugged with an LLM reader, sets new SOTA end-to-end performance on both the WebQSP and CWQ benchmarks.
arXiv Detail & Related papers (2024-10-08T15:22:36Z) - RQ-RAG: Learning to Refine Queries for Retrieval Augmented Generation [42.82192656794179]
Large Language Models (LLMs) exhibit remarkable capabilities but are prone to generating inaccurate or hallucinatory responses.
This limitation stems from their reliance on vast pretraining datasets, making them susceptible to errors in unseen scenarios.
Retrieval-Augmented Generation (RAG) addresses this by incorporating external, relevant documents into the response generation process.
arXiv Detail & Related papers (2024-03-31T08:58:54Z) - An Empirical Comparison of LM-based Question and Answer Generation
Methods [79.31199020420827]
Question and answer generation (QAG) consists of generating a set of question-answer pairs given a context.
In this paper, we establish baselines with three different QAG methodologies that leverage sequence-to-sequence language model (LM) fine-tuning.
Experiments show that an end-to-end QAG model, which is computationally light at both training and inference times, is generally robust and outperforms other more convoluted approaches.
arXiv Detail & Related papers (2023-05-26T14:59:53Z) - Exploring the Viability of Synthetic Query Generation for Relevance
Prediction [18.77909480819682]
We conduct a study into how QGen approaches can be leveraged for nuanced relevance prediction.
We identify new shortcomings of existing QGen approaches -- including their inability to distinguish between different grades of relevance.
We introduce label-grained QGen models, which incorporate knowledge about the different grades of relevance.
arXiv Detail & Related papers (2023-05-19T18:03:36Z) - A Lightweight Constrained Generation Alternative for Query-focused
Summarization [8.264410236351111]
Query-focused summarization (QFS) aims to provide a summary of a document that satisfies the information need of a given query.
We propose leveraging the recently developed constrained generation model NeuroLogic Decoding (NLD) as an alternative to current QFS regimes.
We demonstrate the efficacy of this approach on two public QFS collections achieving near parity with the state-of-the-art model with substantially reduced complexity.
arXiv Detail & Related papers (2023-04-23T18:43:48Z) - Query Expansion Using Contextual Clue Sampling with Language Models [69.51976926838232]
We propose a combination of an effective filtering strategy and fusion of the retrieved documents based on the generation probability of each context.
Our lexical matching based approach achieves a similar top-5/top-20 retrieval accuracy and higher top-100 accuracy compared with the well-established dense retrieval model DPR.
For end-to-end QA, the reader model also benefits from our method and achieves the highest Exact-Match score against several competitive baselines.
arXiv Detail & Related papers (2022-10-13T15:18:04Z) - Query Embedding on Hyper-relational Knowledge Graphs [0.4779196219827507]
Multi-hop logical reasoning is an established problem in the field of representation learning on knowledge graphs.
We extend the multi-hop reasoning problem to hyper-relational KGs, allowing us to tackle this new type of complex query.
arXiv Detail & Related papers (2021-06-15T14:08:50Z) - Query Understanding via Intent Description Generation [75.64800976586771]
We propose a novel Query-to-Intent-Description (Q2ID) task for query understanding.
Unlike existing ranking tasks which leverage the query and its description to compute the relevance of documents, Q2ID is a reverse task which aims to generate a natural language intent description.
We demonstrate the effectiveness of our model by comparing with several state-of-the-art generation models on the Q2ID task.
arXiv Detail & Related papers (2020-08-25T08:56:40Z) - Leveraging Passage Retrieval with Generative Models for Open Domain
Question Answering [61.394478670089065]
Generative models for open domain question answering have proven to be competitive, without resorting to external knowledge.
We investigate how much these models can benefit from retrieving text passages, potentially containing evidence.
We observe that the performance of this method significantly improves when increasing the number of retrieved passages.
arXiv Detail & Related papers (2020-07-02T17:44:57Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this information and is not responsible for any consequences arising from its use.