Learning to Rank in Generative Retrieval
- URL: http://arxiv.org/abs/2306.15222v2
- Date: Sat, 16 Dec 2023 13:26:02 GMT
- Title: Learning to Rank in Generative Retrieval
- Authors: Yongqi Li, Nan Yang, Liang Wang, Furu Wei, Wenjie Li
- Abstract summary: Generative retrieval aims to generate identifier strings of relevant passages as the retrieval target.
We propose a learning-to-rank framework for generative retrieval, dubbed LTRGR.
This framework only requires an additional learning-to-rank training phase to enhance current generative retrieval systems.
- Score: 62.91492903161522
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Generative retrieval stands out as a promising new paradigm in text retrieval
that aims to generate identifier strings of relevant passages as the retrieval
target. This generative paradigm taps into powerful generative language models,
distinct from traditional sparse or dense retrieval methods. However, only
learning to generate is insufficient for generative retrieval. Generative
retrieval learns to generate identifiers of relevant passages as an
intermediate goal and then converts predicted identifiers into the final
passage rank list. The disconnect between the learning objective of
autoregressive models and the desired passage ranking target leads to a
learning gap. To bridge this gap, we propose a learning-to-rank framework for
generative retrieval, dubbed LTRGR. LTRGR enables generative retrieval to learn
to rank passages directly, optimizing the autoregressive model toward the final
passage ranking target via a rank loss. This framework only requires an
additional learning-to-rank training phase to enhance current generative
retrieval systems and does not add any burden to the inference stage. We
conducted experiments on three public benchmarks, and the results demonstrate
that LTRGR achieves state-of-the-art performance among generative retrieval
methods. The code and checkpoints are released at
https://github.com/liyongqi67/LTRGR.
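To make the learning-to-rank phase concrete, here is a minimal sketch, assuming a passage's score is the summed token log-probability of its identifier under the autoregressive model and that a margin-based rank loss is combined with the usual generation loss. The function and variable names are illustrative assumptions; the released code at the repository above is authoritative.

```python
# Minimal sketch of a learning-to-rank phase on top of a generative retriever.
# Assumption: a passage's score is the sum of token log-probabilities of its
# identifier, and a margin rank loss pushes positive passages above negatives.
# Illustrative only, not the LTRGR release.
import torch
import torch.nn.functional as F


def identifier_score(logits: torch.Tensor, identifier_ids: torch.Tensor) -> torch.Tensor:
    """Sum of token log-probabilities of an identifier sequence.

    logits: (seq_len, vocab_size) model outputs for the identifier positions
    identifier_ids: (seq_len,) token ids of the passage identifier
    """
    log_probs = F.log_softmax(logits, dim=-1)
    return log_probs.gather(1, identifier_ids.unsqueeze(-1)).sum()


def rank_loss(pos_score: torch.Tensor, neg_score: torch.Tensor, margin: float = 1.0) -> torch.Tensor:
    """Margin rank loss: the positive passage should outscore the negative by `margin`."""
    return F.relu(margin - (pos_score - neg_score))


def ltr_step(pos_logits, pos_ids, neg_logits, neg_ids, generation_loss, alpha=1.0):
    """Combine the rank loss with the usual generation (cross-entropy) loss."""
    s_pos = identifier_score(pos_logits, pos_ids)
    s_neg = identifier_score(neg_logits, neg_ids)
    return generation_loss + alpha * rank_loss(s_pos, s_neg)
```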
Related papers
- Bootstrapped Pre-training with Dynamic Identifier Prediction for Generative Retrieval [108.9772640854136]
Generative retrieval uses differentiable search indexes to directly generate relevant document identifiers in response to a query.
Recent studies have highlighted the potential of a strong generative retrieval model, trained with carefully crafted pre-training tasks, to enhance downstream retrieval tasks via fine-tuning.
We introduce BootRet, a bootstrapped pre-training method for generative retrieval that dynamically adjusts document identifiers during pre-training to accommodate the continuing memorization of the corpus.
arXiv Detail & Related papers (2024-07-16T08:42:36Z)
- Distillation Enhanced Generative Retrieval [96.69326099136289]
Generative retrieval is a promising new paradigm in text retrieval that generates identifier strings of relevant passages as the retrieval target.
In this work, we identify a viable direction to further enhance generative retrieval via distillation and propose a feasible framework, named DGR.
We conduct experiments on four public datasets, and the results indicate that DGR achieves state-of-the-art performance among the generative retrieval methods.
arXiv Detail & Related papers (2024-02-16T15:48:24Z)
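The distillation idea above can be pictured with a short, hedged sketch: a stronger ranking teacher's scores over a shared candidate list are distilled into the generative retriever's identifier scores via a KL-divergence loss. The tensors, temperature, and function name below are illustrative assumptions, not the released DGR code.

```python
# Hedged sketch of distillation for generative retrieval: a ranking teacher's
# scores over a shared candidate list are distilled into the generative
# model's identifier scores with a KL-divergence loss.
# `student_scores` / `teacher_scores` are assumed 1-D tensors over the same
# candidates; this is not the released DGR implementation.
import torch
import torch.nn.functional as F


def distillation_loss(student_scores: torch.Tensor,
                      teacher_scores: torch.Tensor,
                      temperature: float = 1.0) -> torch.Tensor:
    """KL divergence between teacher and student ranking distributions."""
    student_log_probs = F.log_softmax(student_scores / temperature, dim=-1)
    teacher_probs = F.softmax(teacher_scores / temperature, dim=-1)
    return F.kl_div(student_log_probs, teacher_probs, reduction="sum")
```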
- Corrective Retrieval Augmented Generation [36.04062963574603]
Retrieval-augmented generation (RAG) relies heavily on the relevance of the retrieved documents, raising concerns about how the model behaves if retrieval goes wrong.
We propose Corrective Retrieval Augmented Generation (CRAG) to improve the robustness of generation.
CRAG is plug-and-play and can be seamlessly coupled with various RAG-based approaches.
arXiv Detail & Related papers (2024-01-29T04:36:39Z)
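As a rough, hedged illustration of a plug-and-play corrective layer of the kind described above: a lightweight evaluator scores the retrieved documents and, when confidence is low, a corrective action is taken before generation. The evaluator, threshold, and fallback callback below are illustrative assumptions, not the CRAG implementation.

```python
# Hedged sketch of a corrective wrapper around a RAG pipeline: score the
# retrieved documents, fall back to another source when confidence is low,
# otherwise keep only documents judged relevant enough, then generate.
from typing import Callable, List


def corrective_rag(query: str,
                   retrieve: Callable[[str], List[str]],
                   evaluate: Callable[[str, str], float],
                   generate: Callable[[str, List[str]], str],
                   fallback_retrieve: Callable[[str], List[str]],
                   threshold: float = 0.5) -> str:
    """Retrieve, assess document relevance, correct if needed, then generate."""
    docs = retrieve(query)
    scores = [evaluate(query, d) for d in docs]
    if not scores or max(scores) < threshold:
        # Low confidence in the retrieved set: fall back to another source.
        docs = fallback_retrieve(query)
    else:
        # Keep only documents the evaluator judges relevant enough.
        docs = [d for d, s in zip(docs, scores) if s >= threshold]
    return generate(query, docs)
```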
- CorpusBrain: Pre-train a Generative Retrieval Model for Knowledge-Intensive Language Tasks [62.22920673080208]
A single-step generative model can dramatically simplify the search process and be optimized in an end-to-end manner.
We name the pre-trained generative retrieval model CorpusBrain, as all information about the corpus is encoded in its parameters without the need to construct an additional index.
arXiv Detail & Related papers (2022-08-16T10:22:49Z)
- Autoregressive Search Engines: Generating Substrings as Document Identifiers [53.0729058170278]
Autoregressive language models are emerging as the de facto standard for generating answers.
Previous work has explored ways to partition the search space into hierarchical structures.
In this work, we propose an alternative that does not force any structure on the search space: using all n-grams in a passage as its possible identifiers.
arXiv Detail & Related papers (2022-04-22T10:45:01Z)
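A minimal sketch of the "any n-gram as identifier" idea above: index every n-gram of every passage, then score passages by the n-grams the model generates for a query. The real system constrains decoding against the corpus (e.g., with an FM-index); the plain dictionary and additive scoring below are simplified assumptions, not that implementation.

```python
# Hedged sketch: every n-gram of a passage can act as one of its identifiers.
# Build an n-gram -> passage-id index, then aggregate the model's scores for
# generated n-grams into passage scores. Simplified stand-in only.
from collections import defaultdict
from typing import Dict, List, Set, Tuple


def ngrams(tokens: List[str], n: int) -> Set[Tuple[str, ...]]:
    """All contiguous n-grams of a token sequence."""
    return {tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)}


def build_ngram_index(passages: Dict[str, List[str]], n: int = 3) -> Dict[Tuple[str, ...], Set[str]]:
    """Map each n-gram to the set of passage ids that contain it."""
    index: Dict[Tuple[str, ...], Set[str]] = defaultdict(set)
    for pid, tokens in passages.items():
        for gram in ngrams(tokens, n):
            index[gram].add(pid)
    return index


def score_passages(generated: List[Tuple[Tuple[str, ...], float]],
                   index: Dict[Tuple[str, ...], Set[str]]) -> Dict[str, float]:
    """Aggregate model scores of generated n-grams into passage scores."""
    scores: Dict[str, float] = defaultdict(float)
    for gram, model_score in generated:
        for pid in index.get(gram, ()):
            scores[pid] += model_score
    return dict(scores)
```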
- Unsupervised Text Generation by Learning from Search [86.51619839836331]
TGLS is a novel framework for unsupervised text generation by learning from search.
We demonstrate the effectiveness of TGLS on two real-world natural language generation tasks, paraphrase generation and text formalization.
arXiv Detail & Related papers (2020-07-09T04:34:48Z)