HYRR: Hybrid Infused Reranking for Passage Retrieval
- URL: http://arxiv.org/abs/2212.10528v1
- Date: Tue, 20 Dec 2022 18:44:21 GMT
- Title: HYRR: Hybrid Infused Reranking for Passage Retrieval
- Authors: Jing Lu, Keith Hall, Ji Ma, Jianmo Ni
- Abstract summary: Hybrid Infused Reranking for Passage Retrieval (HYRR) is a framework for training rerankers based on a hybrid of BM25 and neural retrieval models.
We present evaluations on a supervised passage retrieval task using MS MARCO and zero-shot retrieval tasks using BEIR.
- Score: 18.537666294601458
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: We present Hybrid Infused Reranking for Passages Retrieval (HYRR), a
framework for training rerankers based on a hybrid of BM25 and neural retrieval
models. Retrievers based on hybrid models have been shown to outperform both
BM25 and neural models alone. Our approach exploits this improved performance
when training a reranker, leading to a robust reranking model. The reranker, a
cross-attention neural model, is shown to be robust to different first-stage
retrieval systems, achieving better performance than rerankers trained only on
the outputs of the first-stage retrievers in multi-stage systems. We present
evaluations on a supervised passage retrieval task using MS MARCO and zero-shot
retrieval tasks using BEIR. The empirical results show strong performance on
both evaluations.
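As a rough, assumption-laden illustration of the training recipe in the abstract, the sketch below mines reranker training examples from a candidate pool fused from BM25 and a dense retriever. The min-max normalized linear fusion (`alpha`) and the `bm25.retrieve` / `dense.retrieve` interfaces are hypothetical; the paper's exact fusion and mining procedure may differ.

```python
# Hypothetical sketch: build reranker training data from hybrid retrieval.
def minmax(scores):
    """Min-max normalize a {doc_id: score} dict so scores are comparable."""
    lo, hi = min(scores.values()), max(scores.values())
    return {d: (s - lo) / (hi - lo + 1e-9) for d, s in scores.items()}

def hybrid_candidates(query, bm25, dense, k=100, alpha=0.5):
    """Top-k passages under an assumed linear fusion of BM25 and dense scores."""
    s_bm25 = minmax(bm25.retrieve(query, k))    # {doc_id: score}, assumed API
    s_dense = minmax(dense.retrieve(query, k))  # {doc_id: score}, assumed API
    fused = {d: alpha * s_bm25.get(d, 0.0) + (1 - alpha) * s_dense.get(d, 0.0)
             for d in set(s_bm25) | set(s_dense)}
    return sorted(fused, key=fused.get, reverse=True)[:k]

def reranker_examples(query, positives, bm25, dense, n_neg=7):
    """Labeled (query, passage, label) triples; negatives come from the
    hybrid pool, which is what makes the reranker robust to either retriever."""
    negatives = [d for d in hybrid_candidates(query, bm25, dense)
                 if d not in positives][:n_neg]
    return ([(query, p, 1) for p in positives] +
            [(query, n, 0) for n in negatives])
```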
Related papers
- An Early FIRST Reproduction and Improvements to Single-Token Decoding for Fast Listwise Reranking [50.81324768683995]
FIRST is a novel approach that integrates a learning-to-rank objective and leverages the logits of only the first generated token.
We extend the evaluation of FIRST to the TREC Deep Learning datasets (DL19-22), validating its robustness across diverse domains.
Our experiments confirm that fast reranking with single-token logits does not compromise out-of-domain reranking quality.
arXiv Detail & Related papers (2024-11-08T12:08:17Z)
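A minimal sketch of the single-token trick described in the FIRST entry above: present the candidates with letter identifiers and rank them by the logits of the first token the model would generate, instead of decoding a full permutation. The prompt format and the placeholder gpt2 checkpoint are assumptions for illustration only.

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

tok = AutoTokenizer.from_pretrained("gpt2")            # placeholder model
model = AutoModelForCausalLM.from_pretrained("gpt2")

def rank_by_first_token(query, passages):
    """Rank passages by the next-token logits over their identifiers."""
    ids = [chr(ord("A") + i) for i in range(len(passages))]  # 'A', 'B', ...
    prompt = (f"Query: {query}\n"
              + "\n".join(f"[{i}] {p}" for i, p in zip(ids, passages))
              + "\nRank the passages, best first:\n[")
    inputs = tok(prompt, return_tensors="pt")
    with torch.no_grad():
        logits = model(**inputs).logits[0, -1]  # logits of first generated token
    scores = {i: logits[tok.convert_tokens_to_ids(i)].item() for i in ids}
    order = sorted(ids, key=scores.get, reverse=True)
    return [passages[ids.index(i)] for i in order]
```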
- Towards Competitive Search Relevance For Inference-Free Learned Sparse Retrievers [6.773411876899064]
Inference-free sparse models lag far behind both sparse and dense siamese models in terms of search relevance.
We propose two different approaches for performance improvement. First, we introduce the IDF-aware FLOPS loss, which introduces Inverted Document Frequency (IDF) to the sparsification of representations.
We find that it mitigates the negative impact of the FLOPS regularization on search relevance, allowing the model to achieve a better balance between accuracy and efficiency.
arXiv Detail & Related papers (2024-11-07T03:46:43Z)
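One possible reading of the IDF-aware FLOPS loss above, in code for concreteness: the standard FLOPS regularizer penalizes every vocabulary term's squared mean activation equally, while an IDF-aware variant could scale the penalty so that frequent (low-IDF) terms are sparsified more aggressively. The exact weighting function is an assumption here, not necessarily the paper's.

```python
import torch

def flops_loss(weights):
    """Standard FLOPS regularizer over (batch, vocab) term activations."""
    return (weights.abs().mean(dim=0) ** 2).sum()

def idf_aware_flops_loss(weights, idf):
    """Assumed IDF weighting: low-IDF (common) terms pay a larger sparsity
    penalty, so high-IDF (informative) terms survive sparsification."""
    penalty = 1.0 / (idf + 1e-6)                 # idf: (vocab,) tensor
    return (penalty * weights.abs().mean(dim=0) ** 2).sum()
```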
- Noisy Self-Training with Synthetic Queries for Dense Retrieval [49.49928764695172]
We introduce a novel noisy self-training framework combined with synthetic queries.
Experimental results show that our method improves consistently over existing methods.
Our method is data efficient and outperforms competitive baselines.
arXiv Detail & Related papers (2023-11-27T06:19:50Z)
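Since the summary above gives only the shape of the method, here is a generic, assumption-laden sketch of a noisy self-training loop with synthetic queries; every component interface (`gen_query`, `add_noise`, `train_fn`, the retriever) is hypothetical.

```python
def noisy_self_train(retriever, passages, gen_query, add_noise, train_fn,
                     rounds=3):
    """Iteratively retrain a retriever on noisy synthetic (query, passage) pairs."""
    for _ in range(rounds):
        # synthetic queries yield training pairs without human labels
        pairs = [(gen_query(p), p) for p in passages]
        # corrupt queries so the student must learn noise-robust matching
        noisy_pairs = [(add_noise(q), p) for q, p in pairs]
        retriever = train_fn(retriever, noisy_pairs)
    return retriever
```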
- Learning Energy-Based Models by Cooperative Diffusion Recovery Likelihood [64.95663299945171]
Training energy-based models (EBMs) on high-dimensional data can be both challenging and time-consuming.
There exists a noticeable gap in sample quality between EBMs and other generative frameworks like GANs and diffusion models.
We propose cooperative diffusion recovery likelihood (CDRL), an effective approach to tractably learn and sample from a series of EBMs.
arXiv Detail & Related papers (2023-09-10T22:05:24Z)
- MRHER: Model-based Relay Hindsight Experience Replay for Sequential Object Manipulation Tasks with Sparse Rewards [11.79027801942033]
We propose a novel model-based RL framework called Model-based Relay Hindsight Experience Replay (MRHER).
MRHER breaks down a continuous task into subtasks with increasing complexity and utilizes the previous subtask to guide the learning of the subsequent one.
We show that MRHER exhibits state-of-the-art sample efficiency in benchmark tasks, outperforming RHER by 13.79% and 14.29%.
arXiv Detail & Related papers (2023-06-28T09:51:25Z)
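A tiny sketch of the relay structure the MRHER summary describes: order subtasks by increasing complexity and let each learned policy guide the next. The model-based rollouts and hindsight relabeling that give MRHER its name are omitted; `train_subtask` is a hypothetical interface.

```python
def relay_train(subtasks, train_subtask):
    """subtasks: ordered easiest-first; each policy bootstraps the next."""
    policy = None
    for task in subtasks:
        policy = train_subtask(task, init_policy=policy)  # previous policy guides
    return policy
```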
- Zero-Shot Retrieval with Search Agents and Hybrid Environments [8.017306481455778]
Current language models can learn symbolic query reformulation policies in combination with traditional term-based retrieval, but fall short of outperforming neural retrievers.
We extend the previous learning to search setup to a hybrid environment, which accepts discrete query refinement operations, after a first-pass retrieval step via a dual encoder.
Experiments on the BEIR task show that search agents, trained via behavioral cloning, outperform the underlying search system based on a combined dual encoder retriever and cross encoder reranker.
arXiv Detail & Related papers (2022-09-30T13:50:25Z)
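To make the hybrid environment above concrete, here is a hypothetical sketch: a dual encoder supplies the first-pass results, after which an agent issues discrete term-based refinement operations that are re-executed by BM25. The operation set and retriever interfaces are assumptions.

```python
class HybridSearchEnv:
    """Assumed hybrid environment: dense first pass, term-based refinements."""

    def __init__(self, dual_encoder, bm25, k=10):
        self.dual_encoder, self.bm25, self.k = dual_encoder, bm25, k

    def reset(self, query):
        self.query = query
        return self.dual_encoder.retrieve(query, self.k)  # first-pass retrieval

    def step(self, op, term):
        if op == "add":                    # require an extra term
            self.query += f" {term}"
        elif op == "exclude":              # forbid a term
            self.query += f" -{term}"
        return self.bm25.retrieve(self.query, self.k)     # term-based re-retrieval
```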
- Towards Robust Ranker for Text Retrieval [83.15191578888188]
A ranker plays an indispensable role in the de facto 'retrieval & rerank' pipeline.
arXiv Detail & Related papers (2022-06-16T10:27:46Z)
- Distributionally Robust Models with Parametric Likelihood Ratios [123.05074253513935]
Three simple ideas allow us to train models with DRO using a broader class of parametric likelihood ratios.
We find that models trained with the resulting parametric adversaries are consistently more robust to subpopulation shifts when compared to other DRO approaches.
arXiv Detail & Related papers (2022-04-13T12:43:12Z)
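A generic rendering of DRO with a parametric likelihood ratio, for illustration: an adversary network re-weights per-example losses (weights normalized to mean one) and takes an ascent step while the model descends. The paper's exact parameterization and update rules may differ.

```python
import torch

def dro_step(model, adversary, batch, opt_model, opt_adv):
    """One min-max step: model minimizes the re-weighted loss, adversary maximizes it."""
    x, y = batch
    losses = torch.nn.functional.cross_entropy(model(x), y, reduction="none")
    # parametric likelihood ratio: per-example weights with mean ~= 1
    weights = torch.softmax(adversary(x).squeeze(-1), dim=0) * len(x)
    loss = (weights * losses).mean()
    opt_model.zero_grad()
    opt_adv.zero_grad()
    loss.backward()
    opt_model.step()                          # descent for the model
    for p in adversary.parameters():          # flip gradients: ascent for adversary
        if p.grad is not None:
            p.grad.neg_()
    opt_adv.step()
```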
- Improving Non-autoregressive Generation with Mixup Training [51.61038444990301]
We present a non-autoregressive generation model based on pre-trained transformer models.
We propose a simple and effective iterative training method called MIx Source and pseudo Target (MIST).
Our experiments on three generation benchmarks, including question generation, summarization and paraphrase generation, show that the proposed framework achieves new state-of-the-art results.
arXiv Detail & Related papers (2021-10-21T13:04:21Z)
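The summary above names the method but not its mechanics, so the following is only a guess at the token-level mixing step implied by "MIx Source and pseudo Target": mix a sequence with the model's own pseudo target and train on the result. Treat every detail here as an assumption.

```python
import random

def mix_tokens(source_ids, pseudo_target_ids, p=0.5, pad_id=0):
    """Assumed token-level mix of a source sequence and a model-generated
    pseudo target; the shorter sequence is padded before mixing."""
    n = max(len(source_ids), len(pseudo_target_ids))
    a = source_ids + [pad_id] * (n - len(source_ids))
    b = pseudo_target_ids + [pad_id] * (n - len(pseudo_target_ids))
    return [x if random.random() < p else y for x, y in zip(a, b)]
```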
- Efficiently Teaching an Effective Dense Retriever with Balanced Topic Aware Sampling [37.01593605084575]
TAS-Balanced is an efficient topic-aware query and balanced margin sampling technique.
We show that our TAS-Balanced training method achieves state-of-the-art low-latency (64ms per query) results on two TREC Deep Learning Track query sets.
arXiv Detail & Related papers (2021-04-14T16:49:18Z)
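A sketch of what topic-aware, balanced-margin sampling could look like in code: each batch draws queries from a single topic cluster, and each query's (positive, negative) pair is drawn from a randomly chosen teacher-margin bin so that easy and hard pairs are balanced. Clustering and pair pools are assumed inputs; TAS-Balanced's actual procedure may differ in detail.

```python
import random

def tas_balanced_batch(topic_clusters, pairs_by_query, batch_size, n_bins=10):
    """topic_clusters: list of query lists; pairs_by_query[q]: [(pos, neg, margin)]."""
    cluster = random.choice(topic_clusters)            # topic-aware: one cluster
    queries = random.sample(cluster, k=min(batch_size, len(cluster)))
    batch = []
    for q in queries:
        pairs = sorted(pairs_by_query[q], key=lambda p: p[2])  # sort by margin
        b = random.randrange(n_bins)                   # balanced: pick a margin bin
        lo = b * len(pairs) // n_bins
        hi = max(lo + 1, (b + 1) * len(pairs) // n_bins)
        batch.append((q, *random.choice(pairs[lo:hi])))
    return batch
```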
This list is automatically generated from the titles and abstracts of the papers on this site.
The site does not guarantee the quality of the content (including all information) and is not responsible for any consequences of its use.