Surface-Based Retrieval Reduces Perplexity of Retrieval-Augmented
Language Models
- URL: http://arxiv.org/abs/2305.16243v3
- Date: Tue, 4 Jul 2023 07:59:15 GMT
- Title: Surface-Based Retrieval Reduces Perplexity of Retrieval-Augmented
Language Models
- Authors: Ehsan Doostmohammadi, Tobias Norlund, Marco Kuhlmann, Richard
Johansson
- Abstract summary: We study the state-of-the-art Retro model and observe that its performance gain is better explained by surface-level similarities.
Inspired by this, we replace the semantic retrieval in Retro with a surface-level method based on BM25, obtaining a significant reduction in perplexity.
- Score: 1.0552465253379135
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Augmenting language models with a retrieval mechanism has been shown to
significantly improve their performance while keeping the number of parameters
low. Retrieval-augmented models commonly rely on a semantic retrieval mechanism
based on the similarity between dense representations of the query chunk and
potential neighbors. In this paper, we study the state-of-the-art Retro model
and observe that its performance gain is better explained by surface-level
similarities, such as token overlap. Inspired by this, we replace the semantic
retrieval in Retro with a surface-level method based on BM25, obtaining a
significant reduction in perplexity. As full BM25 retrieval can be
computationally costly for large datasets, we also apply it in a re-ranking
scenario, gaining part of the perplexity reduction with minimal computational
overhead.
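A minimal sketch of the two retrieval settings the abstract describes, using the rank_bm25 package as a stand-in; the toy corpus, whitespace tokenization, and candidate list are illustrative assumptions, not the authors' actual Retro pipeline:

```python
# Sketch of full BM25 retrieval and the cheaper re-ranking scenario.
from rank_bm25 import BM25Okapi

corpus = [
    "retrieval augmented language models use external memory",
    "bm25 is a classic lexical ranking function",
    "dense retrievers embed queries and documents into vectors",
]
tokenized_corpus = [doc.split() for doc in corpus]  # naive whitespace tokens
bm25 = BM25Okapi(tokenized_corpus)

query_tokens = "lexical ranking with bm25".split()

# Full BM25 retrieval: score every chunk in the database.
scores = bm25.get_scores(query_tokens)
best = max(range(len(corpus)), key=lambda i: scores[i])
print(corpus[best])

# Re-ranking scenario: a dense retriever proposes candidates, and only
# those candidates are reordered by their BM25 score.
dense_candidate_ids = [2, 0]  # hypothetical nearest neighbors from dense retrieval
reranked = sorted(dense_candidate_ids, key=lambda i: scores[i], reverse=True)
print([corpus[i] for i in reranked])
```

In the re-ranking setting, lexical scores are computed only for the dense retriever's candidates rather than the whole database, which is what keeps the computational overhead minimal.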
Related papers
- Adapting Dual-encoder Vision-language Models for Paraphrased Retrieval [55.90407811819347]
We consider the task of paraphrased text-to-image retrieval where a model aims to return similar results given a pair of paraphrased queries.
We train a dual-encoder model starting from a language model pretrained on a large text corpus.
Compared to public dual-encoder models such as CLIP and OpenCLIP, the model trained with our best adaptation strategy achieves a significantly higher ranking similarity for paraphrased queries.
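A rough sketch of the dual-encoder scoring this entry describes; encode_text is a hypothetical hashed bag-of-words stand-in for the trained encoders, and text documents stand in for the image side:

```python
import numpy as np

def encode_text(text: str, dim: int = 64) -> np.ndarray:
    # Hypothetical stand-in for a trained encoder: hashed bag of words.
    v = np.zeros(dim)
    for tok in text.lower().split():
        v[hash(tok) % dim] += 1.0
    return v / (np.linalg.norm(v) + 1e-9)

def score(q: np.ndarray, d: np.ndarray) -> float:
    return float(q @ d)  # cosine similarity, since embeddings are unit-norm

q1 = encode_text("a dog running on the beach")
q2 = encode_text("a canine sprinting along the shore")  # paraphrased query
docs = ["dog on a beach", "a city skyline at night", "a cat sleeping indoors"]
doc_embs = [encode_text(d) for d in docs]

rank1 = sorted(range(len(docs)), key=lambda i: -score(q1, doc_embs[i]))
rank2 = sorted(range(len(docs)), key=lambda i: -score(q2, doc_embs[i]))
print(rank1, rank2)  # the adaptation goal: paraphrases should rank items similarly
```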
arXiv Detail & Related papers (2024-05-06T06:30:17Z) - SRFormer: Text Detection Transformer with Incorporated Segmentation and
Regression [6.74412860849373]
We propose SRFormer, a unified DETR-based model with amalgamated segmentation and regression.
Our empirical analysis indicates that favorable segmentation predictions can be obtained at the initial decoder layers.
Our method exhibits exceptional robustness, superior training and data efficiency, as well as state-of-the-art performance.
arXiv Detail & Related papers (2023-08-21T07:34:31Z) - Consensus-Adaptive RANSAC [104.87576373187426]
We propose a new RANSAC framework that learns to explore the parameter space by considering the residuals seen so far via a novel attention layer.
The attention mechanism operates on a batch of point-to-model residuals, and updates a per-point estimation state to take into account the consensus found through a lightweight one-step transformer.
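For context, a plain RANSAC loop for 2-D line fitting on synthetic data, i.e. the classical baseline that the proposed attention-based framework extends; the threshold and iteration count are illustrative:

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.uniform(-1, 1, 100)
y = 2.0 * x + 0.5 + rng.normal(0, 0.01, 100)
y[:20] = rng.uniform(-3, 3, 20)            # inject outliers
pts = np.stack([x, y], axis=1)

best_inliers, best_model = 0, None
for _ in range(200):
    i, j = rng.choice(len(pts), size=2, replace=False)
    (x1, y1), (x2, y2) = pts[i], pts[j]
    if abs(x2 - x1) < 1e-9:
        continue
    a = (y2 - y1) / (x2 - x1)              # line from the minimal 2-point sample
    b = y1 - a * x1
    residuals = np.abs(pts[:, 1] - (a * pts[:, 0] + b))
    inliers = int((residuals < 0.05).sum())  # consensus under a fixed threshold
    if inliers > best_inliers:
        best_inliers, best_model = inliers, (a, b)
print(best_model, best_inliers)
```

CA-RANSAC's contribution, per the summary, is to replace this blind uniform sampling with a learned exploration of the parameter space conditioned on the residuals seen so far.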
arXiv Detail & Related papers (2023-07-26T08:25:46Z) - Compressing Sentence Representation with maximum Coding Rate Reduction [0.0]
In most natural language inference problems, sentence representations are needed for semantic retrieval tasks.
Due to hardware limitations on memory and compute, there is a need to attain comparable results with smaller models.
We demonstrate that the new language model with reduced complexity and sentence embedding size can achieve comparable results on semantic retrieval benchmarks.
arXiv Detail & Related papers (2023-04-25T09:23:43Z) - On the Generalization Ability of Retrieval-Enhanced Transformers [1.0552465253379135]
Off-loading memory from trainable weights to a retrieval database can significantly improve language modeling.
It has been suggested that at least some of this performance gain is due to non-trivial generalization based on both model weights and retrieval.
We find that the performance gains from retrieval largely originate from overlapping tokens between the database and the test data.
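An illustrative diagnostic in the spirit of this finding: measure unigram overlap between a test chunk and its retrieved neighbor. Whitespace tokenization here is an assumption, not the model's actual subword vocabulary:

```python
def token_overlap(chunk: str, neighbor: str) -> float:
    # Fraction of the test chunk's tokens that also occur in the neighbor.
    a, b = set(chunk.lower().split()), set(neighbor.lower().split())
    return len(a & b) / max(len(a), 1)

test_chunk = "the capital of france is paris"
retrieved = "paris is the capital and largest city of france"
print(token_overlap(test_chunk, retrieved))
```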
arXiv Detail & Related papers (2023-02-23T16:11:04Z) - UnifieR: A Unified Retriever for Large-Scale Retrieval [84.61239936314597]
Large-scale retrieval aims to recall relevant documents from a huge collection given a query.
Recent retrieval methods based on pre-trained language models (PLM) can be coarsely categorized into either dense-vector or lexicon-based paradigms.
We propose a new learning framework, UnifieR, which unifies dense-vector and lexicon-based retrieval in one model with a dual-representing capability.
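A minimal sketch of one way such a unified model can be scored, interpolating a dense inner product with a sparse lexical dot product; the weight alpha and both scoring functions are illustrative assumptions, not UnifieR's actual formulation:

```python
import numpy as np

def dense_score(q_emb: np.ndarray, d_emb: np.ndarray) -> float:
    return float(q_emb @ d_emb)  # inner product of dense embeddings

def lexical_score(q_terms: dict, d_terms: dict) -> float:
    # Sparse dot product over shared vocabulary terms (e.g. learned term weights).
    return sum(w * d_terms.get(t, 0.0) for t, w in q_terms.items())

def hybrid_score(q_emb, d_emb, q_terms, d_terms, alpha: float = 0.5) -> float:
    return alpha * dense_score(q_emb, d_emb) + (1 - alpha) * lexical_score(q_terms, d_terms)

q_emb, d_emb = np.array([0.1, 0.9]), np.array([0.2, 0.8])
print(hybrid_score(q_emb, d_emb, {"retrieval": 1.2}, {"retrieval": 0.7}))
```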
arXiv Detail & Related papers (2022-05-23T11:01:59Z) - SPLADE v2: Sparse Lexical and Expansion Model for Information Retrieval [11.38022203865326]
The SPLADE model provides highly sparse representations and competitive results with respect to state-of-the-art dense and sparse approaches.
We modify the pooling mechanism, benchmark a model solely based on document expansion, and introduce models trained with distillation.
Overall, SPLADE is considerably improved, with more than 9% gains on NDCG@10 on TREC DL 2019, leading to state-of-the-art results on the BEIR benchmark.
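A sketch of SPLADE-style sparse representations under the max-pooling variant this entry mentions; the random matrix stands in for a masked-language-model head's per-token vocabulary logits:

```python
import numpy as np

rng = np.random.default_rng(0)
seq_len, vocab_size = 8, 1000
logits = rng.normal(0, 1, (seq_len, vocab_size))   # stand-in MLM logits

# Squash with log(1 + ReLU(.)), then max-pool over sequence positions;
# in training, a sparsity regularizer drives most terms to zero.
doc_rep = np.log1p(np.maximum(logits, 0.0)).max(axis=0)
print((doc_rep > 0).sum(), "active vocabulary terms out of", vocab_size)
```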
arXiv Detail & Related papers (2021-09-21T10:43:42Z) - MINIMALIST: Mutual INformatIon Maximization for Amortized Likelihood
Inference from Sampled Trajectories [61.3299263929289]
Simulation-based inference enables learning the parameters of a model even when its likelihood cannot be computed in practice.
One class of methods uses data simulated with different parameters to infer an amortized estimator for the likelihood-to-evidence ratio.
We show that this approach can be formulated in terms of mutual information between model parameters and simulated data.
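Written out, the stated connection is the standard identity relating the likelihood-to-evidence ratio to mutual information:

```latex
% The amortized estimator targets the ratio r; its expected log under the
% joint distribution of parameters and data is exactly the mutual information.
\[
  r(x,\theta) \;=\; \frac{p(x \mid \theta)}{p(x)},
  \qquad
  I(\Theta; X) \;=\; \mathbb{E}_{p(\theta,x)}\!\left[\log \frac{p(x \mid \theta)}{p(x)}\right]
  \;=\; \mathbb{E}_{p(\theta,x)}\!\left[\log r(x,\theta)\right].
\]
```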
arXiv Detail & Related papers (2021-06-03T12:59:16Z) - Anti-aliasing Semantic Reconstruction for Few-Shot Semantic Segmentation [66.85202434812942]
We reformulate few-shot segmentation as a semantic reconstruction problem.
We convert base class features into a series of basis vectors which span a class-level semantic space for novel class reconstruction.
Our proposed approach, referred to as anti-aliasing semantic reconstruction (ASR), provides a systematic yet interpretable solution for few-shot learning problems.
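An illustrative reconstruction step under this framing: project a novel-class feature onto the space spanned by base-class basis vectors; the least-squares projection is a generic stand-in for the paper's actual reconstruction:

```python
import numpy as np

rng = np.random.default_rng(0)
basis = rng.normal(0, 1, (64, 10))   # 10 base-class basis vectors in a 64-d space
novel = rng.normal(0, 1, 64)         # a novel-class feature to reconstruct

coeffs, *_ = np.linalg.lstsq(basis, novel, rcond=None)
reconstruction = basis @ coeffs
print(np.linalg.norm(novel - reconstruction))  # residual outside the spanned space
```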
arXiv Detail & Related papers (2021-06-01T02:17:36Z) - Recurrent Feedback Improves Recognition of Partially Occluded Objects [1.452875650827562]
We investigate whether and how artificial neural networks also benefit from recurrence.
We find that classification accuracy is significantly higher for recurrent models when compared to feedforward models of matched parametric complexity.
arXiv Detail & Related papers (2021-04-21T16:18:34Z) - Improving the Reconstruction of Disentangled Representation Learners via Multi-Stage Modeling [54.94763543386523]
Current autoencoder-based disentangled representation learning methods achieve disentanglement by penalizing the (aggregate) posterior to encourage statistical independence of the latent factors.
We present a novel multi-stage modeling approach where the disentangled factors are first learned using a penalty-based disentangled representation learning method.
Then, the low-quality reconstruction is improved with another deep generative model that is trained to model the missing correlated latent variables.
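As one concrete instance of the penalty-based first stage, a beta-VAE-style objective scales the KL term to penalize the posterior more heavily; the loss below is a generic sketch, not the paper's exact method:

```python
import torch

def penalized_elbo(x, x_recon, mu, logvar, beta: float = 4.0):
    recon = torch.nn.functional.mse_loss(x_recon, x, reduction="sum")
    # KL(q(z|x) || N(0, I)); scaling by beta > 1 penalizes the posterior more
    # heavily, one standard route to statistically independent latent factors.
    kl = -0.5 * torch.sum(1 + logvar - mu.pow(2) - logvar.exp())
    return recon + beta * kl

x, x_recon = torch.randn(4, 32), torch.randn(4, 32)
mu, logvar = torch.zeros(4, 8), torch.zeros(4, 8)
print(penalized_elbo(x, x_recon, mu, logvar))
```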
arXiv Detail & Related papers (2020-10-25T18:51:15Z)