Zero-shot Retrieval: Augmenting Pre-trained Models with Search Engines
- URL: http://arxiv.org/abs/2311.17949v1
- Date: Wed, 29 Nov 2023 05:33:28 GMT
- Title: Zero-shot Retrieval: Augmenting Pre-trained Models with Search Engines
- Authors: Hamed Damirchi, Cristian Rodríguez-Opazo, Ehsan Abbasnejad, Damien Teney, Javen Qinfeng Shi, Stephen Gould, Anton van den Hengel
- Abstract summary: Large pre-trained models can dramatically reduce the amount of task-specific data required to solve a problem, but they often fail to capture domain-specific nuances out of the box.
This paper shows how to leverage recent advances in NLP and multi-modal learning to augment a pre-trained model with search engine retrieval.
- Score: 83.65380507372483
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Large pre-trained models can dramatically reduce the amount of task-specific
data required to solve a problem, but they often fail to capture
domain-specific nuances out of the box. The Web likely contains the information
necessary to excel on any specific application, but identifying the right data
a priori is challenging. This paper shows how to leverage recent advances in
NLP and multi-modal learning to augment a pre-trained model with search engine
retrieval. We propose to retrieve useful data from the Web at test time based
on test cases that the model is uncertain about. Different from existing
retrieval-augmented approaches, we then update the model to address this
underlying uncertainty. We demonstrate substantial improvements in zero-shot
performance, e.g. a remarkable increase of 15 percentage points in accuracy on
the Stanford Cars and Flowers datasets. We also present extensive experiments
that explore the impact of noisy retrieval and different learning strategies.
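The loop described in the abstract (flag uncertain test cases, retrieve data for them, then update the model) can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: it assumes a linear classifier head over fixed features, uses predictive entropy as the uncertainty measure, and takes a hypothetical `retrieve_fn` callback standing in for the search-engine query and feature-encoding step. All function names and the `threshold` parameter are illustrative.

```python
import numpy as np


def softmax(logits):
    # Numerically stable softmax over the class axis.
    z = logits - logits.max(axis=1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=1, keepdims=True)


def predictive_entropy(probs):
    # Entropy of each row of class probabilities; higher means less certain.
    return -(probs * np.log(probs + 1e-12)).sum(axis=1)


def update_classifier(W, feats, labels, lr=0.5, steps=100):
    # Plain gradient descent on cross-entropy for a linear head W (classes x dim).
    W = W.copy()
    onehot = np.eye(W.shape[0])[labels]
    for _ in range(steps):
        probs = softmax(feats @ W.T)
        W -= lr * (probs - onehot).T @ feats / len(feats)
    return W


def adapt_at_test_time(W, test_feats, retrieve_fn, threshold):
    # 1) Find test cases whose predictive entropy exceeds the threshold.
    probs = softmax(test_feats @ W.T)
    uncertain = np.flatnonzero(predictive_entropy(probs) > threshold)
    if uncertain.size == 0:
        return W, uncertain
    # 2) Collect the two most likely classes of each uncertain case (assumption:
    #    these are the classes worth retrieving extra data for).
    candidates = np.unique(probs[uncertain].argsort(axis=1)[:, -2:])
    # 3) retrieve_fn is a hypothetical stand-in for Web retrieval; it must
    #    return (features, integer labels) for the requested classes.
    feats, labels = retrieve_fn(candidates)
    # 4) Update the model on the retrieved data.
    return update_classifier(W, feats, labels), uncertain
```

In practice `retrieve_fn` would wrap a search-engine query and an encoder, and the retrieved labels may be noisy, which is the setting the paper's experiments on noisy retrieval explore.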
Related papers
- Maximally-Informative Retrieval for State Space Model Generation [59.954191072042526]
We introduce Retrieval In-Context Optimization (RICO) to minimize model uncertainty for a particular query at test-time.
Unlike traditional retrieval-augmented generation (RAG), which relies on externals for document retrieval, our approach leverages direct feedback from the model.
We show that standard top-$k$ retrieval with model gradients can approximate our optimization procedure, and provide connections to the leave-one-out loss.
arXiv Detail & Related papers (2025-06-13T18:08:54Z) - Offline Learning and Forgetting for Reasoning with Large Language Models [23.384882158333156]
We propose an effective approach that integrates search capabilities directly into the model by fine-tuning it on unpaired successful and failed reasoning paths.
Experiments on the challenging Game-of-24 and Countdown reasoning benchmarks show that replacing CoT-generated data with search-generated data for offline fine-tuning improves success rates by around 23% over inference-time search baselines.
Our learning and forgetting objective consistently outperforms both supervised fine-tuning and preference-based methods.
arXiv Detail & Related papers (2025-04-15T16:30:02Z) - Provable unlearning in topic modeling and downstream tasks [36.571324268874264]
Provable guarantees for unlearning are often limited to supervised learning settings.
We provide the first theoretical guarantees for unlearning in the pre-training and fine-tuning paradigm.
We show that it is easier to unlearn pre-training data from models that have been fine-tuned to a particular task, and one can unlearn this data without modifying the base model.
arXiv Detail & Related papers (2024-11-19T16:04:31Z) - A Fresh Take on Stale Embeddings: Improving Dense Retriever Training with Corrector Networks [81.2624272756733]
In dense retrieval, deep encoders provide embeddings for both inputs and targets.
We train a small parametric corrector network that adjusts stale cached target embeddings.
Our approach matches state-of-the-art results even when no target embedding updates are made during training.
arXiv Detail & Related papers (2024-09-03T13:29:13Z) - Combating Missing Modalities in Egocentric Videos at Test Time [92.38662956154256]
Real-world applications often face challenges with incomplete modalities due to privacy concerns, efficiency needs, or hardware issues.
We propose a novel approach to address this issue at test time without requiring retraining.
MiDl represents the first self-supervised, online solution for handling missing modalities exclusively at test time.
arXiv Detail & Related papers (2024-04-23T16:01:33Z) - Certain and Approximately Certain Models for Statistical Learning [4.318959672085627]
We show that it is possible to learn accurate models directly from data with missing values for certain training data and target models.
We build efficient algorithms with theoretical guarantees to check this necessity and return accurate models in cases where imputation is unnecessary.
arXiv Detail & Related papers (2024-02-27T22:49:33Z) - On the Efficacy of Adversarial Data Collection for Question Answering: Results from a Large-Scale Randomized Study [65.17429512679695]
In adversarial data collection (ADC), a human workforce interacts with a model in real time, attempting to produce examples that elicit incorrect predictions.
Despite ADC's intuitive appeal, it remains unclear when training on adversarial datasets produces more robust models.
arXiv Detail & Related papers (2021-06-02T00:48:33Z) - ALT-MAS: A Data-Efficient Framework for Active Testing of Machine Learning Algorithms [58.684954492439424]
We propose a novel framework to efficiently test a machine learning model using only a small amount of labeled test data.
The idea is to estimate the metrics of interest for a model-under-test using a Bayesian neural network (BNN).
arXiv Detail & Related papers (2021-04-11T12:14:04Z) - Data Impressions: Mining Deep Models to Extract Samples for Data-free Applications [26.48630545028405]
"Data Impressions" act as a proxy for the training data and can be used to realize a variety of tasks.
We show the applicability of data impressions in solving several computer vision tasks.
arXiv Detail & Related papers (2021-01-15T11:37:29Z) - An Efficient Method of Training Small Models for Regression Problems with Knowledge Distillation [1.433758865948252]
We propose a new formalism of knowledge distillation for regression problems.
First, we propose a new loss function, teacher outlier loss rejection, which rejects outliers in training samples using teacher model predictions.
By considering the multi-task network, training of the feature extraction of student models becomes more effective.
arXiv Detail & Related papers (2020-02-28T08:46:12Z) - Meta-Learned Confidence for Few-shot Learning [60.6086305523402]
A popular transductive inference technique for few-shot metric-based approaches is to update the prototype of each class with the mean of the most confident query examples.
We propose to meta-learn the confidence for each query sample, to assign optimal weights to unlabeled queries.
We validate our few-shot learning model with meta-learned confidence on four benchmark datasets.
arXiv Detail & Related papers (2020-02-27T10:22:17Z)
This list is automatically generated from the titles and abstracts of the papers on this site.