Rethinking Dense Retrieval's Few-Shot Ability
- URL: http://arxiv.org/abs/2304.05845v1
- Date: Wed, 12 Apr 2023 13:20:16 GMT
- Title: Rethinking Dense Retrieval's Few-Shot Ability
- Authors: Si Sun, Yida Lu, Shi Yu, Xiangyang Li, Zhonghua Li, Zhao Cao, Zhiyuan
Liu, Deming Ye and Jie Bao
- Abstract summary: Few-shot dense retrieval aims to generalize to novel search scenarios by learning from a few samples.
Current methods often resort to random sampling from supervised datasets to create "few-data" setups.
We propose a customized FewDR dataset and a unified evaluation benchmark.
- Score: 24.86681340512899
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Few-shot dense retrieval (DR) aims to generalize effectively to novel
search scenarios by learning from only a few samples. Despite its importance,
there has been little study of specialized datasets and standardized evaluation
protocols. As a
result, current methods often resort to random sampling from supervised
datasets to create "few-data" setups and employ inconsistent training
strategies during evaluations, which poses a challenge in accurately comparing
recent progress. In this paper, we propose a customized FewDR dataset and a
unified evaluation benchmark. Specifically, FewDR employs class-wise sampling
to establish a standardized "few-shot" setting with finely-defined classes,
reducing variability across multiple sampling rounds. Moreover, the dataset is
split into disjoint base and novel classes, allowing DR models to be continuously
trained on ample data from the base classes and a few samples from the novel classes.
This benchmark eliminates the risk of novel class leakage, providing a reliable
estimation of the DR model's few-shot ability. Our extensive empirical results
reveal that current state-of-the-art DR models still face challenges in this
standard few-shot setting. Our code and data will be open-sourced at
https://github.com/OpenMatch/ANCE-Tele.
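To make the dataset protocol concrete, here is a minimal sketch of class-wise few-shot sampling over disjoint base and novel classes. It is an illustration, not the authors' released code: the `(query, positive_passage, class_label)` triple format, the function name, and the fixed per-class budget are assumptions.

```python
import random
from collections import defaultdict

def build_few_shot_split(examples, novel_classes, k_shot=5, seed=42):
    """Split labeled retrieval examples into a base-class training set and a
    k-shot novel-class set, keeping base and novel class labels disjoint.

    examples: iterable of (query, positive_passage, class_label) triples.
    novel_classes: set of class labels held out for few-shot adaptation.
    """
    rng = random.Random(seed)

    by_class = defaultdict(list)
    for ex in examples:
        by_class[ex[2]].append(ex)

    base_train, novel_few_shot = [], []
    for label, items in by_class.items():
        if label in novel_classes:
            # Class-wise sampling: exactly k examples per novel class, so
            # repeated sampling rounds share the same per-class budget.
            rng.shuffle(items)
            novel_few_shot.extend(items[:k_shot])
        else:
            # Base classes keep all of their data for standard training.
            base_train.extend(items)

    return base_train, novel_few_shot
```

Because each novel class contributes a fixed number of samples, repeated rounds with different seeds vary far less than random sampling from a pooled dataset, and keeping novel labels out of `base_train` is what rules out novel class leakage during base training.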
Related papers
- Benchmarking Spurious Bias in Few-Shot Image Classifiers [26.544938760265136]
Few-shot image classifiers show reliance on spurious correlations between classes and spurious attributes, known as spurious bias.
We propose FewSTAB to fairly compare and quantify the robustness of few-shot classifiers to spurious bias.
FewSTAB creates few-shot evaluation tasks with biased attributes, so that classifiers that rely on those attributes for prediction exhibit poor performance.
arXiv Detail & Related papers (2024-09-04T17:07:46Z)
- Enhancing Consistency and Mitigating Bias: A Data Replay Approach for Incremental Learning [100.7407460674153]
Deep learning systems are prone to catastrophic forgetting when learning from a sequence of tasks.
To mitigate the problem, a line of methods proposes replaying the data of previously experienced tasks when learning new ones.
However, storing this data is often impractical in light of memory constraints or data privacy concerns.
As a replacement, data-free replay methods synthesize samples by inverting the classification model (see the sketch after this list).
arXiv Detail & Related papers (2024-01-12T12:51:12Z)
- Evaluating Graph Neural Networks for Link Prediction: Current Pitfalls and New Benchmarking [66.83273589348758]
Link prediction attempts to predict whether an unseen edge exists based only on a portion of a graph's edges.
A flurry of methods have been introduced in recent years that attempt to make use of graph neural networks (GNNs) for this task.
New and diverse datasets have also been created to better evaluate the effectiveness of these new models.
arXiv Detail & Related papers (2023-06-18T01:58:59Z)
- Large-scale Pre-trained Models are Surprisingly Strong in Incremental Novel Class Discovery [76.63807209414789]
We challenge the status quo in class-iNCD and propose a learning paradigm where class discovery occurs continuously and in a truly unsupervised manner.
We propose simple baselines, composed of a frozen PTM backbone and a learnable linear classifier, that are not only simple to implement but also resilient under longer learning scenarios.
arXiv Detail & Related papers (2023-03-28T13:47:16Z)
- Revisiting Long-tailed Image Classification: Survey and Benchmarks with New Evaluation Metrics [88.39382177059747]
A corpus of metrics is designed for measuring the accuracy, robustness, and bounds of algorithms for learning with long-tailed distributions.
Based on our benchmarks, we re-evaluate the performance of existing methods on CIFAR10 and CIFAR100 datasets.
arXiv Detail & Related papers (2023-02-03T02:40:54Z)
- A Closer Look at Debiased Temporal Sentence Grounding in Videos: Dataset, Metric, and Approach [53.727460222955266]
Temporal Sentence Grounding in Videos (TSGV) aims to ground a natural language sentence in an untrimmed video.
Recent studies have found that current benchmark datasets may have obvious moment annotation biases.
We introduce a new evaluation metric, "dR@n,IoU@m", that discounts the basic recall scores to alleviate the inflated evaluation caused by biased datasets.
arXiv Detail & Related papers (2022-03-10T08:58:18Z)
- Reconstruction guided Meta-learning for Few Shot Open Set Recognition [31.49168444631114]
We propose the Reconstructing Exemplar-based Few-shot Open-set ClaSsifier (ReFOCS).
By using a novel exemplar reconstruction-based meta-learning strategy, ReFOCS streamlines few-shot open-set recognition (FSOSR).
We show ReFOCS to outperform multiple state-of-the-art methods.
arXiv Detail & Related papers (2021-07-31T23:23:35Z)
- Stance Detection Benchmark: How Robust Is Your Stance Detection? [65.91772010586605]
Stance Detection (StD) aims to detect an author's stance towards a certain topic or claim.
We introduce a StD benchmark that learns from ten StD datasets of various domains in a multi-dataset learning setting.
Within this benchmark setup, we are able to present new state-of-the-art results on five of the datasets.
arXiv Detail & Related papers (2020-01-06T13:37:51Z)
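As referenced in the data replay entry above, data-free replay inverts a frozen classifier to synthesize pseudo-exemplars of old classes instead of storing raw data. The sketch below is a generic model-inversion illustration, not the cited paper's method; the function name, input shape, step count, and regularization weight are all assumptions.

```python
import torch
import torch.nn.functional as F

def invert_samples(model, target_class, num_samples=8, steps=200, lr=0.1,
                   image_shape=(3, 32, 32)):
    """Synthesize pseudo-exemplars of `target_class` from a frozen classifier
    by optimizing random noise until the model assigns it to that class."""
    model.eval()
    x = torch.randn(num_samples, *image_shape, requires_grad=True)
    labels = torch.full((num_samples,), target_class, dtype=torch.long)
    opt = torch.optim.Adam([x], lr=lr)
    for _ in range(steps):
        opt.zero_grad()
        # Cross-entropy pulls the synthetic inputs toward the target class;
        # the small L2 term keeps pixel values in a plausible range.
        loss = F.cross_entropy(model(x), labels) + 1e-4 * x.pow(2).mean()
        loss.backward()
        opt.step()
    return x.detach()
```

Running this once per old class yields a synthetic replay buffer that can be mixed into new-task batches without retaining any original training data.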
This list is automatically generated from the titles and abstracts of the papers on this site.