Noisy Pair Corrector for Dense Retrieval
- URL: http://arxiv.org/abs/2311.03798v1
- Date: Tue, 7 Nov 2023 08:27:14 GMT
- Title: Noisy Pair Corrector for Dense Retrieval
- Authors: Hang Zhang, Yeyun Gong, Xingwei He, Dayiheng Liu, Daya Guo, Jiancheng
Lv, Jian Guo
- Abstract summary: We propose a novel approach called Noisy Pair Corrector (NPC).
NPC consists of a detection module and a correction module.
We conduct experiments on the text-retrieval benchmarks Natural Questions and TriviaQA and the code-search benchmarks StaQC and SO-DS.
- Score: 59.312376423104055
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Most dense retrieval models contain an implicit assumption: the training
query-document pairs are exactly matched. Since it is expensive to annotate the
corpus manually, training pairs in real-world applications are usually
collected automatically, which inevitably introduces mismatched-pair noise. In
this paper, we explore an interesting and challenging problem in dense
retrieval, how to train an effective model with mismatched-pair noise. To solve
this problem, we propose a novel approach called Noisy Pair Corrector (NPC),
which consists of a detection module and a correction module. The detection
module estimates noisy pairs by calculating the perplexity between annotated
positive and easy negative documents. The correction module utilizes an
exponential moving average (EMA) model to provide a soft supervised signal,
aiding in mitigating the effects of noise. We conduct experiments on the
text-retrieval benchmarks Natural Questions and TriviaQA and the code-search
benchmarks StaQC and SO-DS. Experimental results show that NPC achieves excellent
performance in handling both synthetic and realistic noise.
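The abstract describes two mechanisms: a detection module that flags likely mismatched pairs by comparing how well a query explains its annotated positive versus easy negatives, and a correction module that supervises flagged pairs with soft targets from an EMA teacher. The PyTorch sketch below only illustrates how these pieces could fit together and is not the authors' implementation: the function names, the score-margin stand-in for the perplexity test, and the margin, temperature, and EMA-decay values are all assumptions.

```python
# Illustrative sketch (not the paper's code): flag mismatched query-document
# pairs, then train with soft targets from an EMA teacher on flagged pairs.
import copy

import torch
import torch.nn.functional as F


def pair_scores(encoder, queries, documents):
    """Dot-product relevance scores; the diagonal holds annotated positives."""
    q = encoder(queries)      # (B, d) query embeddings
    d = encoder(documents)    # (B, d) document embeddings
    return q @ d.T            # (B, B) score matrix


def detect_noisy_pairs(encoder, queries, documents, margin=0.0):
    """Stand-in for NPC's perplexity-based detection: a pair is flagged when
    its annotated positive does not beat the best in-batch (easy) negative
    by at least `margin`. The margin value is an assumption."""
    with torch.no_grad():
        scores = pair_scores(encoder, queries, documents)
        pos = scores.diag()
        eye = torch.eye(scores.size(0), dtype=torch.bool, device=scores.device)
        hardest_neg = scores.masked_fill(eye, float("-inf")).max(dim=1).values
        return (pos - hardest_neg) < margin   # True -> likely mismatched pair


class EMATeacher:
    """Exponential moving average copy of the student encoder."""

    def __init__(self, student, decay=0.999):
        self.model = copy.deepcopy(student).eval()
        self.decay = decay

    @torch.no_grad()
    def update(self, student):
        for p_t, p_s in zip(self.model.parameters(), student.parameters()):
            p_t.mul_(self.decay).add_(p_s, alpha=1.0 - self.decay)


def npc_loss(student, teacher, queries, documents, noisy_mask, temperature=1.0):
    """Hard contrastive loss on pairs judged clean; teacher-distilled soft
    loss on pairs flagged as noisy."""
    scores = pair_scores(student, queries, documents) / temperature
    hard_targets = torch.arange(scores.size(0), device=scores.device)
    hard_loss = F.cross_entropy(scores, hard_targets, reduction="none")

    with torch.no_grad():
        teacher_scores = pair_scores(teacher.model, queries, documents) / temperature
        soft_targets = F.softmax(teacher_scores, dim=1)
    soft_loss = F.kl_div(F.log_softmax(scores, dim=1), soft_targets,
                         reduction="none").sum(dim=1)

    return torch.where(noisy_mask, soft_loss, hard_loss).mean()
```

In a training loop one would typically call detect_noisy_pairs on each batch with the current encoder, compute npc_loss with the resulting mask, and call teacher.update(student) after each optimizer step; the detection schedule and thresholds used in the paper may differ.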
Related papers
- Robust Learning under Hybrid Noise [24.36707245704713]
We propose a novel unified learning framework called "Feature and Label Recovery" (FLR) to combat the hybrid noise from the perspective of data recovery.
arXiv Detail & Related papers (2024-07-04T16:13:25Z)
- Pivotal Auto-Encoder via Self-Normalizing ReLU [20.76999663290342]
We formalize single hidden layer sparse auto-encoders as a transform learning problem.
We propose an optimization problem that leads to a predictive model invariant to the noise level at test time.
Our experimental results demonstrate that the trained models yield a significant improvement in stability against varying types of noise.
arXiv Detail & Related papers (2024-06-23T09:06:52Z)
- SoftPatch: Unsupervised Anomaly Detection with Noisy Data [67.38948127630644]
This paper considers label-level noise in image sensory anomaly detection for the first time.
We propose a memory-based unsupervised AD method, SoftPatch, which efficiently denoises the data at the patch level.
Compared with existing methods, SoftPatch maintains a strong modeling ability of normal data and alleviates the overconfidence problem in coreset.
arXiv Detail & Related papers (2024-03-21T08:49:34Z)
- Understanding the Effect of Noise in LLM Training Data with Algorithmic Chains of Thought [0.0]
We study how noise in chains of thought impacts task performance in a highly controlled setting.
We define two types of noise: static noise, a local form of noise applied after the CoT trace is computed, and dynamic noise, a global form of noise that propagates errors in the trace as it is computed.
We find fine-tuned models are extremely robust to high levels of static noise but struggle significantly more with lower levels of dynamic noise.
arXiv Detail & Related papers (2024-02-06T13:59:56Z)
- Noisy Correspondence Learning with Self-Reinforcing Errors Mitigation [63.180725016463974]
Cross-modal retrieval relies on well-matched large-scale datasets that are laborious to collect in practice.
We introduce a novel noisy correspondence learning framework, namely Self-Reinforcing Errors Mitigation (SREM).
arXiv Detail & Related papers (2023-12-27T09:03:43Z)
- Improving the Robustness of Summarization Models by Detecting and Removing Input Noise [50.27105057899601]
We present a large empirical study quantifying the sometimes severe loss in performance from different types of input noise for a range of datasets and model sizes.
We propose a light-weight method for detecting and removing such noise in the input during model inference without requiring any training, auxiliary models, or even prior knowledge of the type of noise.
arXiv Detail & Related papers (2022-12-20T00:33:11Z)
- NASTAR: Noise Adaptive Speech Enhancement with Target-Conditional Resampling [34.565077865854484]
We propose noise adaptive speech enhancement with target-conditional resampling (NASTAR).
NASTAR uses a feedback mechanism to simulate adaptive training data via a noise extractor and a retrieval model.
Experimental results show that NASTAR can effectively use one noisy speech sample to adapt an SE model to a target condition.
arXiv Detail & Related papers (2022-06-18T00:15:48Z)
- Learning with Group Noise [106.56780716961732]
We propose a novel Max-Matching method for learning with group noise.
The performance on a range of real-world datasets across several learning paradigms demonstrates the effectiveness of Max-Matching.
arXiv Detail & Related papers (2021-03-17T06:57:10Z)
- Towards Noise-resistant Object Detection with Noisy Annotations [119.63458519946691]
Training deep object detectors requires a significant amount of human-annotated images with accurate object labels and bounding box coordinates.
Noisy annotations are much more easily accessible, but they could be detrimental for learning.
We address the challenging problem of training object detectors with noisy annotations, where the noise contains a mixture of label noise and bounding box noise.
arXiv Detail & Related papers (2020-03-03T01:32:16Z)
This list is automatically generated from the titles and abstracts of the papers on this site.