Learning to Abstain From Uninformative Data
- URL: http://arxiv.org/abs/2309.14240v1
- Date: Mon, 25 Sep 2023 15:55:55 GMT
- Title: Learning to Abstain From Uninformative Data
- Authors: Yikai Zhang, Songzhu Zheng, Mina Dalirrooyfard, Pengxiang Wu, Anderson Schneider, Anant Raj, Yuriy Nevmyvaka, Chao Chen
- Abstract summary: We study the problem of learning and acting under a general noisy generative process.
In this problem, the data distribution has a significant proportion of uninformative samples with high noise in the label.
We propose a novel approach to learning under these conditions via a loss inspired by selective learning theory.
- Score: 20.132146513548843
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Learning and decision-making in domains with a naturally high noise-to-signal
ratio, such as finance or healthcare, is often challenging, while the stakes
are very high. In this paper, we study the problem of learning and acting under
a general noisy generative process. In this problem, the data distribution has
a significant proportion of uninformative samples with high noise in the label,
while part of the data contains useful information represented by low label
noise. This dichotomy is present during both training and inference, so
uninformative data must be handled properly at both stages. We propose a novel
approach to learning under these conditions via a loss inspired by selective
learning theory. By minimizing this loss, the model is guaranteed to make
near-optimal decisions by distinguishing informative data from uninformative
data and making predictions. We build upon the strength of our theoretical
guarantees by describing an iterative algorithm that jointly optimizes both a
predictor and a selector, and we evaluate its empirical performance in a
variety of settings.
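To make the predictor/selector vocabulary concrete, here is a minimal sketch of the two quantities selective learning usually trades off: coverage (the fraction of samples the selector g accepts) and selective risk (the mean loss of the predictor f over accepted samples only). It assumes per-sample losses and hard selector outputs g(x) in {0, 1} are already computed. The penalized objective is a generic SelectiveNet-style form swapped in for illustration; it is an assumption, not the loss proposed in this paper, and the function names and the default `lam` are hypothetical.

```python
from typing import Sequence


def coverage(selected: Sequence[int]) -> float:
    """Fraction of samples the selector accepts (g(x) = 1 means 'predict')."""
    if not selected:
        raise ValueError("need at least one sample")
    return sum(selected) / len(selected)


def selective_risk(losses: Sequence[float], selected: Sequence[int]) -> float:
    """Mean loss over accepted samples only; abstained samples are ignored.

    Returns 0.0 when the selector abstains on everything (a convention
    chosen for this sketch).
    """
    accepted = [loss for loss, g in zip(losses, selected) if g == 1]
    return sum(accepted) / len(accepted) if accepted else 0.0


def penalized_objective(
    losses: Sequence[float],
    selected: Sequence[int],
    target_coverage: float,
    lam: float = 32.0,  # hypothetical penalty weight
) -> float:
    """Selective risk plus a quadratic penalty when coverage < target.

    Generic SelectiveNet-style form (illustrative assumption), not the
    loss proposed in the paper.
    """
    shortfall = max(0.0, target_coverage - coverage(selected))
    return selective_risk(losses, selected) + lam * shortfall ** 2


if __name__ == "__main__":
    # Hypothetical per-sample 0-1 losses produced by some predictor f.
    losses = [0.0, 1.0, 0.0, 0.0, 1.0, 1.0]
    # Hypothetical selector output: abstain on the last two (noisy) samples.
    selected = [1, 1, 1, 1, 0, 0]
    print(coverage(selected))                 # about 0.667
    print(selective_risk(losses, selected))   # 0.25
    print(penalized_objective(losses, selected, target_coverage=0.8))
```

Accepting every sample on the same losses gives coverage 1.0 and selective risk 0.5, so abstaining on the two noisy samples halves the risk at the cost of coverage; `lam` controls how hard the objective pushes back toward the target coverage. Note that a usable selector must depend only on the input x, not on the label, so that it can abstain at test time; this sketch simply takes its outputs as given.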
Related papers
- Learning with Noisy Foundation Models [95.50968225050012]
This paper is the first work to comprehensively understand and analyze the nature of noise in pre-training datasets.
We propose a tuning method (NMTune) that applies an affine transformation to the feature space to mitigate the malignant effect of noise and improve generalization.
(2024-03-11T16:22:41Z)
- Prioritizing Informative Features and Examples for Deep Learning from Noisy Data [4.741012804505562]
We propose a systemic framework that prioritizes informative features and examples to enhance each stage of the development process.
We first propose an approach to extract only informative features that are inherent to solving a target task by using auxiliary out-of-distribution data.
Next, we introduce an approach that prioritizes informative examples from unlabeled noisy data in order to reduce the labeling cost of active learning.
(2024-02-27T07:15:35Z)
- Stochastic Amortization: A Unified Approach to Accelerate Feature and Data Attribution [62.71425232332837]
We show that training amortized models with noisy labels is inexpensive and surprisingly effective.
This approach significantly accelerates several feature attribution and data valuation methods, often yielding an order of magnitude speedup over existing approaches.
(2024-01-29T03:42:37Z)
- Fine tuning Pre trained Models for Robustness Under Noisy Labels [34.68018860186995]
The presence of noisy labels in a training dataset can significantly impact the performance of machine learning models.
We introduce a novel algorithm called TURN, which robustly and efficiently transfers the prior knowledge of pre-trained models.
(2023-10-24T20:28:59Z)
- Understanding and Mitigating the Label Noise in Pre-training on Downstream Tasks [91.15120211190519]
This paper aims to understand the nature of noise in pre-training datasets and to mitigate its impact on downstream tasks.
We propose a lightweight black-box tuning method (NMTune) that applies an affine transformation to the feature space to mitigate the malignant effect of noise.
(2023-09-29T06:18:15Z)
- Prototype-Anchored Learning for Learning with Imperfect Annotations [83.7763875464011]
It is challenging to learn unbiased classification models from imperfectly annotated datasets.
We propose a prototype-anchored learning (PAL) method, which can be easily incorporated into various learning-based classification schemes.
We verify the effectiveness of PAL on class-imbalanced learning and noise-tolerant learning by extensive experiments on synthetic and real-world datasets.
(2022-06-23T10:25:37Z)
- Deep Active Learning with Noise Stability [24.54974925491753]
Uncertainty estimation for unlabeled data is crucial to active learning.
We propose a novel algorithm that leverages noise stability to estimate data uncertainty.
Our method is generally applicable in various tasks, including computer vision, natural language processing, and structural data analysis.
(2022-05-26T13:21:01Z)
- The Optimal Noise in Noise-Contrastive Learning Is Not What You Think [80.07065346699005]
We show that deviating from the common assumption that the noise distribution should equal the data distribution can actually lead to better statistical estimators.
In particular, the optimal noise distribution is different from the data's and even from a different family.
(2022-03-02T13:59:20Z)
- Low-Regret Active learning [64.36270166907788]
We develop an online learning algorithm for identifying unlabeled data points that are most informative for training.
At the core of our work is an efficient algorithm for sleeping experts that is tailored to achieve low regret on predictable (easy) instances.
(2021-04-06T22:53:45Z)