Learning Deep Neural Networks under Agnostic Corrupted Supervision
- URL: http://arxiv.org/abs/2102.06735v1
- Date: Fri, 12 Feb 2021 19:36:04 GMT
- Title: Learning Deep Neural Networks under Agnostic Corrupted Supervision
- Authors: Boyang Liu, Mengying Sun, Ding Wang, Pang-Ning Tan, Jiayu Zhou
- Abstract summary: We present an efficient robust algorithm that achieves strong guarantees without any assumption on the type of corruption.
Our algorithm focuses on controlling the collective impact of data points on the average gradient.
Experiments on multiple benchmark datasets have demonstrated the robustness of our algorithm under different types of corruption.
- Score: 37.441467641123026
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Training deep neural models in the presence of corrupted supervision is
challenging as the corrupted data points may significantly impact the
generalization performance. To alleviate this problem, we present an efficient
robust algorithm that achieves strong guarantees without any assumption on the
type of corruption and provides a unified framework for both classification and
regression problems. Unlike many existing approaches that quantify the quality
of the data points (e.g., based on their individual loss values), and filter
them accordingly, the proposed algorithm focuses on controlling the collective
impact of data points on the average gradient. Even when a corrupted data point
fails to be excluded by our algorithm, it will have a very limited impact on the
overall loss compared with state-of-the-art filtering methods
based on loss values. Extensive experiments on multiple benchmark datasets have
demonstrated the robustness of our algorithm under different types of
corruption.
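To make the gradient-centric idea concrete, below is a minimal PyTorch sketch of a training step that discards the fraction of samples with the largest loss-layer gradient norms before averaging the gradient. The function name `robust_step`, the proxy gradient norm, and the corruption budget `eps` are illustrative assumptions for a classification setup, not the authors' exact algorithm.

```python
import torch
import torch.nn as nn

def robust_step(model, optimizer, x, y, eps=0.1):
    """One training step that excludes the eps-fraction of samples with the
    largest (approximate) per-sample gradient norms, then averages the loss
    over the remaining samples. Sketch only; not the paper's exact method."""
    criterion = nn.CrossEntropyLoss(reduction="none")
    logits = model(x)                      # (batch, num_classes)
    losses = criterion(logits, y)          # per-sample losses

    # Gradient of the loss w.r.t. the logits as a cheap proxy for the
    # per-sample parameter-gradient norm.
    grad_logits = torch.autograd.grad(losses.sum(), logits, retain_graph=True)[0]
    proxy_norms = grad_logits.norm(dim=1)

    # Keep the (1 - eps) fraction of samples with the smallest proxy norms,
    # so any remaining corrupted points can only move the average a little.
    n_keep = max(1, int((1.0 - eps) * x.size(0)))
    keep = torch.topk(proxy_norms, n_keep, largest=False).indices

    optimizer.zero_grad()
    losses[keep].mean().backward()
    optimizer.step()
    return losses.detach().mean().item()
```

In contrast to loss-based filtering, the quantity thresholded here is tied directly to how much each sample can move the averaged gradient, which is the property the abstract emphasizes.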
Related papers
- Uncertainty-based Offline Variational Bayesian Reinforcement Learning for Robustness under Diverse Data Corruptions [8.666879925570331]
Real-world offline datasets are often subject to data corruptions due to sensor failures or malicious attacks.
Existing methods struggle to learn robust agents under high uncertainty caused by corrupted data.
We propose TRACER, a novel robust variational Bayesian inference approach for offline RL.
arXiv Detail & Related papers (2024-11-01T09:28:24Z)
- A Bilevel Optimization Framework for Imbalanced Data Classification [1.6385815610837167]
We propose a new undersampling approach that avoids the pitfalls of noise and overlap caused by synthetic data.
Instead of undersampling majority data randomly, our method undersamples datapoints based on their ability to improve model loss.
Using improved model loss as a proxy measurement for classification performance, our technique assesses a datapoint's impact on loss and rejects those unable to improve it.
arXiv Detail & Related papers (2024-10-15T01:17:23Z)
- Multigroup Robustness [5.659543670443081]
We study multigroup robust algorithms whose robustness guarantees for each subpopulation only degrade with the amount of data corruption inside that subpopulation.
Our techniques establish a new connection between multigroup fairness and robustness.
arXiv Detail & Related papers (2024-05-01T16:35:04Z)
- Simple Ingredients for Offline Reinforcement Learning [86.1988266277766]
Offline reinforcement learning algorithms have proven effective on datasets highly connected to the target downstream task.
We show that existing methods struggle with diverse data: their performance considerably deteriorates as data collected for related but different tasks is simply added to the offline buffer.
We show that scale, more than algorithmic considerations, is the key factor influencing performance.
arXiv Detail & Related papers (2024-03-19T18:57:53Z)
- Binary Classification with Confidence Difference [100.08818204756093]
This paper delves into a novel weakly supervised binary classification problem called confidence-difference (ConfDiff) classification.
We propose a risk-consistent approach to tackle this problem and show that the estimation error bound achieves the optimal convergence rate.
We also introduce a risk correction approach to mitigate overfitting problems, whose consistency and convergence rate are also proven.
arXiv Detail & Related papers (2023-10-09T11:44:50Z)
- Combating noisy labels in object detection datasets [0.0]
We introduce the Confident Learning for Object Detection (CLOD) algorithm for assessing the quality of each label in object detection datasets.
We identify missing, spurious, mislabeled, and mislocated bounding boxes and suggest corrections.
The proposed method is able to point out nearly 80% of artificially disturbed bounding boxes with a false positive rate below 0.1.
arXiv Detail & Related papers (2022-11-25T10:05:06Z)
- A Systematic Evaluation of Node Embedding Robustness [77.29026280120277]
We assess the empirical robustness of node embedding models to random and adversarial poisoning attacks.
We compare edge addition, deletion and rewiring strategies computed using network properties as well as node labels.
We find that node classification suffers higher performance degradation than network reconstruction.
arXiv Detail & Related papers (2022-09-16T17:20:23Z)
- Risk Minimization from Adaptively Collected Data: Guarantees for Supervised and Policy Learning [57.88785630755165]
Empirical risk minimization (ERM) is the workhorse of machine learning, but its model-agnostic guarantees can fail when we use adaptively collected data.
We study a generic importance sampling weighted ERM algorithm for using adaptively collected data to minimize the average of a loss function over a hypothesis class.
For policy learning, we provide rate-optimal regret guarantees that close an open gap in the existing literature whenever exploration decays to zero.
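As a generic sketch of importance-sampling weighted ERM (not that paper's exact estimator), the snippet below reweights each sample's loss by the inverse of the probability with which it was collected; the function name `is_weighted_risk` and the clipping constant are assumptions.

```python
import numpy as np

def is_weighted_risk(losses, propensities, clip=1e-8):
    """Importance-sampling weighted empirical risk: each loss is reweighted
    by the inverse of the (adaptive) probability with which its sample was
    collected. Generic sketch, not the paper's exact estimator."""
    losses = np.asarray(losses, dtype=float)
    p = np.clip(np.asarray(propensities, dtype=float), clip, None)
    return float(np.mean(losses / p))
```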
arXiv Detail & Related papers (2021-06-03T09:50:13Z)
- A Self-Refinement Strategy for Noise Reduction in Grammatical Error Correction [54.569707226277735]
Existing approaches for grammatical error correction (GEC) rely on supervised learning with manually created GEC datasets.
There is a non-negligible amount of "noise" where errors were inappropriately edited or left uncorrected.
We propose a self-refinement method where the key idea is to denoise these datasets by leveraging the prediction consistency of existing models.
arXiv Detail & Related papers (2020-10-07T04:45:09Z)
- Establishing strong imputation performance of a denoising autoencoder in a wide range of missing data problems [0.0]
We develop a consistent framework for both training and imputation.
We benchmarked the results against state-of-the-art imputation methods.
The developed autoencoder obtained the smallest error for all ranges of initial data corruption.
arXiv Detail & Related papers (2020-04-06T12:00:30Z)