Learning Deep Neural Networks under Agnostic Corrupted Supervision
- URL: http://arxiv.org/abs/2102.06735v1
- Date: Fri, 12 Feb 2021 19:36:04 GMT
- Title: Learning Deep Neural Networks under Agnostic Corrupted Supervision
- Authors: Boyang Liu, Mengying Sun, Ding Wang, Pang-Ning Tan, Jiayu Zhou
- Abstract summary: We present an efficient robust algorithm that achieves strong guarantees without any assumption on the type of corruption.
Our algorithm focuses on controlling the collective impact of data points on the average gradient.
Experiments on multiple benchmark datasets have demonstrated the robustness of our algorithm under different types of corruption.
- Score: 37.441467641123026
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Training deep neural models in the presence of corrupted supervision is
challenging as the corrupted data points may significantly impact the
generalization performance. To alleviate this problem, we present an efficient
robust algorithm that achieves strong guarantees without any assumption on the
type of corruption and provides a unified framework for both classification and
regression problems. Unlike many existing approaches that quantify the quality
of the data points (e.g., based on their individual loss values), and filter
them accordingly, the proposed algorithm focuses on controlling the collective
impact of data points on the average gradient. Even when a corrupted data point
fails to be excluded by our algorithm, it will have a very limited impact on the
overall loss compared with state-of-the-art filtering methods
based on loss values. Extensive experiments on multiple benchmark datasets have
demonstrated the robustness of our algorithm under different types of
corruption.
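To make the gradient-centric idea concrete, below is a minimal PyTorch sketch of a training step that discards the fraction of samples with the largest loss-layer gradient norms before averaging the gradient. The function name `robust_step`, the proxy gradient norm, and the corruption budget `eps` are illustrative assumptions for a classification setup, not the authors' exact algorithm.

```python
import torch
import torch.nn as nn

def robust_step(model, optimizer, x, y, eps=0.1):
    """One training step that excludes the eps-fraction of samples with the
    largest (approximate) per-sample gradient norms, then averages the loss
    over the remaining samples. Sketch only; not the paper's exact method."""
    criterion = nn.CrossEntropyLoss(reduction="none")
    logits = model(x)                      # (batch, num_classes)
    losses = criterion(logits, y)          # per-sample losses

    # Gradient of the loss w.r.t. the logits as a cheap proxy for the
    # per-sample parameter-gradient norm.
    grad_logits = torch.autograd.grad(losses.sum(), logits, retain_graph=True)[0]
    proxy_norms = grad_logits.norm(dim=1)

    # Keep the (1 - eps) fraction of samples with the smallest proxy norms,
    # so any remaining corrupted points can only move the average a little.
    n_keep = max(1, int((1.0 - eps) * x.size(0)))
    keep = torch.topk(proxy_norms, n_keep, largest=False).indices

    optimizer.zero_grad()
    losses[keep].mean().backward()
    optimizer.step()
    return losses.detach().mean().item()
```

In contrast to loss-based filtering, the quantity thresholded here is tied directly to how much each sample can move the averaged gradient, which is the property the abstract emphasizes.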
Related papers
- Uncertainty-based Offline Variational Bayesian Reinforcement Learning for Robustness under Diverse Data Corruptions [8.666879925570331]
Real-world offline datasets are often subject to data corruptions due to sensor failures or malicious attacks.
Existing methods struggle to learn robust agents under high uncertainty caused by corrupted data.
We propose TRACER, a novel robust variational Bayesian inference approach for offline RL.
arXiv Detail & Related papers (2024-11-01T09:28:24Z)
- A Bilevel Optimization Framework for Imbalanced Data Classification [1.6385815610837167]
We propose a new undersampling approach that avoids the pitfalls of noise and overlap caused by synthetic data.
Instead of undersampling majority data randomly, our method undersamples datapoints based on their ability to improve model loss.
Using improved model loss as a proxy measurement for classification performance, our technique assesses a datapoint's impact on loss and rejects those unable to improve it.
arXiv Detail & Related papers (2024-10-15T01:17:23Z)
- Multigroup Robustness [5.659543670443081]
We study multigroup robust algorithms whose robustness guarantees for each subpopulation only degrade with the amount of data corruption inside that subpopulation.
Our techniques establish a new connection between multigroup fairness and robustness.
arXiv Detail & Related papers (2024-05-01T16:35:04Z)
- Simple Ingredients for Offline Reinforcement Learning [86.1988266277766]
Offline reinforcement learning algorithms have proven effective on datasets highly connected to the target downstream task.
We show that existing methods struggle with diverse data: their performance considerably deteriorates as data collected for related but different tasks is simply added to the offline buffer.
We show that scale, more than algorithmic considerations, is the key factor influencing performance.
arXiv Detail & Related papers (2024-03-19T18:57:53Z)
- Binary Classification with Confidence Difference [100.08818204756093]
This paper delves into a novel weakly supervised binary classification problem called confidence-difference (ConfDiff) classification.
We propose a risk-consistent approach to tackle this problem and show that the estimation error bound achieves the optimal convergence rate.
We also introduce a risk correction approach to mitigate overfitting problems, whose consistency and convergence rate are also proven.
arXiv Detail & Related papers (2023-10-09T11:44:50Z)
- Combating noisy labels in object detection datasets [0.0]
We introduce the Confident Learning for Object Detection (CLOD) algorithm for assessing the quality of each label in object detection datasets.
We identify missing, spurious, mislabeled, and mislocated bounding boxes and suggest corrections.
The proposed method is able to point out nearly 80% of artificially disturbed bounding boxes with a false positive rate below 0.1.
arXiv Detail & Related papers (2022-11-25T10:05:06Z)
- A Systematic Evaluation of Node Embedding Robustness [77.29026280120277]
We assess the empirical robustness of node embedding models to random and adversarial poisoning attacks.
We compare edge addition, deletion and rewiring strategies computed using network properties as well as node labels.
We find that node classification suffers higher performance degradation than network reconstruction.
arXiv Detail & Related papers (2022-09-16T17:20:23Z)
- Risk Minimization from Adaptively Collected Data: Guarantees for Supervised and Policy Learning [57.88785630755165]
Empirical risk minimization (ERM) is the workhorse of machine learning, but its model-agnostic guarantees can fail when we use adaptively collected data.
We study a generic importance sampling weighted ERM algorithm for using adaptively collected data to minimize the average of a loss function over a hypothesis class.
For policy learning, we provide rate-optimal regret guarantees that close an open gap in the existing literature whenever exploration decays to zero.
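As a generic sketch of importance-sampling weighted ERM (not that paper's exact estimator), the snippet below reweights each sample's loss by the inverse of the probability with which it was collected; the function name `is_weighted_risk` and the clipping constant are assumptions.

```python
import numpy as np

def is_weighted_risk(losses, propensities, clip=1e-8):
    """Importance-sampling weighted empirical risk: each loss is reweighted
    by the inverse of the (adaptive) probability with which its sample was
    collected. Generic sketch, not the paper's exact estimator."""
    losses = np.asarray(losses, dtype=float)
    p = np.clip(np.asarray(propensities, dtype=float), clip, None)
    return float(np.mean(losses / p))
```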
arXiv Detail & Related papers (2021-06-03T09:50:13Z)
- A Self-Refinement Strategy for Noise Reduction in Grammatical Error Correction [54.569707226277735]
Existing approaches for grammatical error correction (GEC) rely on supervised learning with manually created GEC datasets.
There is a non-negligible amount of "noise" where errors were inappropriately edited or left uncorrected.
We propose a self-refinement method where the key idea is to denoise these datasets by leveraging the prediction consistency of existing models.
arXiv Detail & Related papers (2020-10-07T04:45:09Z)
- Establishing strong imputation performance of a denoising autoencoder in a wide range of missing data problems [0.0]
We develop a consistent framework for both training and imputation.
We benchmarked the results against state-of-the-art imputation methods.
The developed autoencoder obtained the smallest error for all ranges of initial data corruption.
arXiv Detail & Related papers (2020-04-06T12:00:30Z)