Towards Efficient Data-Centric Robust Machine Learning with Noise-based
Augmentation
- URL: http://arxiv.org/abs/2203.03810v1
- Date: Tue, 8 Mar 2022 02:05:40 GMT
- Title: Towards Efficient Data-Centric Robust Machine Learning with Noise-based
Augmentation
- Authors: Xiaogeng Liu, Haoyu Wang, Yechao Zhang, Fangzhou Wu, Shengshan Hu
- Abstract summary: Data-centric machine learning aims to find effective ways to build appropriate datasets that can improve the performance of AI models.
We introduce a noise-based data augmentation method composed of Gaussian noise, salt-and-pepper noise, and PGD adversarial perturbations.
The proposed method is built on lightweight algorithms and proven highly effective in comprehensive evaluations.
- Score: 10.859556815535706
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Data-centric machine learning aims to find effective ways to build
appropriate datasets that can improve the performance of AI models. In this
paper, we focus on designing an efficient data-centric scheme to improve model
robustness against unforeseen malicious inputs in black-box test settings.
Specifically, we introduce a noise-based data augmentation method composed of
Gaussian noise, salt-and-pepper noise, and PGD adversarial perturbations. The
proposed method is built on lightweight algorithms and proven highly effective
in comprehensive evaluations, combining low computation cost with strong
robustness enhancement. In addition, we share insights about data-centric
robust machine learning gained from our experiments.
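For concreteness, below is a minimal PyTorch sketch of the three corruption types named in the abstract. It is an illustration under assumed settings, not the paper's implementation: the noise scale `sigma`, the salt-and-pepper rate, the PGD budget (`eps`, `alpha`, `steps`), and the uniform per-batch mixing policy are all assumptions, since the abstract does not specify them.

```python
import torch
import torch.nn.functional as F


def gaussian_noise(x: torch.Tensor, sigma: float = 0.05) -> torch.Tensor:
    """Additive zero-mean Gaussian noise; inputs assumed scaled to [0, 1]."""
    return (x + sigma * torch.randn_like(x)).clamp(0.0, 1.0)


def salt_and_pepper(x: torch.Tensor, rate: float = 0.05) -> torch.Tensor:
    """Set a random fraction of pixels to 0 (pepper) or 1 (salt)."""
    mask = torch.rand_like(x)
    x = x.clone()
    x[mask < rate / 2] = 0.0            # pepper
    x[mask > 1.0 - rate / 2] = 1.0      # salt
    return x


def pgd_perturb(model, x, y, eps=8 / 255, alpha=2 / 255, steps=10):
    """Standard L-infinity PGD with a random start: ascend the loss and
    project back into the eps-ball around the clean input each step."""
    x = x.detach()
    x_adv = (x + torch.empty_like(x).uniform_(-eps, eps)).clamp(0.0, 1.0)
    for _ in range(steps):
        x_adv = x_adv.detach().requires_grad_(True)
        loss = F.cross_entropy(model(x_adv), y)
        grad = torch.autograd.grad(loss, x_adv)[0]
        with torch.no_grad():
            x_adv = x_adv + alpha * grad.sign()        # gradient ascent step
            x_adv = x + (x_adv - x).clamp(-eps, eps)   # project to eps-ball
            x_adv = x_adv.clamp(0.0, 1.0)
    return x_adv.detach()


def augment_batch(model, x, y):
    """Apply one of the three corruptions, chosen uniformly at random
    (the mixing policy is an assumption, not taken from the paper)."""
    choice = int(torch.randint(0, 3, (1,)))
    if choice == 0:
        return gaussian_noise(x)
    if choice == 1:
        return salt_and_pepper(x)
    return pgd_perturb(model, x, y)
```

In a training loop, `augment_batch(model, x, y)` would replace or supplement each clean batch before the usual forward and backward pass; of the three corruptions, only PGD requires extra gradient computations.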
Related papers
- Enhancing PAC Learning of Halfspaces Through Robust Optimization Techniques [0.0]
We study PAC learning of halfspaces under constant malicious noise, where a fraction of the training data is adversarially corrupted.
We present a novel, efficient algorithm that extends existing theoretical frameworks to account for noise resilience in halfspace learning.
We provide a comprehensive analysis of the algorithm's performance, demonstrating its superior robustness to malicious noise when compared to existing state-of-the-art methods.
arXiv Detail & Related papers (2024-10-21T23:08:17Z) - EraseDiff: Erasing Data Influence in Diffusion Models [51.225365010401006]
We introduce EraseDiff, an unlearning algorithm to address concerns related to data memorization.
Our approach formulates the unlearning task as a constrained optimization problem.
We show that EraseDiff effectively preserves the model's utility, efficacy, and efficiency.
arXiv Detail & Related papers (2024-01-11T09:30:36Z) - Bandit-Driven Batch Selection for Robust Learning under Label Noise [20.202806541218944]
We introduce a novel approach for batch selection in Stochastic Gradient Descent (SGD) training, leveraging bandit algorithms.
Our methodology focuses on optimizing the learning process in the presence of label noise, a prevalent issue in real-world datasets.
arXiv Detail & Related papers (2023-10-31T19:19:01Z) - Fine-tuning Pre-trained Models for Robustness Under Noisy Labels [34.68018860186995]
The presence of noisy labels in a training dataset can significantly impact the performance of machine learning models.
We introduce a novel algorithm called TURN, which robustly and efficiently transfers the prior knowledge of pre-trained models.
arXiv Detail & Related papers (2023-10-24T20:28:59Z) - Robust Learning with Progressive Data Expansion Against Spurious
Correlation [65.83104529677234]
We study the learning process of a two-layer nonlinear convolutional neural network in the presence of spurious features.
Our analysis suggests that imbalanced data groups and easily learnable spurious features can lead to the dominance of spurious features during the learning process.
We propose a new training algorithm called PDE that efficiently enhances the model's robustness for better worst-group performance.
arXiv Detail & Related papers (2023-06-08T05:44:06Z) - Towards Robust Dataset Learning [90.2590325441068]
We propose a principled, tri-level optimization to formulate the robust dataset learning problem.
Under an abstraction model that characterizes robust vs. non-robust features, the proposed method provably learns a robust dataset.
arXiv Detail & Related papers (2022-11-19T17:06:10Z) - Efficient training of lightweight neural networks using Online
Self-Acquired Knowledge Distillation [51.66271681532262]
Online Self-Acquired Knowledge Distillation (OSAKD) is proposed, aiming to improve the performance of any deep neural model in an online manner.
We utilize a k-NN non-parametric density estimation technique to estimate the unknown probability distributions of the data samples in the output feature space (a minimal sketch of this estimator appears after this list).
arXiv Detail & Related papers (2021-08-26T14:01:04Z) - Adaptive t-Momentum-based Optimization for Unknown Ratio of Outliers in
Amateur Data in Imitation Learning [3.145455301228175]
Behavioral cloning (BC) bears high potential for safe and direct transfer of human skills to robots.
In order to allow the imitators to effectively learn from imperfect demonstrations, we propose to employ the robust t-momentum optimization algorithm.
We show empirically how the algorithm can be used to produce robust BC imitators against datasets with an unknown ratio of outliers.
arXiv Detail & Related papers (2021-08-02T04:30:41Z) - DEALIO: Data-Efficient Adversarial Learning for Imitation from
Observation [57.358212277226315]
In imitation learning from observation (IfO), a learning agent seeks to imitate a demonstrating agent using only observations of the demonstrated behavior, without access to the control signals generated by the demonstrator.
Recent methods based on adversarial imitation learning have led to state-of-the-art performance on IfO problems, but they typically suffer from high sample complexity due to a reliance on data-inefficient, model-free reinforcement learning algorithms.
This issue makes them impractical to deploy in real-world settings, where gathering samples can incur high costs in terms of time, energy, and risk.
We propose a more data-efficient IfO algorithm.
arXiv Detail & Related papers (2021-03-31T23:46:32Z) - How Training Data Impacts Performance in Learning-based Control [67.7875109298865]
This paper derives an analytical relationship between the density of the training data and the control performance.
We formulate a quality measure for the data set, which we refer to as the $\rho$-gap.
We show how the $\rho$-gap can be applied to a feedback linearizing control law.
arXiv Detail & Related papers (2020-05-25T12:13:49Z)
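As an aside on the OSAKD entry above, the sketch below shows the classic k-NN non-parametric density estimator that its summary refers to: the density at a query point is approximated from the volume of the smallest ball containing its k nearest neighbors. The feature dimensionality, the choice of k, and the Euclidean metric are illustrative assumptions; the details of OSAKD's actual estimator are not reproduced here.

```python
import numpy as np
from scipy.special import gamma


def knn_density(query: np.ndarray, data: np.ndarray, k: int = 5) -> float:
    """k-NN density estimate p(x) ~= k / (n * V_k), where V_k is the
    volume of the d-ball whose radius is the distance from the query
    point to its k-th nearest neighbor."""
    n, d = data.shape
    dists = np.sort(np.linalg.norm(data - query, axis=1))
    r_k = dists[k - 1]                                       # k-th NN distance
    v_k = (np.pi ** (d / 2) / gamma(d / 2 + 1)) * r_k ** d   # d-ball volume
    return k / (n * v_k)


# Usage (hypothetical feature space): 1000 samples in R^8.
features = np.random.randn(1000, 8)
print(knn_density(np.zeros(8), features, k=10))
```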
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the information presented and is not responsible for any consequences arising from its use.