Towards Efficient Data-Centric Robust Machine Learning with Noise-based
Augmentation
- URL: http://arxiv.org/abs/2203.03810v1
- Date: Tue, 8 Mar 2022 02:05:40 GMT
- Title: Towards Efficient Data-Centric Robust Machine Learning with Noise-based
Augmentation
- Authors: Xiaogeng Liu, Haoyu Wang, Yechao Zhang, Fangzhou Wu, Shengshan Hu
- Abstract summary: Data-centric machine learning aims to find effective ways to build appropriate datasets that improve the performance of AI models.
We introduce a noise-based data augmentation method composed of Gaussian noise, salt-and-pepper noise, and PGD adversarial perturbations.
The proposed method is built on lightweight algorithms and is shown to be highly effective in comprehensive evaluations.
- Score: 10.859556815535706
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Data-centric machine learning aims to find effective ways to build
appropriate datasets that improve the performance of AI models. In this
paper, we focus on designing an efficient data-centric scheme to improve
model robustness against unforeseen malicious inputs in black-box test
settings. Specifically, we introduce a noise-based data augmentation method
composed of Gaussian noise, salt-and-pepper noise, and PGD adversarial
perturbations. The proposed method is built on lightweight algorithms and is
shown to be highly effective in comprehensive evaluations, achieving low
computational cost and strong robustness enhancement. In addition, we share
our insights about data-centric robust machine learning gained from our
experiments.
Related papers
- Efficient Multi-Agent System Training with Data Influence-Oriented Tree Search [59.75749613951193]
We propose Data Influence-oriented Tree Search (DITS) to guide both tree search and data selection.
By leveraging influence scores, we effectively identify the most impactful data for system improvement.
We derive influence score estimation methods tailored for non-differentiable metrics.
arXiv Detail & Related papers (2025-02-02T23:20:16Z)
- Adaptive Data Exploitation in Deep Reinforcement Learning [50.53705050673944]
We introduce ADEPT, a powerful framework to enhance data efficiency and generalization in deep reinforcement learning (RL).
Specifically, ADEPT adaptively manages the use of sampled data across different learning stages via multi-armed bandit (MAB) algorithms.
We test ADEPT on benchmarks including Procgen, MiniGrid, and PyBullet.
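As a rough illustration of MAB-driven data management, the summary above can be sketched with an epsilon-greedy bandit where each arm is a candidate data pool and the reward is some proxy for learning progress. This is a generic sketch under assumed reward semantics, not ADEPT's actual algorithm.

```python
import random

class EpsilonGreedyBandit:
    """Epsilon-greedy multi-armed bandit: each arm represents one way of using
    the sampled data; the reward is a proxy for the resulting learning progress."""
    def __init__(self, n_arms, epsilon=0.1, seed=0):
        self.epsilon = epsilon
        self.counts = [0] * n_arms    # pulls per arm
        self.values = [0.0] * n_arms  # running mean reward per arm
        self.rng = random.Random(seed)

    def select(self):
        if self.rng.random() < self.epsilon:
            return self.rng.randrange(len(self.counts))  # explore a random arm
        # exploit the arm with the highest estimated reward
        return max(range(len(self.counts)), key=lambda a: self.values[a])

    def update(self, arm, reward):
        self.counts[arm] += 1
        # incremental update of the running mean
        self.values[arm] += (reward - self.values[arm]) / self.counts[arm]
```

At each learning stage the trainer would call `select()` to choose a data-usage strategy, then feed the observed improvement back via `update()`.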
arXiv Detail & Related papers (2025-01-22T04:01:17Z) - Leveraging Semi-Supervised Learning to Enhance Data Mining for Image Classification under Limited Labeled Data [35.431340001608476]
Traditional data mining methods are inadequate when faced with large-scale, high-dimensional and complex data.
This study introduces semi-supervised learning methods, aiming to improve the algorithm's ability to utilize unlabeled data.
Specifically, we adopt a self-training method and combine it with a convolutional neural network (CNN) for image feature extraction and classification.
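The self-training idea mentioned here can be sketched as a pseudo-labeling loop: fit on labeled data, label the unlabeled points the model is confident about, fold them in, and repeat. This is a generic sketch; the confidence threshold, round count, and the `fit`/`predict_proba` callbacks are assumptions, and the study itself pairs self-training with a CNN rather than this toy interface.

```python
import numpy as np

def self_train(fit, predict_proba, X_lab, y_lab, X_unlab,
               threshold=0.95, rounds=3):
    """Self-training: iteratively pseudo-label confident unlabeled samples
    and add them to the labeled set."""
    X, y = X_lab.copy(), y_lab.copy()
    pool = X_unlab.copy()
    for _ in range(rounds):
        if len(pool) == 0:
            break
        model = fit(X, y)
        proba = predict_proba(model, pool)   # shape (n_pool, n_classes)
        conf = proba.max(axis=1)
        keep = conf >= threshold             # only trust confident predictions
        if not keep.any():
            break
        X = np.vstack([X, pool[keep]])
        y = np.concatenate([y, proba[keep].argmax(axis=1)])
        pool = pool[~keep]                   # remove the newly labeled points
    return X, y
```

The threshold is the key knob: too low and label noise leaks into training, too high and the unlabeled pool is never used.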
arXiv Detail & Related papers (2024-10-21T23:08:17Z)
- Enhancing PAC Learning of Halfspaces Through Robust Optimization Techniques [0.0]
We study PAC learning of halfspaces under constant malicious noise, where a fraction of the training data is adversarially corrupted.
We present a novel, efficient algorithm that extends existing theoretical frameworks to account for noise resilience in halfspace learning.
We provide a comprehensive analysis of the algorithm's performance, demonstrating superior robustness to malicious noise compared to existing state-of-the-art methods.
arXiv Detail & Related papers (2023-10-31T19:19:01Z)
- Bandit-Driven Batch Selection for Robust Learning under Label Noise [20.202806541218944]
We introduce a novel approach for batch selection in Stochastic Gradient Descent (SGD) training, leveraging bandit algorithms.
Our methodology focuses on optimizing the learning process in the presence of label noise, a prevalent issue in real-world datasets.
arXiv Detail & Related papers (2023-10-24T20:28:59Z)
- Fine-tuning Pre-trained Models for Robustness Under Noisy Labels [34.68018860186995]
The presence of noisy labels in a training dataset can significantly impact the performance of machine learning models.
We introduce a novel algorithm called TURN, which robustly and efficiently transfers the prior knowledge of pre-trained models.
arXiv Detail & Related papers (2022-11-19T17:06:10Z)
- Towards Robust Dataset Learning [90.2590325441068]
We propose a principled, tri-level optimization to formulate the robust dataset learning problem.
Under an abstraction model that characterizes robust vs. non-robust features, the proposed method provably learns a robust dataset.
arXiv Detail & Related papers (2022-11-19T17:06:10Z)
- Efficient Training of Lightweight Neural Networks Using Online Self-Acquired Knowledge Distillation [51.66271681532262]
Online Self-Acquired Knowledge Distillation (OSAKD) is proposed, aiming to improve the performance of any deep neural model in an online manner.
We utilize the k-NN non-parametric density estimation technique to estimate the unknown probability distributions of the data samples in the output feature space.
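The standard k-NN density estimator behind this summary can be sketched as follows: the density at a query point is taken proportional to k divided by the volume of the smallest ball reaching its k-th nearest training neighbor. This is the textbook formula, not necessarily OSAKD's exact estimator; the ball-volume normalization is an assumption.

```python
from math import gamma, pi
import numpy as np

def knn_density(train, query, k=5):
    """Non-parametric k-NN density estimate: k / (n * volume of the ball
    whose radius is the distance to the k-th nearest training neighbor)."""
    n, d = train.shape
    # pairwise distances from every query point to every training point
    dists = np.linalg.norm(query[:, None, :] - train[None, :, :], axis=2)
    r_k = np.sort(dists, axis=1)[:, k - 1]   # radius to the k-th neighbor
    # volume of a d-dimensional ball of radius r_k
    ball_vol = (pi ** (d / 2) / gamma(d / 2 + 1)) * r_k ** d
    return k / (n * ball_vol)
```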
arXiv Detail & Related papers (2021-08-26T14:01:04Z)
- Adaptive t-Momentum-based Optimization for Unknown Ratio of Outliers in Amateur Data in Imitation Learning [3.145455301228175]
Behavioral cloning (BC) bears a high potential for safe and direct transfer of human skills to robots.
In order to allow the imitators to effectively learn from imperfect demonstrations, we propose to employ the robust t-momentum optimization algorithm.
We show empirically how the algorithm can be used to produce robust BC imitators against datasets with unknown heaviness.
arXiv Detail & Related papers (2021-08-02T04:30:41Z)
- DEALIO: Data-Efficient Adversarial Learning for Imitation from Observation [57.358212277226315]
In imitation learning from observation (IfO), a learning agent seeks to imitate a demonstrating agent using only observations of the demonstrated behavior, without access to the control signals generated by the demonstrator.
Recent methods based on adversarial imitation learning have led to state-of-the-art performance on IfO problems, but they typically suffer from high sample complexity due to a reliance on data-inefficient, model-free reinforcement learning algorithms.
This issue makes them impractical to deploy in real-world settings, where gathering samples can incur high costs in terms of time, energy, and risk.
We propose a more data-efficient IfO algorithm.
arXiv Detail & Related papers (2021-03-31T23:46:32Z)
- How Training Data Impacts Performance in Learning-based Control [67.7875109298865]
This paper derives an analytical relationship between the density of the training data and the control performance.
We formulate a quality measure for the data set, which we refer to as the $\rho$-gap.
We show how the $\rho$-gap can be applied to a feedback linearizing control law.
arXiv Detail & Related papers (2020-05-25T12:13:49Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences of its use.