Towards Efficient Data-Centric Robust Machine Learning with Noise-based
Augmentation
- URL: http://arxiv.org/abs/2203.03810v1
- Date: Tue, 8 Mar 2022 02:05:40 GMT
- Title: Towards Efficient Data-Centric Robust Machine Learning with Noise-based
Augmentation
- Authors: Xiaogeng Liu, Haoyu Wang, Yechao Zhang, Fangzhou Wu, Shengshan Hu
- Abstract summary: Data-centric machine learning aims to find effective ways to build appropriate datasets that can improve the performance of AI models.
We introduce a noise-based data augmentation method composed of Gaussian noise, salt-and-pepper noise, and PGD adversarial perturbations.
The proposed method is built on lightweight algorithms and proven highly effective in comprehensive evaluations.
- Score: 10.859556815535706
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Data-centric machine learning aims to find effective ways to build
appropriate datasets that can improve the performance of AI models. In this
paper, we focus on designing an efficient data-centric scheme to improve model
robustness against unforeseen malicious inputs in black-box test settings.
Specifically, we introduce a noise-based data augmentation method composed of
Gaussian noise, salt-and-pepper noise, and PGD adversarial perturbations. The
proposed method is built on lightweight algorithms and proven highly effective
in comprehensive evaluations, combining low computation cost with strong
robustness enhancement. In addition, we share insights about data-centric
robust machine learning gained from our experiments.
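For concreteness, below is a minimal PyTorch sketch of the three corruption types named in the abstract. It is an illustration under assumed settings, not the paper's implementation: the noise scale `sigma`, the salt-and-pepper rate, the PGD budget (`eps`, `alpha`, `steps`), and the uniform per-batch mixing policy are all assumptions, since the abstract does not specify them.

```python
import torch
import torch.nn.functional as F


def gaussian_noise(x: torch.Tensor, sigma: float = 0.05) -> torch.Tensor:
    """Additive zero-mean Gaussian noise; inputs assumed scaled to [0, 1]."""
    return (x + sigma * torch.randn_like(x)).clamp(0.0, 1.0)


def salt_and_pepper(x: torch.Tensor, rate: float = 0.05) -> torch.Tensor:
    """Set a random fraction of pixels to 0 (pepper) or 1 (salt)."""
    mask = torch.rand_like(x)
    x = x.clone()
    x[mask < rate / 2] = 0.0            # pepper
    x[mask > 1.0 - rate / 2] = 1.0      # salt
    return x


def pgd_perturb(model, x, y, eps=8 / 255, alpha=2 / 255, steps=10):
    """Standard L-infinity PGD with a random start: ascend the loss and
    project back into the eps-ball around the clean input each step."""
    x = x.detach()
    x_adv = (x + torch.empty_like(x).uniform_(-eps, eps)).clamp(0.0, 1.0)
    for _ in range(steps):
        x_adv = x_adv.detach().requires_grad_(True)
        loss = F.cross_entropy(model(x_adv), y)
        grad = torch.autograd.grad(loss, x_adv)[0]
        with torch.no_grad():
            x_adv = x_adv + alpha * grad.sign()        # gradient ascent step
            x_adv = x + (x_adv - x).clamp(-eps, eps)   # project to eps-ball
            x_adv = x_adv.clamp(0.0, 1.0)
    return x_adv.detach()


def augment_batch(model, x, y):
    """Apply one of the three corruptions, chosen uniformly at random
    (the mixing policy is an assumption, not taken from the paper)."""
    choice = int(torch.randint(0, 3, (1,)))
    if choice == 0:
        return gaussian_noise(x)
    if choice == 1:
        return salt_and_pepper(x)
    return pgd_perturb(model, x, y)
```

In a training loop, `augment_batch(model, x, y)` would replace or supplement each clean batch before the usual forward and backward pass; of the three corruptions, only PGD requires extra gradient computations.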
Related papers
- Enhancing PAC Learning of Halfspaces Through Robust Optimization Techniques [0.0]
We study PAC learning of halfspaces under constant malicious noise, where a fraction of the training data is adversarially corrupted.
We present a novel, efficient algorithm that extends existing theoretical frameworks to account for noise resilience in halfspace learning.
We provide a comprehensive analysis of the algorithm's performance, demonstrating its superior robustness to malicious noise when compared to existing state-of-the-art methods.
arXiv Detail & Related papers (2024-10-21T23:08:17Z) - EraseDiff: Erasing Data Influence in Diffusion Models [51.225365010401006]
We introduce EraseDiff, an unlearning algorithm to address concerns related to data memorization.
Our approach formulates the unlearning task as a constrained optimization problem.
We show that EraseDiff effectively preserves the model's utility, efficacy, and efficiency.
arXiv Detail & Related papers (2024-01-11T09:30:36Z) - Bandit-Driven Batch Selection for Robust Learning under Label Noise [20.202806541218944]
We introduce a novel approach for batch selection in Stochastic Gradient Descent (SGD) training, leveraging bandit algorithms.
Our methodology focuses on optimizing the learning process in the presence of label noise, a prevalent issue in real-world datasets.
arXiv Detail & Related papers (2023-10-31T19:19:01Z) - Fine-tuning Pre-trained Models for Robustness Under Noisy Labels [34.68018860186995]
The presence of noisy labels in a training dataset can significantly impact the performance of machine learning models.
We introduce a novel algorithm called TURN, which robustly and efficiently transfers the prior knowledge of pre-trained models.
arXiv Detail & Related papers (2023-10-24T20:28:59Z) - Robust Learning with Progressive Data Expansion Against Spurious
Correlation [65.83104529677234]
We study the learning process of a two-layer nonlinear convolutional neural network in the presence of spurious features.
Our analysis suggests that imbalanced data groups and easily learnable spurious features can lead to the dominance of spurious features during the learning process.
We propose a new training algorithm called PDE that efficiently enhances the model's robustness for better worst-group performance.
arXiv Detail & Related papers (2023-06-08T05:44:06Z) - Towards Robust Dataset Learning [90.2590325441068]
We propose a principled, tri-level optimization to formulate the robust dataset learning problem.
Under an abstraction model that characterizes robust vs. non-robust features, the proposed method provably learns a robust dataset.
arXiv Detail & Related papers (2022-11-19T17:06:10Z) - Efficient training of lightweight neural networks using Online
Self-Acquired Knowledge Distillation [51.66271681532262]
Online Self-Acquired Knowledge Distillation (OSAKD) is proposed, aiming to improve the performance of any deep neural model in an online manner.
We utilize a k-NN non-parametric density estimation technique to estimate the unknown probability distributions of the data samples in the output feature space (a minimal sketch of this estimator appears after this list).
arXiv Detail & Related papers (2021-08-26T14:01:04Z) - Adaptive t-Momentum-based Optimization for Unknown Ratio of Outliers in
Amateur Data in Imitation Learning [3.145455301228175]
Behavioral cloning (BC) bears high potential for safe and direct transfer of human skills to robots.
In order to allow the imitators to effectively learn from imperfect demonstrations, we propose to employ the robust t-momentum optimization algorithm.
We show empirically how the algorithm can be used to produce robust BC imitators against datasets with an unknown ratio of outliers.
arXiv Detail & Related papers (2021-08-02T04:30:41Z) - DEALIO: Data-Efficient Adversarial Learning for Imitation from
Observation [57.358212277226315]
In imitation learning from observation (IfO), a learning agent seeks to imitate a demonstrating agent using only observations of the demonstrated behavior, without access to the control signals generated by the demonstrator.
Recent methods based on adversarial imitation learning have led to state-of-the-art performance on IfO problems, but they typically suffer from high sample complexity due to a reliance on data-inefficient, model-free reinforcement learning algorithms.
This issue makes them impractical to deploy in real-world settings, where gathering samples can incur high costs in terms of time, energy, and risk.
We propose a more data-efficient IfO algorithm.
arXiv Detail & Related papers (2021-03-31T23:46:32Z) - How Training Data Impacts Performance in Learning-based Control [67.7875109298865]
This paper derives an analytical relationship between the density of the training data and the control performance.
We formulate a quality measure for the data set, which we refer to as the $\rho$-gap.
We show how the $\rho$-gap can be applied to a feedback linearizing control law.
arXiv Detail & Related papers (2020-05-25T12:13:49Z)
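As an aside on the OSAKD entry above, the sketch below shows the classic k-NN non-parametric density estimator that its summary refers to: the density at a query point is approximated from the volume of the smallest ball containing its k nearest neighbors. The feature dimensionality, the choice of k, and the Euclidean metric are illustrative assumptions; the details of OSAKD's actual estimator are not reproduced here.

```python
import numpy as np
from scipy.special import gamma


def knn_density(query: np.ndarray, data: np.ndarray, k: int = 5) -> float:
    """k-NN density estimate p(x) ~= k / (n * V_k), where V_k is the
    volume of the d-ball whose radius is the distance from the query
    point to its k-th nearest neighbor."""
    n, d = data.shape
    dists = np.sort(np.linalg.norm(data - query, axis=1))
    r_k = dists[k - 1]                                       # k-th NN distance
    v_k = (np.pi ** (d / 2) / gamma(d / 2 + 1)) * r_k ** d   # d-ball volume
    return k / (n * v_k)


# Usage (hypothetical feature space): 1000 samples in R^8.
features = np.random.randn(1000, 8)
print(knn_density(np.zeros(8), features, k=10))
```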
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the information presented and is not responsible for any consequences arising from its use.