Improving Noise Robustness through Abstractions and its Impact on Machine Learning
- URL: http://arxiv.org/abs/2406.08428v1
- Date: Wed, 12 Jun 2024 17:14:44 GMT
- Title: Improving Noise Robustness through Abstractions and its Impact on Machine Learning
- Authors: Alfredo Ibias, Karol Capala, Varun Ravi Varma, Anna Drozdz, Jose Sousa
- Abstract summary: Noise is a fundamental problem in learning theory with huge effects on the application of Machine Learning (ML) methods.
In this paper, we propose a method to deal with noise: mitigating its effect through the use of data abstractions.
The goal is to reduce the effect of noise on the model's performance through the loss of information produced by the abstraction.
- Score: 2.6563873893593826
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Noise is a fundamental problem in learning theory with huge effects on the application of Machine Learning (ML) methods, because real-world data tends to be noisy. Additionally, the introduction of malicious noise can make ML methods fail critically, as is the case with adversarial attacks. Thus, finding and developing alternatives to improve robustness to noise is a fundamental problem in ML. In this paper, we propose a method to deal with noise: mitigating its effect through the use of data abstractions. The goal is to reduce the effect of noise on the model's performance through the loss of information produced by the abstraction. However, this information loss comes at a cost: it can result in an accuracy reduction due to the missing information. First, we explore multiple methodologies for creating abstractions from the training dataset, for the specific case of numerical data and binary classification tasks. We then test how these abstractions affect robustness to noise with several experiments that compare the noise robustness of an Artificial Neural Network trained on raw data versus one trained on abstracted data. The results clearly show that using abstractions is a viable approach for developing noise-robust ML methods.
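To make the raw-vs-abstracted comparison concrete, here is a minimal sketch in Python. The choice of abstraction (quantile binning of each numerical feature), the network, and the Gaussian test-time noise are all my own stand-ins; the paper explores several abstraction methodologies and does not necessarily use any of these.

```python
# Hypothetical sketch: abstraction via quantile binning, one possible way to
# deliberately discard fine-grained information before training.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.neural_network import MLPClassifier
from sklearn.preprocessing import KBinsDiscretizer

rng = np.random.default_rng(0)

# Synthetic numerical data with binary labels, matching the paper's setting.
X, y = make_classification(n_samples=2000, n_features=10, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# Abstraction: map each feature to a few quantile bins learned on training data.
abstraction = KBinsDiscretizer(n_bins=4, encode="ordinal", strategy="quantile")
abstraction.fit(X_train)

def noisy(X, scale):
    """Additive Gaussian noise as a stand-in for real-world corruption."""
    return X + rng.normal(0.0, scale, size=X.shape)

raw_model = MLPClassifier(max_iter=500, random_state=0).fit(X_train, y_train)
abs_model = MLPClassifier(max_iter=500, random_state=0).fit(
    abstraction.transform(X_train), y_train)

for scale in (0.0, 0.5, 1.0):
    X_noisy = noisy(X_test, scale)
    print(f"noise={scale:.1f}  raw={raw_model.score(X_noisy, y_test):.3f}  "
          f"abstracted={abs_model.score(abstraction.transform(X_noisy), y_test):.3f}")
```

Under the paper's thesis, the abstracted model's accuracy should degrade more slowly as the noise scale grows, at the cost of some accuracy on clean data.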
Related papers
- Learning with Noisy Foundation Models [95.50968225050012]
This paper is the first work to comprehensively understand and analyze the nature of noise in pre-training datasets.
We propose a tuning method (NMTune) that applies an affine transformation to the feature space to mitigate the malignant effect of noise and improve generalization.
arXiv Detail & Related papers (2024-03-11T16:22:41Z)
- Understanding the Effect of Noise in LLM Training Data with Algorithmic Chains of Thought [0.0]
We study how noise in chains of thought impacts task performance in a highly controlled setting.
We define two types of noise: static noise, a local form of noise which is applied after the CoT trace is computed, and dynamic noise, a global form of noise which propagates errors in the trace as it is computed.
We find fine-tuned models are extremely robust to high levels of static noise but struggle significantly more with lower levels of dynamic noise; a toy illustration of the two noise types follows this entry.
arXiv Detail & Related papers (2024-02-06T13:59:56Z)
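A toy illustration of the static/dynamic distinction defined above, on an arithmetic chain of thought that tracks a running sum; the task and noise model are my own minimal example, not the paper's setup.

```python
# Toy illustration (not the paper's setup): static vs. dynamic noise on an
# arithmetic chain of thought that computes a running sum.
import random

def run_chain(xs):
    """Compute the intermediate partial sums (the CoT 'trace')."""
    trace, total = [], 0
    for x in xs:
        total += x
        trace.append(total)
    return trace

def static_noise(trace, p=0.3):
    """Local: perturb steps after the trace is computed; errors don't propagate."""
    return [t + random.choice([-1, 1]) if random.random() < p else t
            for t in trace]

def dynamic_noise(xs, p=0.3):
    """Global: perturb while computing, so each error corrupts all later steps."""
    trace, total = [], 0
    for x in xs:
        total += x
        if random.random() < p:
            total += random.choice([-1, 1])
        trace.append(total)
    return trace

xs = [3, 1, 4, 1, 5, 9]
print("clean:  ", run_chain(xs))
print("static: ", static_noise(run_chain(xs)))
print("dynamic:", dynamic_noise(xs))
```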
- Fine tuning Pre trained Models for Robustness Under Noisy Labels [34.68018860186995]
The presence of noisy labels in a training dataset can significantly impact the performance of machine learning models.
We introduce a novel algorithm called TURN, which robustly and efficiently transfers the prior knowledge of pre-trained models.
arXiv Detail & Related papers (2023-10-24T20:28:59Z)
- Understanding and Mitigating the Label Noise in Pre-training on Downstream Tasks [91.15120211190519]
This paper aims to understand the nature of noise in pre-training datasets and to mitigate its impact on downstream tasks.
We propose a light-weight black-box tuning method (NMTune) that applies an affine transformation to the feature space to mitigate the malignant effect of noise.
arXiv Detail & Related papers (2023-09-29T06:18:15Z)
- Improving the Robustness of Summarization Models by Detecting and Removing Input Noise [50.27105057899601]
We present a large empirical study quantifying the sometimes severe loss in performance from different types of input noise for a range of datasets and model sizes.
We propose a light-weight method for detecting and removing such noise in the input during model inference without requiring any training, auxiliary models, or even prior knowledge of the type of noise.
arXiv Detail & Related papers (2022-12-20T00:33:11Z)
- Robust Meta-learning with Sampling Noise and Label Noise via Eigen-Reptile [78.1212767880785]
The meta-learner is prone to overfitting since only a few samples are available.
When handling data with noisy labels, the meta-learner can be extremely sensitive to label noise.
We present Eigen-Reptile (ER), which updates the meta-parameters with the main direction of historical task-specific parameters.
arXiv Detail & Related papers (2022-06-04T08:48:02Z)
- Deep Active Learning with Noise Stability [24.54974925491753]
Uncertainty estimation for unlabeled data is crucial to active learning.
We propose a novel algorithm that leverages noise stability to estimate data uncertainty.
Our method is generally applicable in various tasks, including computer vision, natural language processing, and structural data analysis; a hedged sketch of the noise-stability idea follows this entry.
arXiv Detail & Related papers (2022-05-26T13:21:01Z)
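One plausible reading of the noise-stability idea above, sketched with PyTorch: perturb the model's weights with small random noise and score each unlabeled point by how much its output shifts, querying labels for the least stable points. The function, noise scale, and selection rule are assumptions for illustration, not the paper's algorithm.

```python
# Hypothetical sketch: uncertainty from output instability under weight noise.
import copy
import torch
import torch.nn as nn

def noise_stability_uncertainty(model, x, sigma=0.01, n_trials=10):
    model.eval()
    with torch.no_grad():
        base = model(x)
        shifts = []
        for _ in range(n_trials):
            noisy = copy.deepcopy(model)
            for p in noisy.parameters():
                # Small relative Gaussian perturbation of each weight tensor.
                p.add_(sigma * p.abs().mean() * torch.randn_like(p))
            shifts.append((noisy(x) - base).norm(dim=-1))
        # Less stable under weight noise -> higher uncertainty score.
        return torch.stack(shifts).mean(dim=0)

model = nn.Sequential(nn.Linear(10, 32), nn.ReLU(), nn.Linear(32, 2))
pool = torch.randn(100, 10)                 # unlabeled pool
scores = noise_stability_uncertainty(model, pool)
query_idx = scores.topk(5).indices          # most uncertain points to label
print(query_idx)
```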
- Robust Unlearnable Examples: Protecting Data Against Adversarial Learning [77.6015932710068]
Data can be made unlearnable for deep learning models by adding a type of error-minimizing noise.
In this paper, we design new methods to generate robust unlearnable examples that are protected from adversarial training.
Experiments show that the unlearnability brought by robust error-minimizing noise can effectively protect data from adversarial training in various scenarios; a conceptual sketch of the basic error-minimizing objective follows this entry.
arXiv Detail & Related papers (2022-03-28T07:13:51Z)
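A conceptual sketch of error-minimizing noise with PyTorch, as I read the basic objective: a small per-example perturbation optimized to minimize (not maximize) the training loss, so the data looks "already learned" and carries little usable signal. This shows only the base idea; the paper's robust variant, which additionally withstands adversarial training, is not reproduced here.

```python
# Conceptual sketch (my reading of the idea, not the paper's exact algorithm).
import torch
import torch.nn as nn
import torch.nn.functional as F

def error_minimizing_noise(model, x, y, eps=8 / 255, steps=20, lr=0.01):
    delta = torch.zeros_like(x, requires_grad=True)
    for _ in range(steps):
        loss = F.cross_entropy(model(x + delta), y)
        loss.backward()
        with torch.no_grad():
            delta -= lr * delta.grad.sign()   # descend the loss (min, not max)
            delta.clamp_(-eps, eps)           # keep the noise imperceptible
            delta.grad.zero_()
    return delta.detach()

model = nn.Sequential(nn.Flatten(), nn.Linear(3 * 32 * 32, 10))
x, y = torch.rand(8, 3, 32, 32), torch.randint(0, 10, (8,))
x_unlearnable = x + error_minimizing_noise(model, x, y)
```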
- Learning based signal detection for MIMO systems with unknown noise statistics [84.02122699723536]
In practice, there is little or even no statistical knowledge of the system noise, which in many cases is non-Gaussian, impulsive, and not analyzable.
This paper aims to devise a generalized maximum likelihood (ML) estimator to robustly detect signals with unknown noise statistics.
Our framework is driven by an unsupervised learning approach, where only the noise samples are required.
arXiv Detail & Related papers (2021-01-21T04:48:15Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the content (including all information) and is not responsible for any consequences of its use.