A Study on the Impact of Data Augmentation for Training Convolutional Neural Networks in the Presence of Noisy Labels
- URL: http://arxiv.org/abs/2208.11176v3
- Date: Mon, 7 Aug 2023 11:36:16 GMT
- Title: A Study on the Impact of Data Augmentation for Training Convolutional Neural Networks in the Presence of Noisy Labels
- Authors: Emeson Santana, Gustavo Carneiro, Filipe R. Cordeiro
- Abstract summary: Label noise is common in large real-world datasets, and its presence harms the training process of deep neural networks.
We evaluate the impact of data augmentation as a design choice for training deep neural networks.
We show that the appropriate selection of data augmentation can drastically improve the model robustness to label noise.
- Score: 14.998309259808236
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Label noise is common in large real-world datasets, and its presence harms
the training process of deep neural networks. Although several works have
focused on the training strategies to address this problem, there are few
studies that evaluate the impact of data augmentation as a design choice for
training deep neural networks. In this work, we analyse model robustness under
different data augmentations and how they improve training in the presence of
noisy labels. We evaluate state-of-the-art and classical data augmentation
strategies with different levels of synthetic noise on the datasets MNIST,
CIFAR-10, CIFAR-100, and the real-world dataset Clothing1M, using accuracy as
the evaluation metric. Results show that an appropriate selection of data
augmentation can drastically improve model robustness to label noise, raising
the relative best test accuracy by up to 177.84% over the baseline with no
augmentation and adding up to 6% in absolute accuracy when combined with the
state-of-the-art DivideMix training strategy.
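To make the evaluation protocol above concrete, the following is a minimal sketch of how synthetic symmetric label noise can be injected and an augmentation pipeline selected before training, assuming PyTorch and torchvision; the helper function, noise rate, and augmentation choices are illustrative assumptions, not the paper's code.

```python
import random

import torch
from torchvision import datasets, transforms


def inject_symmetric_noise(targets, noise_rate, num_classes, seed=0):
    """Flip a fraction `noise_rate` of labels to a uniformly chosen wrong class."""
    rng = random.Random(seed)
    noisy = list(targets)
    flip_idx = rng.sample(range(len(noisy)), int(noise_rate * len(noisy)))
    for i in flip_idx:
        noisy[i] = rng.choice([c for c in range(num_classes) if c != noisy[i]])
    return noisy


# A classical augmentation pipeline (random crop + horizontal flip) versus none.
classical_aug = transforms.Compose([
    transforms.RandomCrop(32, padding=4),
    transforms.RandomHorizontalFlip(),
    transforms.ToTensor(),
])
no_aug = transforms.ToTensor()

# CIFAR-10 with 40% symmetric label noise; swap `classical_aug` for a stronger
# policy (e.g. transforms.RandAugment()) to compare robustness under the same noise.
train_set = datasets.CIFAR10(root="./data", train=True, download=True,
                             transform=classical_aug)
train_set.targets = inject_symmetric_noise(train_set.targets,
                                           noise_rate=0.4, num_classes=10)
loader = torch.utils.data.DataLoader(train_set, batch_size=128, shuffle=True)
```

Training the same network once per augmentation pipeline and noise level, and comparing test accuracy, reproduces the kind of comparison the abstract describes.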
Related papers
- Learning with Noisy Foundation Models [95.50968225050012]
This paper is the first work to comprehensively understand and analyze the nature of noise in pre-training datasets.
We propose a tuning method (NMTune) to affine the feature space to mitigate the malignant effect of noise and improve generalization.
arXiv Detail & Related papers (2024-03-11T16:22:41Z)
- Analyze the Robustness of Classifiers under Label Noise [5.708964539699851]
Label noise in supervised learning, characterized by erroneous or imprecise labels, significantly impairs model performance.
This research focuses on the increasingly pertinent issue of label noise's impact on practical applications.
arXiv Detail & Related papers (2023-12-12T13:51:25Z)
- Noisy Self-Training with Synthetic Queries for Dense Retrieval [49.49928764695172]
We introduce a novel noisy self-training framework combined with synthetic queries.
Experimental results show that our method improves consistently over existing methods.
Our method is data efficient and outperforms competitive baselines.
arXiv Detail & Related papers (2023-11-27T06:19:50Z)
- Spanning Training Progress: Temporal Dual-Depth Scoring (TDDS) for Enhanced Dataset Pruning [50.809769498312434]
We propose a novel dataset pruning method termed as Temporal Dual-Depth Scoring (TDDS)
Our method achieves 54.51% accuracy with only 10% training data, surpassing random selection by 7.83% and other comparison methods by at least 12.69%.
arXiv Detail & Related papers (2023-11-22T03:45:30Z)
- Understanding and Mitigating the Label Noise in Pre-training on Downstream Tasks [91.15120211190519]
This paper aims to understand the nature of noise in pre-training datasets and to mitigate its impact on downstream tasks.
We propose a light-weight black-box tuning method (NMTune) to affine the feature space to mitigate the malignant effect of noise.
arXiv Detail & Related papers (2023-09-29T06:18:15Z)
- Dynamic Loss For Robust Learning [17.33444812274523]
This work presents a novel meta-learning based dynamic loss that automatically adjusts the objective functions with the training process to robustly learn a classifier from long-tailed noisy data.
Our method achieves state-of-the-art accuracy on multiple real-world and synthetic datasets with various types of data biases, including CIFAR-10/100, Animal-10N, ImageNet-LT, and WebVision.
arXiv Detail & Related papers (2022-11-22T01:48:25Z)
- Boosting Facial Expression Recognition by A Semi-Supervised Progressive Teacher [54.50747989860957]
We propose a semi-supervised learning algorithm named Progressive Teacher (PT) to utilize reliable FER datasets as well as large-scale unlabeled expression images for effective training.
Experiments on the widely-used databases RAF-DB and FERPlus validate the effectiveness of our method, which achieves state-of-the-art performance with an accuracy of 89.57% on RAF-DB.
arXiv Detail & Related papers (2022-05-28T07:47:53Z)
- Synergistic Network Learning and Label Correction for Noise-robust Image Classification [28.27739181560233]
Deep Neural Networks (DNNs) tend to overfit training label noise, resulting in poorer model performance in practice.
We propose a robust label correction framework combining the ideas of small loss selection and noise correction; a generic sketch of the small-loss selection idea follows this entry.
We demonstrate our method on both synthetic and real-world datasets with different noise types and rates.
arXiv Detail & Related papers (2022-02-27T23:06:31Z)
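The "small loss selection" idea referenced above is commonly implemented by ranking per-sample losses and treating the smallest-loss fraction as clean, while relabelling the remaining samples. The sketch below is a generic illustration of that idea, assuming PyTorch and an already trained `model` and a non-shuffled `loader`; the function name, selection ratio, and relabelling rule are our assumptions, not the paper's implementation.

```python
import torch


@torch.no_grad()
def small_loss_selection(model, loader, select_ratio=0.5, device="cpu"):
    """Rank samples by per-sample cross-entropy: the smallest-loss fraction is
    treated as likely clean; the remainder is relabelled with model predictions.
    `loader` must iterate the dataset in a fixed order (shuffle=False)."""
    model.eval()
    ce = torch.nn.CrossEntropyLoss(reduction="none")
    losses, preds, labels = [], [], []
    for x, y in loader:
        logits = model(x.to(device))
        losses.append(ce(logits, y.to(device)).cpu())
        preds.append(logits.argmax(dim=1).cpu())
        labels.append(y)
    losses, preds, labels = torch.cat(losses), torch.cat(preds), torch.cat(labels)

    order = torch.argsort(losses)
    clean_idx = order[: int(select_ratio * len(losses))]  # small loss = likely clean
    corrected = preds.clone()                             # start from model labels
    corrected[clean_idx] = labels[clean_idx]              # keep originals for clean set
    return clean_idx, corrected
```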
- Augmentation Strategies for Learning with Noisy Labels [3.698228929379249]
We evaluate different augmentation strategies for algorithms tackling the "learning with noisy labels" problem.
We find that using one set of augmentations for loss modeling tasks and another set for learning is the most effective.
We introduce this augmentation strategy to the state-of-the-art technique and demonstrate that we can improve performance across all evaluated noise levels; a sketch of the two-pipeline idea follows this entry.
arXiv Detail & Related papers (2021-03-03T02:19:35Z)
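One way to realise "one set of augmentations for loss modeling and another for learning" is to have the dataset return two differently augmented views of each image: a weak view used for loss-based sample selection and a strong view used for the parameter update. The sketch below is a minimal illustration assuming PyTorch and a recent torchvision; the class name and transform choices are ours, not the paper's code.

```python
import torch
from torch.utils.data import Dataset
from torchvision import transforms


class TwoViewNoisyDataset(Dataset):
    """Returns two views per image: a weakly augmented view intended for loss
    modeling (e.g. small-loss sample selection) and a strongly augmented view
    intended for the actual parameter update."""

    def __init__(self, base_dataset, weak_tf, strong_tf):
        self.base = base_dataset      # yields (PIL image, possibly-noisy label)
        self.weak_tf = weak_tf
        self.strong_tf = strong_tf

    def __len__(self):
        return len(self.base)

    def __getitem__(self, idx):
        img, label = self.base[idx]
        return self.weak_tf(img), self.strong_tf(img), label


weak_tf = transforms.Compose([
    transforms.RandomCrop(32, padding=4),
    transforms.RandomHorizontalFlip(),
    transforms.ToTensor(),
])
strong_tf = transforms.Compose([
    transforms.RandomCrop(32, padding=4),
    transforms.RandomHorizontalFlip(),
    transforms.RandAugment(),  # heavier policy reserved for the update step
    transforms.ToTensor(),
])
```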
- Dataset Condensation with Differentiable Siamese Augmentation [30.571335208276246]
We focus on condensing large training sets into significantly smaller synthetic sets which can be used to train deep neural networks.
We propose Differentiable Siamese Augmentation that enables effective use of data augmentation to synthesize more informative synthetic images.
We show that, with less than 1% of the training data, our method achieves 99.6%, 94.9%, 88.5%, and 71.5% relative performance on MNIST, FashionMNIST, SVHN, and CIFAR10, respectively.
arXiv Detail & Related papers (2021-02-16T16:32:21Z)
- Omni-supervised Facial Expression Recognition via Distilled Data [120.11782405714234]
We propose omni-supervised learning to exploit reliable samples in a large amount of unlabeled data for network training.
To keep training on the enlarged dataset practical, we apply a dataset distillation strategy to compress the created dataset into several informative class-wise images.
We experimentally verify that the new dataset can significantly improve the ability of the learned FER model.
arXiv Detail & Related papers (2020-05-18T09:36:51Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
The site does not guarantee the quality of the information presented and is not responsible for any consequences arising from its use.