Noise-Tolerant Learning for Audio-Visual Action Recognition
- URL: http://arxiv.org/abs/2205.07611v3
- Date: Mon, 11 Sep 2023 04:23:25 GMT
- Title: Noise-Tolerant Learning for Audio-Visual Action Recognition
- Authors: Haochen Han, Qinghua Zheng, Minnan Luo, Kaiyao Miao, Feng Tian and Yan
Chen
- Abstract summary: Video datasets are usually coarse-annotated or collected from the Internet.
We propose a noise-tolerant learning framework to find anti-interference model parameters against both noisy labels and noisy correspondence.
Our method significantly improves the robustness of the action recognition model and surpasses the baselines by a clear margin.
- Score: 31.641972732424463
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Recently, video recognition has been advancing with the help of
multi-modal learning, which focuses on integrating distinct modalities to
improve the
performance or robustness of the model. Although various multi-modal learning
methods have been proposed and offer remarkable recognition results, almost all
of these methods rely on high-quality manual annotations and assume that
modalities among multi-modal data provide semantically relevant information.
Unfortunately, the widely used video datasets are usually coarse-annotated or
collected from the Internet; thus, they inevitably contain a portion of noisy
labels and noisy correspondence. To address this challenge, we use the
audio-visual action recognition task as a proxy and propose a noise-tolerant
learning framework to find anti-interference model parameters against both
noisy labels and noisy correspondence. Specifically, our method consists of two
phases that aim to rectify noise by the inherent correlation between
modalities. First, a noise-tolerant contrastive training phase is performed to
make the model immune to the possible noisy-labeled data. To alleviate the
influence of noisy correspondence, we propose a cross-modal noise estimation
component to adjust the consistency between different modalities. As noisy
correspondence exists at the instance level, we further propose a
category-level contrastive loss to reduce its interference. Second, in the
hybrid-supervised training phase, we calculate the distance metric among
features to obtain corrected labels, which are used as complementary
supervision to guide the training. Extensive experiments on a wide range of
noisy levels demonstrate that our method significantly improves the robustness
of the action recognition model and surpasses the baselines by a clear margin.
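No code accompanies this listing, but the two training phases described above can be sketched compactly. The snippet below is a minimal, hypothetical PyTorch illustration, not the authors' implementation: a category-level contrastive loss (positives are all audio-visual pairs sharing a class label rather than only instance-level pairs), a cosine-agreement weight standing in for the cross-modal noise estimation component, and nearest-centroid label correction for the hybrid-supervised phase. All function names, tensor shapes, and the cosine-based weighting are assumptions.

```python
# Hypothetical sketch of the abstract's two phases; names and shapes are
# illustrative assumptions, not the authors' code.
import torch
import torch.nn.functional as F

def category_level_contrastive_loss(audio_emb, visual_emb, labels, tau=0.07):
    # Treat every cross-modal pair that shares a class label as a positive,
    # so a single corrupted audio-visual pairing is diluted by its class-mates.
    a = F.normalize(audio_emb, dim=1)                   # (N, D) audio features
    v = F.normalize(visual_emb, dim=1)                  # (N, D) visual features
    sim = a @ v.t() / tau                               # (N, N) similarity logits
    pos = labels.unsqueeze(0).eq(labels.unsqueeze(1)).float()  # same-class mask
    log_prob = sim - torch.logsumexp(sim, dim=1, keepdim=True)
    # Mean log-likelihood over each anchor's positive set.
    return -((pos * log_prob).sum(1) / pos.sum(1).clamp(min=1)).mean()

def correspondence_weight(audio_emb, visual_emb):
    # Crude stand-in for cross-modal noise estimation: instances whose audio
    # and visual features disagree receive a smaller training weight.
    a = F.normalize(audio_emb, dim=1)
    v = F.normalize(visual_emb, dim=1)
    return (a * v).sum(1).clamp(min=0.0).detach()       # per-instance, in [0, 1]

def correct_labels(features, labels, num_classes):
    # Phase-two sketch: relabel each sample by its nearest class centroid in
    # feature space (assumes every class occurs in `labels` at least once).
    centroids = torch.stack(
        [features[labels == c].mean(0) for c in range(num_classes)]
    )                                                   # (C, D)
    return torch.cdist(features, centroids).argmin(1)   # corrected labels, (N,)

# Toy usage with random tensors standing in for encoder outputs.
N, D, C = 12, 16, 3
audio, visual = torch.randn(N, D), torch.randn(N, D)
labels = torch.arange(N) % C                            # every class present
loss = category_level_contrastive_loss(audio, visual, labels)
weights = correspondence_weight(audio, visual)
new_labels = correct_labels(torch.cat([audio, visual], dim=1), labels, C)
```

In a full pipeline, `weights` would scale per-instance losses and `new_labels` would serve as the complementary supervision the abstract describes; both are placeholders for the paper's actual components.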
Related papers
- Disentangled Noisy Correspondence Learning [56.06801962154915]
Cross-modal retrieval is crucial in understanding latent correspondences across modalities.
DisNCL is a novel information-theoretic framework for feature Disentanglement in Noisy Correspondence Learning.
arXiv Detail & Related papers (2024-08-10T09:49:55Z)
- Relation Modeling and Distillation for Learning with Noisy Labels [4.556974104115929]
This paper proposes a relation modeling and distillation framework that models inter-sample relationships via self-supervised learning.
The proposed framework can learn discriminative representations for noisy data, which yields performance superior to existing methods.
arXiv Detail & Related papers (2024-05-30T01:47:27Z)
- DiffSED: Sound Event Detection with Denoising Diffusion [70.18051526555512]
We reformulate the SED problem by taking a generative learning perspective.
Specifically, we aim to generate sound temporal boundaries from noisy proposals in a denoising diffusion process.
During training, our model learns to reverse the noising process by converting noisy latent queries to the ground-truth versions.
arXiv Detail & Related papers (2023-08-14T17:29:41Z)
- Co-Learning Meets Stitch-Up for Noisy Multi-label Visual Recognition [70.00984078351927]
This paper focuses on reducing noise based on some inherent properties of multi-label classification and long-tailed learning in noisy settings.
We propose a Stitch-Up augmentation to synthesize a cleaner sample, which directly reduces multi-label noise.
A Heterogeneous Co-Learning framework is further designed to leverage the inconsistency between long-tailed and balanced distributions.
arXiv Detail & Related papers (2023-07-03T09:20:28Z)
- Label Noise-Robust Learning using a Confidence-Based Sieving Strategy [15.997774467236352]
In learning tasks with label noise, improving model robustness against overfitting is a pivotal challenge.
Identifying the samples with noisy labels and preventing the model from learning them is a promising approach to address this challenge.
We propose a novel discriminator metric called confidence error and a sieving strategy called CONFES to differentiate between the clean and noisy samples effectively.
arXiv Detail & Related papers (2022-10-11T10:47:28Z)
- Robust Contrastive Learning against Noisy Views [79.71880076439297]
We propose a new contrastive loss function that is robust against noisy views.
We show that our approach provides consistent improvements over the state-of-the-art image, video, and graph contrastive learning benchmarks.
arXiv Detail & Related papers (2022-01-12T05:24:29Z)
- Few-Shot Fine-Grained Action Recognition via Bidirectional Attention and Contrastive Meta-Learning [51.03781020616402]
Fine-grained action recognition is attracting increasing attention due to the emerging demand for specific action understanding in real-world applications.
We propose a few-shot fine-grained action recognition problem, aiming to recognize novel fine-grained actions with only a few samples given for each class.
Although progress has been made on coarse-grained actions, existing few-shot recognition methods encounter two issues when handling fine-grained actions.
arXiv Detail & Related papers (2021-08-15T02:21:01Z)
- Multi-Objective Interpolation Training for Robustness to Label Noise [17.264550056296915]
We show that standard supervised contrastive learning degrades in the presence of label noise.
We propose a novel label noise detection method that exploits the robust feature representations learned via contrastive learning.
Experiments on synthetic and real-world noise benchmarks demonstrate that MOIT/MOIT+ achieves state-of-the-art results.
arXiv Detail & Related papers (2020-12-08T15:01:54Z)
- Learning Not to Learn in the Presence of Noisy Labels [104.7655376309784]
We show that a new class of loss functions called the gambler's loss provides strong robustness to label noise across various levels of corruption.
We show that training with this loss function encourages the model to "abstain" from learning on the data points with noisy labels.
arXiv Detail & Related papers (2020-02-16T09:12:27Z)
This list is automatically generated from the titles and abstracts of the papers on this site.