Learning with Out-of-Distribution Data for Audio Classification
- URL: http://arxiv.org/abs/2002.04683v1
- Date: Tue, 11 Feb 2020 21:08:06 GMT
- Title: Learning with Out-of-Distribution Data for Audio Classification
- Authors: Turab Iqbal, Yin Cao, Qiuqiang Kong, Mark D. Plumbley, Wenwu Wang
- Abstract summary: We show that detecting and relabelling certain OOD instances, rather than discarding them, can have a positive effect on learning.
The proposed method is shown to improve the performance of convolutional neural networks by a significant margin.
- Score: 60.48251022280506
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: In supervised machine learning, the assumption that training data is labelled
correctly is not always satisfied. In this paper, we investigate an instance of
labelling error for classification tasks in which the dataset is corrupted with
out-of-distribution (OOD) instances: data that does not belong to any of the
target classes, but is labelled as such. We show that detecting and relabelling
certain OOD instances, rather than discarding them, can have a positive effect
on learning. The proposed method uses an auxiliary classifier, trained on data
that is known to be in-distribution, for detection and relabelling. The amount
of data required for this is shown to be small. Experiments are carried out on
the FSDnoisy18k audio dataset, where OOD instances are very prevalent. The
proposed method is shown to improve the performance of convolutional neural
networks by a significant margin. Comparisons with other noise-robust
techniques are similarly encouraging.
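The abstract outlines a detect-and-relabel pipeline: an auxiliary classifier, trained on a small trusted in-distribution subset, scores each noisy instance and reassigns labels where appropriate. Below is a minimal sketch of that general idea, assuming NumPy arrays and a scikit-learn-style classifier; the confidence threshold and the relabel-versus-discard rule are illustrative assumptions, not the authors' exact procedure.
```python
# Hedged sketch of the detect-and-relabel idea from the abstract.
# Threshold value and the relabel/discard rule are assumptions.
import numpy as np

def detect_and_relabel(aux_clf, X, y, conf_threshold=0.9):
    """Use an auxiliary classifier trained on known in-distribution
    data to detect suspect instances and relabel, rather than discard,
    the ones it is confident about."""
    probs = aux_clf.predict_proba(X)  # shape (N, num_classes)
    pred = probs.argmax(axis=1)
    conf = probs.max(axis=1)

    y_new = y.copy()
    keep = np.ones(len(y), dtype=bool)
    disagree = pred != y

    # Confident disagreement: the given label is likely wrong, so
    # relabel with the auxiliary prediction instead of discarding.
    relabel = disagree & (conf >= conf_threshold)
    y_new[relabel] = pred[relabel]

    # Unconfident disagreement: likely OOD for every target class;
    # such instances could be discarded or down-weighted.
    keep[disagree & (conf < conf_threshold)] = False
    return X[keep], y_new[keep]
```
The cleaned pairs returned here would then be used to train the main convolutional network as usual.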
Related papers
- Safe Semi-Supervised Contrastive Learning Using In-Distribution Data as Positive Examples [3.4546761246181696]
We propose a self-supervised contrastive learning approach to fully exploit a large amount of unlabeled data.
The results show that self-supervised contrastive learning significantly improves classification accuracy.
arXiv Detail & Related papers (2024-08-03T22:33:13Z)
- An accurate detection is not all you need to combat label noise in web-noisy datasets [23.020126612431746]
We show that direct estimation of the separating hyperplane can indeed detect OOD samples accurately.
We propose a hybrid solution that alternates between noise detection using linear separation and a state-of-the-art (SOTA) small-loss approach.
arXiv Detail & Related papers (2024-07-08T00:21:42Z)
- A noisy elephant in the room: Is your out-of-distribution detector robust to label noise? [49.88894124047644]
We take a closer look at 20 state-of-the-art OOD detection methods.
We show that poor separation between incorrectly classified ID samples vs. OOD samples is an overlooked yet important limitation of existing methods.
arXiv Detail & Related papers (2024-04-02T09:40:22Z)
- How Does Unlabeled Data Provably Help Out-of-Distribution Detection? [63.41681272937562]
Using unlabeled in-the-wild data is non-trivial due to the heterogeneity of both in-distribution (ID) and out-of-distribution (OOD) data.
This paper introduces a new learning framework SAL (Separate And Learn) that offers both strong theoretical guarantees and empirical effectiveness.
arXiv Detail & Related papers (2024-02-05T20:36:33Z)
- Are labels informative in semi-supervised learning? -- Estimating and leveraging the missing-data mechanism [4.675583319625962]
Semi-supervised learning is a powerful technique for leveraging unlabeled data to improve machine learning models.
It can be affected by the presence of "informative" labels, which occur when some classes are more likely to be labeled than others.
We propose a novel approach to address this issue by estimating the missing-data mechanism and using inverse propensity weighting to debias any SSL algorithm (sketched after this list).
arXiv Detail & Related papers (2023-02-15T09:18:46Z)
- Dist-PU: Positive-Unlabeled Learning from a Label Distribution Perspective [89.5370481649529]
We propose a label distribution perspective for PU learning in this paper.
Motivated by this view, we pursue consistency between the predicted and ground-truth label distributions (sketched after this list).
Experiments on three benchmark datasets validate the effectiveness of the proposed method.
arXiv Detail & Related papers (2022-12-06T07:38:29Z)
- Dash: Semi-Supervised Learning with Dynamic Thresholding [72.74339790209531]
We propose a semi-supervised learning (SSL) approach that trains models on unlabeled examples selected by a dynamic threshold.
Our proposed approach, Dash, adapts this selection of unlabeled data as training progresses (sketched after this list).
arXiv Detail & Related papers (2021-09-01T23:52:29Z)
- A Novel Perspective for Positive-Unlabeled Learning via Noisy Labels [49.990938653249415]
This research presents a methodology that assigns initial pseudo-labels to unlabeled data, treats the result as noisy-labeled data, and trains a deep neural network on it.
Experimental results demonstrate that the proposed method significantly outperforms the state-of-the-art methods on several benchmark datasets.
arXiv Detail & Related papers (2021-03-08T11:46:02Z)
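As a concrete illustration of the inverse propensity weighting idea in the "Are labels informative in semi-supervised learning?" entry above, here is a hedged sketch. It assumes a class estimate (e.g., a pseudo-label) is available for every instance and uses a simple plug-in propensity, whereas the paper estimates the missing-data mechanism more carefully.
```python
import numpy as np

def ipw_weights(is_labeled, y_est, num_classes):
    """Inverse propensity weights that debias a loss computed on the
    labeled subset when some classes are more likely to be labeled.

    is_labeled: boolean mask over all instances.
    y_est: class estimate for every instance (e.g., pseudo-labels).
    """
    w = np.zeros(len(y_est), dtype=float)
    for c in range(num_classes):
        in_c = y_est == c
        if not in_c.any():
            continue
        # Plug-in propensity: fraction of (estimated) class-c
        # instances that actually carry a label.
        p_c = max(is_labeled[in_c].mean(), 1e-8)
        w[in_c & is_labeled] = 1.0 / p_c
    return w
```
A debiased supervised term is then the weighted mean over labeled instances, e.g. `np.sum(w * per_example_loss) / np.sum(w)`.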
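The label distribution consistency in the Dist-PU entry can be illustrated with one simple instance of the idea: penalize the gap between the average predicted positive probability and the positive-class prior. The loss below is an assumption for illustration, not the paper's exact formulation.
```python
import numpy as np

def label_dist_loss(pred_pos_prob, class_prior):
    """Squared gap between the mean predicted positive probability and
    the (known or estimated) positive-class prior, a simple way to
    enforce label distribution consistency in PU learning."""
    return (np.mean(pred_pos_prob) - class_prior) ** 2
```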
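Finally, a toy version of the dynamic thresholding described in the Dash entry: admit only unlabeled examples whose pseudo-label loss falls below a threshold that decays as training progresses. The decay schedule and constants are illustrative, not Dash's published ones.
```python
import numpy as np

def select_unlabeled(losses, step, c0=1.0, gamma=1.05):
    """Mask of unlabeled examples whose current pseudo-label loss is
    below a dynamically decreasing threshold, so fewer noisy
    pseudo-labels are admitted as the model improves."""
    threshold = c0 * gamma ** (-step)  # shrinks each training step
    return np.asarray(losses) < threshold
```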
This list is automatically generated from the titles and abstracts of the papers on this site.