Adding Seemingly Uninformative Labels Helps in Low Data Regimes
- URL: http://arxiv.org/abs/2008.00807v2
- Date: Tue, 11 Aug 2020 10:52:43 GMT
- Title: Adding Seemingly Uninformative Labels Helps in Low Data Regimes
- Authors: Christos Matsoukas, Albert Bou I Hernandez, Yue Liu, Karin Dembrower,
Gisele Miranda, Emir Konuk, Johan Fredin Haslum, Athanasios Zouzos, Peter
Lindholm, Fredrik Strand, Kevin Smith
- Abstract summary: We consider a task that requires difficult-to-obtain expert annotations: tumor segmentation in mammography images.
We show that, in low-data settings, performance can be improved by complementing the expert annotations with seemingly uninformative labels from non-expert annotators, turning the task into a multi-class problem.
- Score: 6.953976287091344
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Evidence suggests that networks trained on large datasets generalize well not
solely because of the numerous training examples, but also because of the class
diversity, which encourages learning of enriched features. This raises the question of
whether this remains true when data is scarce - is there an advantage to
learning with additional labels in low-data regimes? In this work, we consider
a task that requires difficult-to-obtain expert annotations: tumor segmentation
in mammography images. We show that, in low-data settings, performance can be
improved by complementing the expert annotations with seemingly uninformative
labels from non-expert annotators, turning the task into a multi-class problem.
We reveal that these gains increase when less expert data is available, and
uncover several interesting properties through further studies. We demonstrate
our findings on CSAW-S, a new dataset that we introduce here, and confirm them
on two public datasets.
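The core idea is easy to illustrate. Below is a minimal, hypothetical PyTorch sketch (the tiny network, the class indices, and the use of per-pixel cross-entropy are our assumptions for illustration, not the authors' released code): expert tumor annotations and non-expert labels for surrounding anatomy share a single label map, turning binary segmentation into a multi-class problem.

```python
import torch
import torch.nn as nn

# Hypothetical label scheme: class 1 is the expert tumor annotation;
# classes 2-4 stand in for "seemingly uninformative" non-expert labels
# (e.g. pectoral muscle, skin, nipple); class 0 is background.
NUM_CLASSES = 5  # assumed for illustration

class TinySegNet(nn.Module):
    """Stand-in for any dense-prediction backbone (e.g. a U-Net)."""
    def __init__(self, num_classes: int):
        super().__init__()
        self.body = nn.Sequential(
            nn.Conv2d(1, 16, 3, padding=1), nn.ReLU(),
            nn.Conv2d(16, num_classes, 1),  # per-pixel class logits
        )

    def forward(self, x):
        return self.body(x)

model = TinySegNet(NUM_CLASSES)
criterion = nn.CrossEntropyLoss()  # multi-class per-pixel loss
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)

# Dummy batch: grayscale mammogram crops and dense integer label maps.
images = torch.randn(4, 1, 128, 128)
labels = torch.randint(0, NUM_CLASSES, (4, 128, 128))

logits = model(images)            # (B, C, H, W)
loss = criterion(logits, labels)  # auxiliary classes shape the features
loss.backward()
optimizer.step()

# At test time only the tumor class is evaluated; the auxiliary labels
# exist purely to enrich the learned representation.
tumor_mask = logits.argmax(dim=1) == 1
```

Dropping the auxiliary classes recovers the binary baseline against which the reported low-data gains are measured.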
Related papers
- Granularity Matters in Long-Tail Learning [62.30734737735273]
We offer a novel perspective on long-tail learning, inspired by an observation: datasets with finer granularity tend to be less affected by data imbalance.
We introduce open-set auxiliary classes that are visually similar to existing ones, aiming to enhance representation learning for both head and tail classes.
To prevent the overwhelming presence of auxiliary classes from disrupting training, we introduce a neighbor-silencing loss.
arXiv Detail & Related papers (2024-10-21T13:06:21Z)
- Multi-Site Class-Incremental Learning with Weighted Experts in Echocardiography [1.305420351791698]
Building an echocardiography view classifier that maintains performance in real-life cases requires diverse multi-site data.
We propose a class-incremental learning method which learns an expert network for each dataset.
We validate our work on six datasets from multiple sites, demonstrating significant reductions in training time while improving view classification performance.
arXiv Detail & Related papers (2024-07-31T13:05:32Z)
- A Self Supervised StyleGAN for Image Annotation and Classification with Extremely Limited Labels [35.43549147657739]
We propose SS-StyleGAN, a self-supervised approach for image annotation and classification suitable for extremely small annotated datasets.
We show that the proposed method attains strong classification results using small labeled datasets of sizes 50 and even 10.
arXiv Detail & Related papers (2023-12-26T09:46:50Z)
- From Categories to Classifiers: Name-Only Continual Learning by Exploring the Web [118.67589717634281]
Continual learning often relies on the availability of extensive annotated datasets, an assumption that is unrealistic given the time and cost of annotation in practice.
We explore a novel paradigm termed name-only continual learning where time and cost constraints prohibit manual annotation.
Our proposed solution leverages the expansive and ever-evolving internet to query and download uncurated webly-supervised data for image classification.
arXiv Detail & Related papers (2023-11-19T10:43:43Z)
- Annotation Error Detection: Analyzing the Past and Present for a More Coherent Future [63.99570204416711]
We reimplement 18 methods for detecting potential annotation errors and evaluate them on 9 English datasets.
We define a uniform evaluation setup including a new formalization of the annotation error detection task.
We release our datasets and implementations in an easy-to-use and open source software package.
arXiv Detail & Related papers (2022-06-05T22:31:45Z)
- NoisyActions2M: A Multimedia Dataset for Video Understanding from Noisy Labels [33.659146748289444]
We create a benchmark dataset consisting of around 2 million videos with associated user-generated annotations and other meta information.
We show how a network pretrained on the proposed dataset can help against video corruption and label noise in downstream datasets.
arXiv Detail & Related papers (2021-10-13T16:12:18Z)
- Combining Feature and Instance Attribution to Detect Artifacts [62.63504976810927]
We propose methods to facilitate identification of training data artifacts.
We show that this proposed training-feature attribution approach can be used to uncover artifacts in training data.
We execute a small user study to evaluate whether these methods are useful to NLP researchers in practice.
arXiv Detail & Related papers (2021-07-01T09:26:13Z)
- Connecting Images through Time and Sources: Introducing Low-data, Heterogeneous Instance Retrieval [3.6526118822907594]
We show that it is not trivial to select features that respond well to a panel of variations and semantic content.
Introducing a new enhanced version of the Alegoria benchmark, we compare descriptors using the detailed annotations.
arXiv Detail & Related papers (2021-03-19T10:54:51Z)
- Omni-supervised Facial Expression Recognition via Distilled Data [120.11782405714234]
We propose omni-supervised learning to exploit reliable samples in a large amount of unlabeled data for network training.
To make training on the created dataset practical, we apply a dataset distillation strategy to compress it into several informative class-wise images.
We experimentally verify that the new dataset can significantly improve the ability of the learned FER model.
arXiv Detail & Related papers (2020-05-18T09:36:51Z)
- Laplacian Denoising Autoencoder [114.21219514831343]
We propose to learn data representations with a novel type of denoising autoencoder.
The noisy input data is generated by corrupting latent clean data in the gradient domain.
Experiments on several visual benchmarks demonstrate that better representations can be learned with the proposed approach.
arXiv Detail & Related papers (2020-03-30T16:52:39Z)
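To make the gradient-domain corruption described in the last entry concrete, here is a minimal, hypothetical PyTorch sketch (the fixed Laplacian kernel, the noise model, and the toy autoencoder are illustrative assumptions; the paper's actual procedure may differ): inputs are corrupted in the Laplacian (second-derivative) domain rather than in pixel space, and an autoencoder is trained to recover the clean image.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

# Fixed Laplacian kernel: a simple way to move an image into the
# gradient (second-derivative) domain. Assumed for illustration.
LAPLACIAN = torch.tensor([[0., 1., 0.],
                          [1., -4., 1.],
                          [0., 1., 0.]]).view(1, 1, 3, 3)

def corrupt_in_gradient_domain(x: torch.Tensor, sigma: float = 0.1):
    """Add Gaussian noise to the Laplacian response, not to raw pixels."""
    lap = F.conv2d(x, LAPLACIAN, padding=1)
    return lap + sigma * torch.randn_like(lap)

class DenoisingAE(nn.Module):
    """Toy autoencoder mapping corrupted gradients back to clean pixels."""
    def __init__(self):
        super().__init__()
        self.encoder = nn.Sequential(
            nn.Conv2d(1, 16, 3, stride=2, padding=1), nn.ReLU())
        self.decoder = nn.ConvTranspose2d(16, 1, 4, stride=2, padding=1)

    def forward(self, x):
        return self.decoder(self.encoder(x))

model = DenoisingAE()
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)

clean = torch.rand(8, 1, 32, 32)                 # dummy clean images
noisy_grad = corrupt_in_gradient_domain(clean)   # corrupted gradient input
recon = model(noisy_grad)
loss = F.mse_loss(recon, clean)                  # reconstruct clean pixels
loss.backward()
optimizer.step()
```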
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of its content (including all information) and is not responsible for any consequences of its use.