Towards Good Practices for Efficiently Annotating Large-Scale Image
Classification Datasets
- URL: http://arxiv.org/abs/2104.12690v1
- Date: Mon, 26 Apr 2021 16:29:32 GMT
- Title: Towards Good Practices for Efficiently Annotating Large-Scale Image
Classification Datasets
- Authors: Yuan-Hong Liao, Amlan Kar, Sanja Fidler
- Abstract summary: We investigate efficient annotation strategies for collecting multi-class classification labels for a large collection of images.
We propose modifications and best practices aimed at minimizing human labeling effort.
Simulated experiments on a 125k image subset of the ImageNet100 show that it can be annotated to 80% top-1 accuracy with 0.35 annotations per image on average.
- Score: 90.61266099147053
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Data is the engine of modern computer vision, which necessitates collecting
large-scale datasets. This is expensive, and guaranteeing the quality of the
labels is a major challenge. In this paper, we investigate efficient annotation
strategies for collecting multi-class classification labels for a large
collection of images. While methods that exploit learnt models for labeling
exist, a surprisingly prevalent approach is to query humans for a fixed number
of labels per datum and aggregate them, which is expensive. Building on prior
work on online joint probabilistic modeling of human annotations and
machine-generated beliefs, we propose modifications and best practices aimed at
minimizing human labeling effort. Specifically, we make use of advances in
self-supervised learning, view annotation as a semi-supervised learning
problem, identify and mitigate pitfalls and ablate several key design choices
to propose effective guidelines for labeling. Our analysis is done in a more
realistic simulation that involves querying human labelers, which uncovers
issues with evaluation using existing worker simulation methods. Simulated
experiments on a 125k-image subset of ImageNet100 show that it can be
annotated to 80% top-1 accuracy with 0.35 annotations per image on average, a
2.7x and 6.7x improvement over prior work and manual annotation, respectively.
Project page: https://fidler-lab.github.io/efficient-annotation-cookbook
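The core mechanism in the abstract, online joint modeling of human votes and machine-generated beliefs, can be illustrated with a small sketch. The following is a minimal Python rendition assuming a symmetric worker-noise model with a fixed worker accuracy and a confidence-based stopping threshold; these specifics are illustrative assumptions, not the paper's exact formulation.

```python
import numpy as np

def update_posterior(prior, votes, accuracy=0.8):
    """Posterior over classes for one image, combining a machine-generated
    prior with human votes under a simple symmetric worker-noise model:
    each worker is correct with probability `accuracy`, otherwise uniform
    over the wrong classes. An illustrative stand-in for the paper's
    richer joint worker/task model."""
    num_classes = len(prior)
    wrong = (1.0 - accuracy) / (num_classes - 1)
    log_post = np.log(prior)
    for v in votes:
        likelihood = np.full(num_classes, wrong)
        likelihood[v] = accuracy
        log_post += np.log(likelihood)
    log_post -= log_post.max()              # numerical stability
    post = np.exp(log_post)
    return post / post.sum()

def annotate(prior, oracle, threshold=0.9, max_votes=3):
    """Query one human label at a time and stop as soon as the posterior is
    confident; returns (predicted label, human annotations spent)."""
    votes, post = [], prior.copy()
    while post.max() < threshold and len(votes) < max_votes:
        votes.append(oracle())              # one more human annotation
        post = update_posterior(prior, votes)
    return int(post.argmax()), len(votes)

# Toy usage: the model already leans towards class 7, so a single agreeing
# human vote is often enough to cross the threshold.
rng = np.random.default_rng(0)
prior = np.full(100, 0.007)
prior[7] = 0.3
prior /= prior.sum()
label, cost = annotate(prior, oracle=lambda: 7 if rng.random() < 0.8 else int(rng.integers(100)))
print(label, cost)
```

Because a confident machine prior often needs only one agreeing vote, and sometimes none, the average human cost per image can drop well below one label, which is the regime the 0.35 annotations-per-image figure lives in.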
Related papers
- One-bit Supervision for Image Classification: Problem, Solution, and
Beyond [114.95815360508395]
This paper presents one-bit supervision, a novel setting of learning with fewer labels, for image classification.
We propose a multi-stage training paradigm and incorporate negative label suppression into an off-the-shelf semi-supervised learning algorithm.
In multiple benchmarks, the learning efficiency of the proposed approach surpasses that of full-bit, semi-supervised supervision. A toy illustration of one annotation round follows.
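In the one-bit setting, the model guesses a class and the annotator answers only yes or no; the full method wraps this in a multi-stage semi-supervised training loop. The sketch below is a toy rendition; the function name and return convention are mine, not the paper's.

```python
import numpy as np

def one_bit_round(probs, true_label):
    """One round of one-bit supervision: the model guesses its top class and
    the annotator answers only yes/no. On 'no', the rejected class is zeroed
    out (negative label suppression)."""
    guess = int(probs.argmax())
    if guess == true_label:                  # annotator answers "yes"
        return guess, probs                  # full label recovered for 1 bit
    probs = probs.copy()                     # annotator answers "no"
    probs[guess] = 0.0                       # suppress the negative label
    return None, probs / probs.sum()
```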
arXiv Detail & Related papers (2023-11-26T07:39:00Z)
- Label Selection Approach to Learning from Crowds [25.894399244406287]
Learning from Crowds is a framework that directly trains models on noisy labels from crowd workers.
We propose a novel Learning from Crowds model, inspired by SelectiveNet, which was proposed for the selective prediction problem.
A major advantage of the proposed method is that it can be applied to almost all variants of supervised learning problems.
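A hedged sketch of how a SelectiveNet-style selection head might sit beside a classifier for crowd-labeled data: the selector learns how much to trust each noisy example. The layer sizes, coverage target, and penalty weight below are illustrative assumptions, not the paper's model.

```python
import torch
import torch.nn as nn

class SelectiveCrowdModel(nn.Module):
    """Classifier plus a selection head, in the spirit of SelectiveNet: the
    selector outputs a trust score in [0, 1] for each (possibly noisy)
    crowd-labeled example."""
    def __init__(self, in_dim=512, num_classes=10):
        super().__init__()
        self.backbone = nn.Sequential(nn.Linear(in_dim, 256), nn.ReLU())
        self.classifier = nn.Linear(256, num_classes)                   # f: prediction
        self.selector = nn.Sequential(nn.Linear(256, 1), nn.Sigmoid())  # g: trust

    def forward(self, x):
        h = self.backbone(x)
        return self.classifier(h), self.selector(h).squeeze(-1)

def selective_loss(logits, select, crowd_labels, coverage=0.7, lam=32.0):
    """Selection-weighted cross-entropy plus SelectiveNet's penalty keeping
    the average selection rate near a target coverage."""
    ce = nn.functional.cross_entropy(logits, crowd_labels, reduction="none")
    risk = (select * ce).sum() / select.sum().clamp(min=1e-8)
    penalty = lam * torch.relu(coverage - select.mean()) ** 2
    return risk + penalty
```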
arXiv Detail & Related papers (2023-08-21T00:22:32Z)
- Estimating label quality and errors in semantic segmentation data via any model [19.84626033109009]
We study methods to score label quality, such that the images with the lowest scores are least likely to be correctly labeled.
This helps prioritize what data to review in order to ensure a high-quality training/evaluation dataset.
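One common instantiation of such a score is self-confidence: the model's predicted probability of the label the image was given. A minimal image-level sketch follows; the paper works at the segmentation level and aggregates per-pixel scores, which is omitted here.

```python
import numpy as np

def label_quality_scores(pred_probs, given_labels):
    """Self-confidence label-quality score: the model's predicted probability
    of each example's given label. Low scores flag likely mislabeled data."""
    return pred_probs[np.arange(len(given_labels)), given_labels]

def review_order(pred_probs, given_labels):
    """Indices sorted so the most suspect (lowest-scoring) items come first."""
    return np.argsort(label_quality_scores(pred_probs, given_labels))
```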
arXiv Detail & Related papers (2023-07-11T07:29:09Z)
- Improving Model Training via Self-learned Label Representations [5.969349640156469]
We show that more sophisticated label representations are better for classification than the usual one-hot encoding.
We propose Learning with Adaptive Labels (LwAL) algorithm, which simultaneously learns the label representation while training for the classification task.
Our algorithm introduces negligible additional parameters and has a minimal computational overhead.
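A minimal sketch of the general idea, learning label embeddings jointly with the encoder instead of fixing one-hot targets; the embedding size, cosine loss, and prediction rule are my assumptions rather than LwAL's exact update.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class AdaptiveLabels(nn.Module):
    """Learnable label embeddings replacing fixed one-hot targets, in the
    spirit of LwAL: both the encoder features and the targets are trained."""
    def __init__(self, num_classes=10, emb_dim=64):
        super().__init__()
        self.targets = nn.Parameter(torch.randn(num_classes, emb_dim))

    def loss(self, features, labels):
        # Pull each sample's feature toward its class's learned target;
        # gradients flow into both the encoder and the label embeddings.
        t = F.normalize(self.targets[labels], dim=-1)
        f = F.normalize(features, dim=-1)
        return (1.0 - (f * t).sum(dim=-1)).mean()

    @torch.no_grad()
    def predict(self, features):
        sims = F.normalize(features, dim=-1) @ F.normalize(self.targets, dim=-1).T
        return sims.argmax(dim=-1)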
arXiv Detail & Related papers (2022-09-09T21:10:43Z)
- Semantic Segmentation with Active Semi-Supervised Learning [23.79742108127707]
We propose a novel algorithm that combines active learning and semi-supervised learning.
Our method obtains over 95% of the network's performance on the full training set.
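In the spirit of that combination, a generic acquisition step could look like the sketch below: uncertain images go to annotators while confident ones remain for semi-supervised pseudo-labeling. The entropy criterion and budget are placeholders, not the paper's actual selection rule.

```python
import numpy as np

def select_for_annotation(pred_probs, budget=100):
    """Entropy-based acquisition: send the most uncertain predictions to
    human annotators; confident ones can be pseudo-labeled instead."""
    entropy = -(pred_probs * np.log(pred_probs + 1e-12)).sum(axis=1)
    return np.argsort(entropy)[-budget:]     # top-`budget` most uncertain
```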
arXiv Detail & Related papers (2022-03-21T04:16:25Z)
- Debiased Pseudo Labeling in Self-Training [77.83549261035277]
Deep neural networks achieve remarkable performance on a wide range of tasks with the aid of large-scale labeled datasets.
To reduce the need for labeled data, self-training is widely used in both academia and industry, assigning pseudo labels to readily available unlabeled data.
We propose Debiased, in which the generation and utilization of pseudo labels are decoupled by two independent heads.
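A schematic reading of the decoupled-heads idea: one head only generates pseudo labels (under no_grad), a second head is the only one trained on them, so a head's own errors never reinforce themselves. Dimensions, the confidence threshold, and the loss weighting are illustrative, not the paper's exact design.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class DebiasedHeads(nn.Module):
    """Two independent heads on a shared backbone: one generates pseudo
    labels, the other consumes them, decoupling generation from utilization."""
    def __init__(self, feat_dim=512, num_classes=10):
        super().__init__()
        self.gen_head = nn.Linear(feat_dim, num_classes)   # generates pseudo labels
        self.use_head = nn.Linear(feat_dim, num_classes)   # trained on them

    def losses(self, feats_l, y_l, feats_u, threshold=0.95):
        # Supervised loss on labeled features for both heads.
        sup = (F.cross_entropy(self.gen_head(feats_l), y_l)
               + F.cross_entropy(self.use_head(feats_l), y_l))
        # Pseudo labels come only from gen_head and carry no gradient.
        with torch.no_grad():
            probs = self.gen_head(feats_u).softmax(dim=-1)
            conf, pseudo = probs.max(dim=-1)
            mask = conf >= threshold
        if mask.any():
            unsup = F.cross_entropy(self.use_head(feats_u[mask]), pseudo[mask])
        else:
            unsup = feats_u.new_zeros(())
        return sup + unsup
```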
arXiv Detail & Related papers (2022-02-15T02:14:33Z)
- One-bit Supervision for Image Classification [121.87598671087494]
One-bit supervision is a novel setting of learning from incomplete annotations.
We propose a multi-stage training paradigm which incorporates negative label suppression into an off-the-shelf semi-supervised learning algorithm.
arXiv Detail & Related papers (2020-09-14T03:06:23Z)
- Big Self-Supervised Models are Strong Semi-Supervised Learners [116.00752519907725]
We show that unsupervised pretraining followed by supervised fine-tuning is surprisingly effective for semi-supervised learning on ImageNet.
A key ingredient of our approach is the use of big (deep and wide) networks during pretraining and fine-tuning.
We find that the fewer the labels, the more this approach (task-agnostic use of unlabeled data) benefits from a bigger network.
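The recipe behind these findings is: pretrain a big network with self-supervision, fine-tune it on the few labels, then distill through unlabeled data. A sketch of the distillation step follows; the temperature value is a common default, not taken from the paper.

```python
import torch
import torch.nn.functional as F

def distillation_loss(student_logits, teacher_logits, tau=1.0):
    """Distillation on unlabeled images: the fine-tuned big teacher relabels
    them and a (possibly smaller) student matches its softened predictions."""
    teacher = F.softmax(teacher_logits / tau, dim=-1)
    log_student = F.log_softmax(student_logits / tau, dim=-1)
    return -(teacher * log_student).sum(dim=-1).mean()
```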
arXiv Detail & Related papers (2020-06-17T17:48:22Z)
- SCAN: Learning to Classify Images without Labels [73.69513783788622]
We advocate a two-step approach where feature learning and clustering are decoupled.
A self-supervised task from representation learning is employed to obtain semantically meaningful features.
We obtain promising results on ImageNet, and outperform several semi-supervised learning methods in the low-data regime.
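A hedged sketch of the second (clustering) step, assuming nearest neighbors have already been mined in the self-supervised feature space: each image's cluster assignment is pushed to agree with its neighbors, while an entropy term keeps clusters balanced. The entropy weight follows common usage of this loss; treat specifics as illustrative.

```python
import torch

def scan_loss(probs, neighbor_probs, entropy_weight=5.0):
    """SCAN-style clustering objective: consistency between an image's soft
    cluster assignment and those of its mined nearest neighbors, minus a
    weighted entropy of the mean assignment (maximized to avoid collapse)."""
    consistency = -torch.log((probs * neighbor_probs).sum(dim=-1) + 1e-12).mean()
    mean_probs = probs.mean(dim=0)
    entropy = -(mean_probs * torch.log(mean_probs + 1e-12)).sum()
    return consistency - entropy_weight * entropy
```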
arXiv Detail & Related papers (2020-05-25T18:12:33Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the information presented and is not responsible for any consequences arising from its use.