Self-supervised Semi-supervised Learning for Data Labeling and Quality
Evaluation
- URL: http://arxiv.org/abs/2111.10932v1
- Date: Mon, 22 Nov 2021 00:59:00 GMT
- Title: Self-supervised Semi-supervised Learning for Data Labeling and Quality
Evaluation
- Authors: Haoping Bai, Meng Cao, Ping Huang, Jiulong Shan
- Abstract summary: We tackle the problems of efficient data labeling and annotation verification under the human-in-the-loop setting.
We propose a unifying framework by leveraging self-supervised semi-supervised learning and use it to construct workflows for data labeling and annotation verification tasks.
- Score: 10.483508279350195
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: As the adoption of deep learning techniques in industrial applications grows
with increasing speed and scale, successful deployment of deep learning models
often hinges on the availability, volume, and quality of annotated data. In
this paper, we tackle the problems of efficient data labeling and annotation
verification under the human-in-the-loop setting. We showcase that the latest
advancements in the field of self-supervised visual representation learning can
lead to tools and methods that benefit the curation and engineering of natural
image datasets, reducing annotation cost and increasing annotation quality. We
propose a unifying framework by leveraging self-supervised semi-supervised
learning and use it to construct workflows for data labeling and annotation
verification tasks. We demonstrate the effectiveness of our workflows over
existing methodologies. On the active learning task, our method achieves 97.0%
Top-1 Accuracy on CIFAR10 with 0.1% annotated data, and 83.9% Top-1 Accuracy on
CIFAR100 with 10% annotated data. When learning with 50% of wrong labels, our
method achieves 97.4% Top-1 Accuracy on CIFAR10 and 85.5% Top-1 Accuracy on
CIFAR100.
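The paper does not include code here, but the workflow it describes — frozen self-supervised features plus a lightweight classifier driving annotation — can be illustrated. Below is a minimal, hypothetical sketch of one human-in-the-loop labeling round; the random embeddings, the logistic-regression probe, and the uncertainty-based query rule are stand-ins, not the authors' actual pipeline.

```python
# Hypothetical sketch of one active-labeling round: a frozen self-supervised
# encoder provides embeddings; a cheap probe trained on the few labels so far
# ranks unlabeled samples by uncertainty for the next annotation batch.
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)

# Stand-ins for encoder outputs: in practice these would come from a
# self-supervised model applied to the image pool.
embeddings = rng.normal(size=(1000, 128))   # frozen features for all images
labeled_idx = np.arange(10)                 # tiny seed set annotated so far
labels = rng.integers(0, 10, size=10)       # their human-provided labels

# 1. Fit a cheap probe on the labeled subset of frozen features.
probe = LogisticRegression(max_iter=1000).fit(embeddings[labeled_idx], labels)

# 2. Score the unlabeled pool; low max-probability = high uncertainty.
unlabeled_idx = np.setdiff1d(np.arange(len(embeddings)), labeled_idx)
confidence = probe.predict_proba(embeddings[unlabeled_idx]).max(axis=1)

# 3. Send the most uncertain samples to human annotators next.
budget = 20
query = unlabeled_idx[np.argsort(confidence)[:budget]]
print("next images to annotate:", query[:5])
```

Keeping the encoder frozen is what makes this loop cheap: only the small probe is retrained each round, so annotation batches can be selected interactively.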
Related papers
- ResumeAtlas: Revisiting Resume Classification with Large-Scale Datasets and Large Language Models [0.9237437350215897]
We curated a large-scale dataset of 13,389 resumes from diverse sources.
We employed Large Language Models (LLMs) such as BERT and Gemma1.1 2B for classification.
Our results demonstrate significant improvements over traditional machine learning approaches.
arXiv Detail & Related papers (2024-06-26T07:25:18Z)
- Adaptive Retention & Correction for Continual Learning [114.5656325514408]
A common problem in continual learning is the classification layer's bias towards the most recent task.
We name our approach Adaptive Retention & Correction (ARC).
ARC achieves an average performance increase of 2.7% and 2.6% on the CIFAR-100 and ImageNet-R datasets.
arXiv Detail & Related papers (2024-05-23T08:43:09Z)
- Incremental Self-training for Semi-supervised Learning [56.57057576885672]
IST is simple yet effective and fits existing self-training-based semi-supervised learning methods.
We verify the proposed IST on five datasets and two types of backbones, effectively improving recognition accuracy and learning speed.
arXiv Detail & Related papers (2024-04-14T05:02:00Z)
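For context on the entry above, here is a minimal sketch of the generic self-training loop that incremental methods like IST build on. This is a hypothetical simplification — IST's incremental scheduling details are in the paper — with a toy dataset and an assumed confidence cutoff.

```python
# Generic self-training loop (hypothetical sketch): pseudo-label the unlabeled
# pool, promote only confident predictions into the training set, and repeat.
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
X_lab, y_lab = rng.normal(size=(50, 16)), rng.integers(0, 3, size=50)
X_unl = rng.normal(size=(500, 16))

threshold = 0.9  # assumed confidence cutoff for promoting pseudo-labels
for round_ in range(3):
    model = LogisticRegression(max_iter=1000).fit(X_lab, y_lab)
    proba = model.predict_proba(X_unl)
    conf, pseudo = proba.max(axis=1), proba.argmax(axis=1)
    keep = conf >= threshold
    if not keep.any():
        break
    # Incrementally grow the labeled set with confident pseudo-labels.
    X_lab = np.vstack([X_lab, X_unl[keep]])
    y_lab = np.concatenate([y_lab, pseudo[keep]])
    X_unl = X_unl[~keep]
    print(f"round {round_}: promoted {keep.sum()} samples")
```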
- Enhancing Visual Continual Learning with Language-Guided Supervision [76.38481740848434]
Continual learning aims to empower models to learn new tasks without forgetting previously acquired knowledge.
We argue that the scarce semantic information conveyed by one-hot labels hampers effective knowledge transfer across tasks.
Specifically, we use pre-trained language models (PLMs) to generate semantic targets for each class, which are frozen and serve as supervision signals.
arXiv Detail & Related papers (2024-03-24T12:41:58Z)
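A hypothetical sketch of the language-guided idea summarized above: frozen class-name embeddings from a language model replace one-hot targets. The `class_targets` below are random stand-ins for real PLM embeddings, and the projection head and temperature are assumptions.

```python
# Language-guided targets (sketch): a pretrained language model embeds each
# class name once; the frozen text vectors then supervise the visual backbone
# via a similarity loss instead of one-hot labels.
import torch
import torch.nn.functional as F

num_classes, text_dim, feat_dim = 100, 768, 512

# Stand-in for PLM embeddings of class names (frozen; never trained further).
class_targets = F.normalize(torch.randn(num_classes, text_dim), dim=-1)

# Visual features from the learner, projected into the text space.
proj = torch.nn.Linear(feat_dim, text_dim)
features = torch.randn(32, feat_dim)             # one training batch
labels = torch.randint(0, num_classes, (32,))

z = F.normalize(proj(features), dim=-1)
logits = z @ class_targets.T / 0.07              # cosine similarity + temperature
loss = F.cross_entropy(logits, labels)           # pull features toward class text
loss.backward()
```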
- Class-Aware Contrastive Semi-Supervised Learning [51.205844705156046]
We propose a general method named Class-aware Contrastive Semi-Supervised Learning (CCSSL) to improve pseudo-label quality and enhance the model's robustness in the real-world setting.
Our proposed CCSSL has significant performance improvements over the state-of-the-art SSL methods on the standard datasets CIFAR100 and STL10.
arXiv Detail & Related papers (2022-03-04T12:18:23Z)
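Below is a hypothetical sketch of a class-aware contrastive term in the spirit of CCSSL: samples sharing a (pseudo-)label count as positives, weighted by label confidence. The loss form, temperature, and weighting scheme are assumptions, not the paper's exact objective.

```python
# Class-aware contrastive loss (sketch): same-label pairs are positives,
# and each anchor's loss is down-weighted by its pseudo-label confidence.
import torch
import torch.nn.functional as F

def class_aware_contrastive(z, labels, conf, tau=0.1):
    z = F.normalize(z, dim=-1)
    n = z.size(0)
    eye = torch.eye(n, dtype=torch.bool)
    logits = (z @ z.T / tau).masked_fill(eye, -1e9)      # drop self-similarity
    pos = (labels[:, None] == labels[None, :]) & ~eye    # same-label positives
    log_prob = logits - torch.logsumexp(logits, dim=1, keepdim=True)
    per_anchor = -(log_prob * pos.float()).sum(1) / pos.sum(1).clamp(min=1)
    return (conf * per_anchor).mean()                    # confidence weighting

z = torch.randn(16, 64, requires_grad=True)  # projected features, one batch
pseudo = torch.randint(0, 4, (16,))          # pseudo-labels
conf = torch.rand(16)                        # pseudo-label confidences
class_aware_contrastive(z, pseudo, conf).backward()
```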
- To be Critical: Self-Calibrated Weakly Supervised Learning for Salient Object Detection [95.21700830273221]
Weakly-supervised salient object detection (WSOD) aims to develop saliency models using image-level annotations.
We propose a self-calibrated training strategy by explicitly establishing a mutual calibration loop between pseudo labels and network predictions.
We prove that even a much smaller dataset with well-matched annotations can enable models to achieve better performance as well as generalizability.
arXiv Detail & Related papers (2021-09-04T02:45:22Z)
- Instant-Teaching: An End-to-End Semi-Supervised Object Detection Framework [14.914115746675176]
Semi-supervised object detection can leverage unlabeled data to improve the model performance.
We propose Instant-Teaching, which uses instant pseudo labeling with extended weak-strong data augmentations for teaching during each training iteration.
Our method surpasses state-of-the-art methods by 4.2 mAP on MS-COCO when using 2% labeled data.
arXiv Detail & Related papers (2021-03-21T14:03:36Z)
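A hypothetical sketch of the instant pseudo-labeling idea above, simplified to classification (Instant-Teaching targets detection): within a single iteration, weakly augmented views produce pseudo-labels that supervise strongly augmented views. The noise-based augmentations and the 0.9 threshold are stand-ins.

```python
# Weak/strong pseudo labeling within one training iteration (sketch).
import torch
import torch.nn.functional as F

model = torch.nn.Sequential(torch.nn.Flatten(), torch.nn.Linear(3 * 32 * 32, 10))
images = torch.rand(8, 3, 32, 32)                  # one unlabeled batch

weak = images + 0.01 * torch.randn_like(images)    # stand-in weak augmentation
strong = images + 0.10 * torch.randn_like(images)  # stand-in strong augmentation

with torch.no_grad():                              # pseudo-labels from weak view
    probs = F.softmax(model(weak), dim=1)
    conf, pseudo = probs.max(dim=1)

mask = (conf >= 0.9).float()                       # keep confident ones only
loss = (F.cross_entropy(model(strong), pseudo, reduction="none") * mask).mean()
loss.backward()
```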
- DAGA: Data Augmentation with a Generation Approach for Low-resource Tagging Tasks [88.62288327934499]
We propose a novel augmentation method with language models trained on the linearized labeled sentences.
Our method is applicable to both supervised and semi-supervised settings.
arXiv Detail & Related papers (2020-11-03T07:49:15Z)
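The "linearized labeled sentences" in the DAGA entry above can be sketched concretely: tags are interleaved with words so an ordinary language model can be trained on, and later sample, labeled sequences. Tag placement here follows the general idea; the real preprocessing may differ in detail.

```python
# DAGA-style linearization (sketch): insert the tag token before each
# entity word so a language model sees one flat token stream.
def linearize(tokens, tags):
    out = []
    for tok, tag in zip(tokens, tags):
        if tag != "O":          # only mark entity tokens, keeping sequences short
            out.append(tag)
        out.append(tok)
    return " ".join(out)

print(linearize(["John", "lives", "in", "Paris"],
                ["B-PER", "O", "O", "B-LOC"]))
# -> "B-PER John lives in B-LOC Paris"
```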