Mix-Teaching: A Simple, Unified and Effective Semi-Supervised Learning
Framework for Monocular 3D Object Detection
- URL: http://arxiv.org/abs/2207.04448v1
- Date: Sun, 10 Jul 2022 12:07:25 GMT
- Title: Mix-Teaching: A Simple, Unified and Effective Semi-Supervised Learning
Framework for Monocular 3D Object Detection
- Authors: Lei Yang, Xinyu Zhang, Li Wang, Minghan Zhu, Chuang Zhang, Jun Li
- Abstract summary: Mix-Teaching is an effective semi-supervised learning framework that can exploit both labeled and unlabeled images during training.
Mix-Teaching consistently improves MonoFlex and GUPNet by significant margins under various labeling ratios on the KITTI dataset.
- Score: 22.074959519526605
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Monocular 3D object detection is an essential perception task for autonomous
driving. However, the heavy reliance on large-scale labeled data makes model optimization costly
and time-consuming. To reduce this over-reliance on human annotations, we propose Mix-Teaching,
an effective semi-supervised learning framework that can employ both labeled and unlabeled images
during training. Mix-Teaching first generates pseudo-labels for unlabeled images by self-training.
The student model is then trained on mixed images with much denser and more precise labels,
obtained by merging instance-level image patches into empty backgrounds or labeled images. This is
the first approach to break the image-level limitation and place high-quality pseudo-labels from
multiple frames into one image for semi-supervised training. Moreover, because confidence scores
are misaligned with localization quality, it is hard to discriminate high-quality pseudo-labels
from noisy predictions using a confidence-based criterion alone. To that end, we further introduce
an uncertainty-based filter to help select reliable pseudo boxes for the above mixing operation.
To the best of our knowledge, this is the first unified SSL framework for monocular 3D object
detection. Mix-Teaching consistently improves MonoFlex and GUPNet by significant margins under
various labeling ratios on the KITTI dataset. For example, our method achieves around +6.34%
AP@0.7 improvement over the GUPNet baseline on the validation set when using only 10% of the
labeled data. Moreover, by leveraging the full training set and the additional 48K raw KITTI
images, it further improves MonoFlex by +4.65% AP@0.7 for car detection, reaching 18.54% AP@0.7
and ranking first among all monocular-based methods on the KITTI test leaderboard. The code and
pretrained models will be released at
https://github.com/yanglei18/Mix-Teaching.
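As a rough illustration of the two ingredients described in the abstract, the sketch below shows (a) an uncertainty-based filter that keeps only pseudo boxes that are both confident and low-uncertainty, and (b) the instance-level mixing step that pastes patches cropped around reliable pseudo boxes from multiple unlabeled frames into an empty background or a labeled image. All helper names, data layouts, and threshold values here are hypothetical; this is a minimal sketch under those assumptions, not the authors' released implementation.

```python
import numpy as np

# Assumed thresholds for illustration; the paper's actual values may differ.
CONF_THRESH = 0.7   # minimum detection confidence
UNC_THRESH = 0.5    # maximum predicted uncertainty

def filter_pseudo_boxes(boxes, scores, uncertainties):
    """Keep pseudo boxes that pass both the confidence and uncertainty tests.

    Confidence alone is misaligned with localization quality, so a predicted
    uncertainty is used as an additional filtering criterion.
    """
    return [b for b, s, u in zip(boxes, scores, uncertainties)
            if s >= CONF_THRESH and u <= UNC_THRESH]

def mix_instances(base_image, base_labels, patch_bank):
    """Paste instance patches (cropped around reliable pseudo boxes collected
    from multiple unlabeled frames) into an empty background or a labeled
    image, yielding a more densely and precisely labeled training sample."""
    mixed_image = base_image.copy()
    mixed_labels = list(base_labels)
    for patch, label in patch_bank:
        x1, y1, x2, y2 = label["bbox_2d"]        # 2D box the patch came from
        mixed_image[y1:y2, x1:x2] = patch        # overwrite background pixels
        mixed_labels.append(label)               # keep the associated 3D label
    return mixed_image, mixed_labels
```

In a full pipeline, a teacher model trained on the labeled split would produce `boxes`, `scores`, and `uncertainties` for each unlabeled image, and the student would then be trained on the output of `mix_instances` together with the original labeled data.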
Related papers
- Adaptive Mix for Semi-Supervised Medical Image Segmentation [22.69909762038458]
We propose an Adaptive Mix algorithm (AdaMix) for image mix-up in a self-paced learning manner.
We develop three frameworks with our AdaMix, i.e., AdaMix-ST, AdaMix-MT, and AdaMix-CT, for semi-supervised medical image segmentation.
arXiv Detail & Related papers (2024-07-31T13:19:39Z) - Diffusion Models and Semi-Supervised Learners Benefit Mutually with Few
Labels [47.15381781274115]
We propose a simple yet effective training strategy called dual pseudo training (DPT)
DPT operates in three stages: training a classifier on partially labeled data to predict pseudo-labels; training a conditional generative model using these pseudo-labels to generate pseudo images; and retraining the classifier on real data augmented with the generated pseudo images.
With one or two labels per class, DPT achieves a Fréchet Inception Distance (FID) score of 3.08 or 2.52 on ImageNet 256x256.
arXiv Detail & Related papers (2023-02-21T10:24:53Z) - Masked Unsupervised Self-training for Zero-shot Image Classification [98.23094305347709]
Masked Unsupervised Self-Training (MUST) is a new approach which leverages two different and complementary sources of supervision: pseudo-labels and raw images.
MUST improves upon CLIP by a large margin and narrows the performance gap between unsupervised and supervised classification.
arXiv Detail & Related papers (2022-06-07T02:03:06Z) - mc-BEiT: Multi-choice Discretization for Image BERT Pre-training [52.04866462439979]
Image BERT pre-training with masked image modeling (MIM) is a popular approach to self-supervised representation learning.
We introduce an improved BERT-style image pre-training method, namely mc-BEiT, which performs MIM proxy tasks towards eased and refined multi-choice training objectives.
arXiv Detail & Related papers (2022-03-29T09:08:18Z) - Mixed Supervision Learning for Whole Slide Image Classification [88.31842052998319]
We propose a mixed supervision learning framework for super high-resolution images.
During the patch training stage, this framework can make use of coarse image-level labels to refine self-supervised learning.
A comprehensive strategy is proposed to suppress pixel-level false positives and false negatives.
arXiv Detail & Related papers (2021-07-02T09:46:06Z) - Re-labeling ImageNet: from Single to Multi-Labels, from Global to
Localized Labels [34.13899937264952]
ImageNet has been arguably the most popular image classification benchmark, but it is also the one with a significant level of label noise.
Recent studies have shown that many samples contain multiple classes, despite being assumed to be a single-label benchmark.
We argue that the mismatch between single-label annotations and effectively multi-label images is equally, if not more, problematic in the training setup, where random crops are applied.
arXiv Detail & Related papers (2021-01-13T11:55:58Z) - 3DIoUMatch: Leveraging IoU Prediction for Semi-Supervised 3D Object
Detection [76.42897462051067]
3DIoUMatch is a novel semi-supervised method for 3D object detection applicable to both indoor and outdoor scenes.
We leverage a teacher-student mutual learning framework to propagate information from the labeled to the unlabeled train set in the form of pseudo-labels.
Our method consistently improves state-of-the-art methods on both ScanNet and SUN-RGBD benchmarks by significant margins under all label ratios.
arXiv Detail & Related papers (2020-12-08T11:06:26Z) - Un-Mix: Rethinking Image Mixtures for Unsupervised Visual Representation
Learning [108.999497144296]
Recent advanced unsupervised learning approaches use a siamese-like framework to compare two "views" of the same image for representation learning.
This work introduces the notion of distance in label space into unsupervised learning, making the model aware of the soft degree of similarity between positive and negative pairs.
Despite its conceptual simplicity, we show empirically that with the solution -- Unsupervised image mixtures (Un-Mix), we can learn subtler, more robust and generalized representations from the transformed input and corresponding new label space.
arXiv Detail & Related papers (2020-03-11T17:59:04Z) - FixMatch: Simplifying Semi-Supervised Learning with Consistency and
Confidence [93.91751021370638]
Semi-supervised learning (SSL) provides an effective means of leveraging unlabeled data to improve a model's performance.
In this paper, we demonstrate the power of a simple combination of two common SSL methods: consistency regularization and pseudo-labeling.
Our algorithm, FixMatch, first generates pseudo-labels using the model's predictions on weakly-augmented unlabeled images.
arXiv Detail & Related papers (2020-01-21T18:32:27Z)
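The FixMatch entry above summarizes its core step: pseudo-labels are taken from confident predictions on weakly-augmented unlabeled images and enforced on strongly-augmented views of the same images. A minimal PyTorch-style sketch of that unlabeled-data loss follows; the augmentation callables and the threshold value are assumptions for illustration, not the authors' code.

```python
import torch
import torch.nn.functional as F

def fixmatch_unlabeled_loss(model, batch, weak_aug, strong_aug, tau=0.95):
    """Pseudo-labeling + consistency loss on an unlabeled batch (sketch)."""
    with torch.no_grad():
        weak_logits = model(weak_aug(batch))
        probs = weak_logits.softmax(dim=-1)
        conf, pseudo = probs.max(dim=-1)
        mask = (conf >= tau).float()          # keep only confident pseudo-labels
    strong_logits = model(strong_aug(batch))
    loss = F.cross_entropy(strong_logits, pseudo, reduction="none")
    return (loss * mask).mean()
```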