Towards End-to-end Semi-supervised Learning for One-stage Object
Detection
- URL: http://arxiv.org/abs/2302.11299v1
- Date: Wed, 22 Feb 2023 11:35:40 GMT
- Title: Towards End-to-end Semi-supervised Learning for One-stage Object
Detection
- Authors: Gen Luo, Yiyi Zhou, Lei Jin, Xiaoshuai Sun, Rongrong Ji
- Abstract summary: This paper focuses on semi-supervised learning for the advanced and popular one-stage detection network YOLOv5.
We propose a novel teacher-student learning recipe called OneTeacher, with two innovative designs: Multi-view Pseudo-label Refinement (MPR) and Decoupled Semi-supervised Optimization (DSO).
In particular, MPR improves the quality of pseudo-labels via augmented-view refinement and global-view filtering, while DSO handles joint-optimization conflicts via structure tweaks and task-specific pseudo-labeling.
- Score: 88.56917845580594
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Semi-supervised object detection (SSOD) is a research hotspot in computer vision that can greatly reduce the need for expensive bounding-box annotations. Despite great success, existing progress mainly focuses on two-stage detection networks like Faster R-CNN, while research on one-stage detectors is often ignored. In this paper, we focus on semi-supervised learning for the advanced and popular one-stage detection network YOLOv5. Compared with Faster R-CNN, the implementation of YOLOv5 is much more complex, and the various training techniques it uses can also reduce the benefit of SSOD. Beyond this challenge, we reveal two key issues in one-stage SSOD: low-quality pseudo-labeling and multi-task optimization conflict. To address these issues, we propose a novel teacher-student learning recipe called OneTeacher, with two innovative designs: Multi-view Pseudo-label Refinement (MPR) and Decoupled Semi-supervised Optimization (DSO). In particular, MPR improves the quality of pseudo-labels via augmented-view refinement and global-view filtering, while DSO handles joint-optimization conflicts via structure tweaks and task-specific pseudo-labeling. In addition, we carefully revise the implementation of YOLOv5 to maximize the benefits of SSOD, and we share this revised implementation with the existing SSOD methods for fair comparison. To validate OneTeacher, we conduct extensive experiments on COCO and Pascal VOC. The results show that OneTeacher not only achieves superior performance to the compared methods, e.g., 15.0% relative AP gains over Unbiased Teacher, but also effectively addresses the key issues in one-stage SSOD. Our source code is available at:
https://github.com/luogen1996/OneTeacher.
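To make the recipe concrete, the following is a minimal, hypothetical sketch of the generic teacher-student loop that OneTeacher builds on: an EMA teacher predicts on unlabeled images, confident boxes are kept as pseudo-labels for the student, and the teacher's weights slowly track the student's. All names below are illustrative assumptions, not the repository's actual API.

```python
# Hypothetical sketch (not the OneTeacher repository's actual API) of the
# generic teacher-student SSOD step that MPR and DSO refine.
from dataclasses import dataclass
from typing import List, Tuple

@dataclass
class Detection:
    box: Tuple[float, float, float, float]  # (x1, y1, x2, y2)
    label: int
    cls_score: float  # classification confidence
    obj_score: float  # objectness; YOLO-style heads predict this separately

def filter_pseudo_labels(teacher_preds: List[Detection],
                         cls_thresh: float = 0.7,
                         obj_thresh: float = 0.5) -> List[Detection]:
    """Keep only confident teacher boxes as pseudo-labels for the student.

    MPR goes further (augmented-view refinement, global-view filtering);
    this shows only the per-box thresholding such refinements build on.
    """
    return [d for d in teacher_preds
            if d.cls_score >= cls_thresh and d.obj_score >= obj_thresh]

def ema_update(teacher_w: List[float], student_w: List[float],
               momentum: float = 0.999) -> List[float]:
    """Teacher weights slowly track the student's via an exponential
    moving average, the standard way to keep the teacher stable."""
    return [momentum * t + (1.0 - momentum) * s
            for t, s in zip(teacher_w, student_w)]
```

The multi-task optimization conflict noted above plausibly stems from classification, objectness, and box regression all sharing this one pseudo-labeled signal; DSO's structure tweaks and task-specific pseudo-labeling target exactly that coupling.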
Related papers
- HR-Pro: Point-supervised Temporal Action Localization via Hierarchical
Reliability Propagation [40.52832708232682]
Point-supervised Temporal Action Localization (PSTAL) is an emerging research direction for label-efficient learning.
We propose a Hierarchical Reliability Propagation framework, which consists of two reliability-aware stages: Snippet-level Discrimination Learning and Instance-level Completeness Learning.
Our HR-Pro achieves state-of-the-art performance on multiple challenging benchmarks, including an impressive average mAP of 60.3% on THUMOS14.
arXiv Detail & Related papers (2023-08-24T07:19:11Z)
- Active Teacher for Semi-Supervised Object Detection [80.10937030195228]
We propose a novel algorithm called Active Teacher for semi-supervised object detection (SSOD).
Active Teacher extends the teacher-student framework to an iterative version, where the label set is partially and gradually augmented by evaluating three key factors of unlabeled examples (a minimal sketch of this selection loop follows below).
With this design, Active Teacher can maximize the effect of limited label information while improving the quality of pseudo-labels.
arXiv Detail & Related papers (2023-03-15T03:59:27Z)
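The iterative recipe above is essentially an active-learning selection loop. Below is an illustrative sketch under the assumption that the three factors are combined into a single per-example score; `combined_score` is a placeholder, since the summary names neither the factors nor the actual scoring rule.

```python
# Illustrative sketch of Active Teacher's iterative label-set growth.
# `combined_score` stands in for the paper's three factors (not named
# in the summary above); this is an assumption, not the actual algorithm.
from typing import Callable, Set

def augment_label_set(labeled: Set[int], unlabeled: Set[int],
                      combined_score: Callable[[int], float],
                      budget: int) -> Set[int]:
    """Move the `budget` highest-scoring unlabeled examples into the
    labeled pool; repeated each round, this partially and gradually
    augments the label set as described above."""
    ranked = sorted(unlabeled, key=combined_score, reverse=True)
    picked = set(ranked[:budget])
    unlabeled.difference_update(picked)
    return labeled | picked
```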
- Efficient Teacher: Semi-Supervised Object Detection for YOLOv5 [2.2290171169275492]
One-stage anchor-based detectors lack the structure to generate high-quality or flexible pseudo labels.
Dense Detector is a baseline model that extends RetinaNet with dense sampling techniques inspired by YOLOv5.
Pseudo Label Assigner makes more refined use of pseudo labels from Dense Detector (one possible reading is sketched below).
Epoch Adaptor is a method that enables a stable and efficient end-to-end semi-supervised training schedule.
arXiv Detail & Related papers (2023-02-15T10:40:19Z)
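As a hedged illustration of what "more refined use" of pseudo labels can mean, here is a generic two-threshold split, a common SSOD refinement in which only high-confidence boxes become hard labels while mid-confidence boxes are supervised more cautiously. Whether this matches the paper's actual Pseudo Label Assigner is an assumption.

```python
# Generic two-threshold pseudo-label split, a common SSOD refinement;
# shown for illustration only, not confirmed as the paper's
# Pseudo Label Assigner.
from typing import List, Tuple

def split_pseudo_labels(scores: List[float], hi: float = 0.7,
                        lo: float = 0.3) -> Tuple[List[int], List[int]]:
    """Return indices of reliable boxes (score >= hi), usable as hard
    labels, and uncertain boxes (lo <= score < hi), which a refined
    assigner can down-weight; scores below `lo` are discarded."""
    reliable = [i for i, s in enumerate(scores) if s >= hi]
    uncertain = [i for i, s in enumerate(scores) if lo <= s < hi]
    return reliable, uncertain
```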
- Towards Discriminative and Transferable One-Stage Few-Shot Object Detectors [3.9189402702217344]
Few-shot object detection (FSOD) aims to learn novel classes given only a few samples.
We observe that the large performance gap between two-stage and one-stage FSODs is mainly due to the weak discriminability of the latter.
To address these limitations, we propose the Few-shot RetinaNet (FSRN), which includes a multi-way support training strategy to augment the number of foreground samples for dense meta-detectors.
arXiv Detail & Related papers (2022-10-11T20:58:25Z)
- A Weakly Supervised Learning Framework for Salient Object Detection via Hybrid Labels [96.56299163691979]
This paper focuses on a new weakly-supervised salient object detection (SOD) task under hybrid labels.
To address the issues of label noise and quantity imbalance in this task, we design a new pipeline framework with three sophisticated training strategies.
Experiments on five SOD benchmarks show that our method achieves competitive performance against weakly-supervised/unsupervised methods.
arXiv Detail & Related papers (2022-09-07T06:45:39Z)
- Decoupled Adversarial Contrastive Learning for Self-supervised Adversarial Robustness [69.39073806630583]
Adversarial training (AT) for robust representation learning and self-supervised learning (SSL) for unsupervised representation learning are two active research fields.
We propose a two-stage framework termed Decoupled Adversarial Contrastive Learning (DeACL).
arXiv Detail & Related papers (2022-07-22T06:30:44Z)
- Self-Promoted Supervision for Few-Shot Transformer [178.52948452353834]
Self-promoted sUpervisioN (SUN) is a few-shot learning framework for vision transformers (ViTs).
SUN pretrains the ViT on the few-shot learning dataset and then uses it to generate individual location-specific supervision for guiding each patch token.
Experiments show that SUN with ViTs significantly surpasses other few-shot learning frameworks using ViTs and is the first to outperform CNN-based state-of-the-art methods.
arXiv Detail & Related papers (2022-03-14T12:53:27Z)
- Activation to Saliency: Forming High-Quality Labels for Unsupervised Salient Object Detection [54.92703325989853]
We propose a two-stage Activation-to-Saliency (A2S) framework that effectively generates high-quality saliency cues.
No human annotations are involved in our framework during the whole training process.
Our framework achieves significant performance gains compared with existing USOD methods.
arXiv Detail & Related papers (2021-12-07T11:54:06Z)