Self-Supervised Object Detection via Generative Image Synthesis
- URL: http://arxiv.org/abs/2110.09848v1
- Date: Tue, 19 Oct 2021 11:04:05 GMT
- Title: Self-Supervised Object Detection via Generative Image Synthesis
- Authors: Siva Karthik Mustikovela, Shalini De Mello, Aayush Prakash, Umar
Iqbal, Sifei Liu, Thu Nguyen-Phuoc, Carsten Rother, Jan Kautz
- Abstract summary: We present the first end-to-end analysis-by-synthesis framework with controllable GANs for the task of self-supervised object detection.
We use collections of real world images without bounding box annotations to learn to synthesize and detect objects.
Our work advances the field of self-supervised object detection by introducing a successful new paradigm of using controllable GAN-based image synthesis for it.
- Score: 106.65384648377349
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: We present SSOD, the first end-to-end analysis-by-synthesis framework with
controllable GANs for the task of self-supervised object detection. We use
collections of real world images without bounding box annotations to learn to
synthesize and detect objects. We leverage controllable GANs to synthesize
images with pre-defined object properties and use them to train object
detectors. We propose a tight end-to-end coupling of the synthesis and
detection networks to optimally train our system. Finally, we also propose a
method to optimally adapt SSOD to intended target data without requiring
labels for it. For the task of car detection, on the challenging KITTI and
Cityscapes datasets, we show that SSOD outperforms the prior state-of-the-art
purely image-based self-supervised object detection method Wetectron. Even
without requiring any 3D CAD assets, it also surpasses the state-of-the-art
rendering-based method Meta-Sim2. Our work advances the field of
self-supervised object detection by introducing a successful new paradigm of
using controllable GAN-based image synthesis for it and by significantly
improving the baseline accuracy of the task. We open-source our code at
https://github.com/NVlabs/SSOD.
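The idea that images synthesized with pre-defined object properties come with detector supervision for free can be illustrated with a toy sketch. This is not the SSOD implementation: `synthesize` is a hypothetical stand-in for a controllable generator (it only draws a bright rectangle on a noisy dark canvas), `threshold_detector` is a trivial placeholder for a learned detector, and every name and parameter below is an assumption made for illustration.

```python
import random


def sample_box(width, height, rng, min_size=4):
    """Choose the object's location and size BEFORE synthesis (half-open box)."""
    x0 = rng.randint(0, width - min_size)
    y0 = rng.randint(0, height - min_size)
    x1 = rng.randint(x0 + min_size, width)
    y1 = rng.randint(y0 + min_size, height)
    return (x0, y0, x1, y1)


def synthesize(width, height, box, rng):
    """Hypothetical stand-in for a controllable generator.

    A real system would use a GAN conditioned on object properties;
    here we just paint a bright rectangle at the requested box.
    """
    x0, y0, x1, y1 = box
    image = [[rng.randint(0, 40) for _ in range(width)] for _ in range(height)]
    for y in range(y0, y1):
        for x in range(x0, x1):
            image[y][x] = 255
    return image


def make_labeled_dataset(n, width=32, height=24, seed=0):
    """Each synthetic image is paired with the box used to create it."""
    rng = random.Random(seed)
    dataset = []
    for _ in range(n):
        box = sample_box(width, height, rng)
        dataset.append((synthesize(width, height, box, rng), box))
    return dataset


def threshold_detector(image, threshold=128):
    """Placeholder detector: tight box around all bright pixels."""
    coords = [(x, y) for y, row in enumerate(image)
              for x, value in enumerate(row) if value > threshold]
    xs = [x for x, _ in coords]
    ys = [y for _, y in coords]
    return (min(xs), min(ys), max(xs) + 1, max(ys) + 1)


def iou(a, b):
    """Intersection over union of two half-open boxes (x0, y0, x1, y1)."""
    inter_w = max(0, min(a[2], b[2]) - max(a[0], b[0]))
    inter_h = max(0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = inter_w * inter_h
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    return inter / (area_a + area_b - inter)


if __name__ == "__main__":
    for image, box in make_labeled_dataset(3):
        predicted = threshold_detector(image)
        print(box, predicted, iou(predicted, box))
```

Because the box is chosen before the image is produced, no manual annotation step is needed; in a learned setting, the same pairing of synthesized image and requested box would serve as the training signal for the detector.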
Related papers
- Weakly-supervised Contrastive Learning for Unsupervised Object Discovery [52.696041556640516]
Unsupervised object discovery is promising due to its ability to discover objects in a generic manner.
We design a semantic-guided self-supervised learning model to extract high-level semantic features from images.
We introduce Principal Component Analysis (PCA) to localize object regions.
arXiv Detail & Related papers (2023-07-07T04:03:48Z)
- Learning Remote Sensing Object Detection with Single Point Supervision [17.12725535531483]
Pointly Supervised Object Detection (PSOD) has attracted considerable interest due to its lower labeling cost compared to box-level supervised object detection.
We make the first attempt to achieve RS object detection with single point supervision, and propose a PSOD method tailored for RS images.
Our method can achieve significantly better performance as compared to state-of-the-art image-level and point-level supervised detection methods, and reduce the performance gap between PSOD and box-level supervised object detection.
arXiv Detail & Related papers (2023-05-23T15:06:04Z)
- 3DMODT: Attention-Guided Affinities for Joint Detection & Tracking in 3D Point Clouds [95.54285993019843]
We propose a method for joint detection and tracking of multiple objects in 3D point clouds.
Our model exploits temporal information employing multiple frames to detect objects and track them in a single network.
arXiv Detail & Related papers (2022-11-01T20:59:38Z)
- Label-Free Synthetic Pretraining of Object Detectors [67.17371526567325]
We propose a new approach, Synthetic optimized layout with Instance Detection (SOLID), to pretrain object detectors with synthetic images.
Our "SOLID" approach consists of two main components: (1) generating synthetic images using a collection of unlabelled 3D models with optimized scene arrangement; (2) pretraining an object detector on an "instance detection" task.
Our approach does not need any semantic labels for pretraining and allows the use of arbitrary, diverse 3D models.
arXiv Detail & Related papers (2022-08-08T16:55:17Z)
- Multitask AET with Orthogonal Tangent Regularity for Dark Object Detection [84.52197307286681]
We propose a novel multitask auto encoding transformation (MAET) model to enhance object detection in a dark environment.
In a self-supervised manner, the MAET learns the intrinsic visual structure by encoding and decoding the realistic illumination-degrading transformation.
We have achieved state-of-the-art performance using synthetic and real-world datasets.
arXiv Detail & Related papers (2022-05-06T16:27:14Z)
- Learnable Online Graph Representations for 3D Multi-Object Tracking [156.58876381318402]
We propose a unified and learning based approach to the 3D MOT problem.
We employ a Neural Message Passing network for data association that is fully trainable.
We show the merit of the proposed approach on the publicly available nuScenes dataset by achieving state-of-the-art performance of 65.6% AMOTA and 58% fewer ID-switches.
arXiv Detail & Related papers (2021-04-23T17:59:28Z)
This list is automatically generated from the titles and abstracts of the papers in this site.