Siamese DETR
- URL: http://arxiv.org/abs/2303.18144v1
- Date: Fri, 31 Mar 2023 15:29:25 GMT
- Title: Siamese DETR
- Authors: Zeren Chen, Gengshi Huang, Wei Li, Jianing Teng, Kun Wang, Jing Shao,
Chen Change Loy, Lu Sheng
- Abstract summary: We present Siamese DETR, a self-supervised pretraining approach for the Transformer architecture in DETR.
We consider learning view-invariant and detection-oriented representations simultaneously through two complementary tasks.
The proposed Siamese DETR achieves state-of-the-art transfer performance on COCO and PASCAL VOC detection.
- Score: 87.45960774877798
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Recent self-supervised methods are mainly designed for representation
learning with base models, e.g., ResNets or ViTs. They cannot be easily
transferred to DETR, which contains task-specific Transformer modules. In this work, we
present Siamese DETR, a Siamese self-supervised pretraining approach for the
Transformer architecture in DETR. We consider learning view-invariant and
detection-oriented representations simultaneously through two complementary
tasks, i.e., localization and discrimination, in a novel multi-view learning
framework. Two self-supervised pretext tasks are designed: (i) Multi-View
Region Detection aims at learning to localize regions-of-interest between
augmented views of the input, and (ii) Multi-View Semantic Discrimination
attempts to improve object-level discrimination for each region. The proposed
Siamese DETR achieves state-of-the-art transfer performance on COCO and PASCAL
VOC detection using different DETR variants in all setups. Code is available at
https://github.com/Zx55/SiameseDETR.
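For concreteness, here is a minimal, self-contained sketch of how the two pretext losses described above could be combined. All module names, shapes, and the L1/InfoNCE stand-ins are hypothetical placeholders; the actual implementation (with Hungarian matching and the full DETR decoder) is in the repository linked above.
```python
# Hypothetical sketch of the two-view pretraining objective; not the
# official Siamese DETR code (see the linked repository for that).
import torch
import torch.nn.functional as F
from torch import nn

class ToySiameseHeads(nn.Module):
    def __init__(self, dim=256):
        super().__init__()
        self.box_head = nn.Linear(dim, 4)     # region localization head
        self.proj_head = nn.Linear(dim, 128)  # embedding head for discrimination

def siamese_pretrain_loss(heads, q1, q2, boxes1, boxes2, tau=0.1):
    """q1, q2: (N, dim) decoder embeddings for N matched regions in two
    augmented views; boxes1, boxes2: (N, 4) the corresponding region boxes."""
    # (i) Multi-View Region Detection: localize view-2 regions from view-1
    # queries and vice versa (L1 regression stands in for the matching loss).
    loc = F.l1_loss(heads.box_head(q1), boxes2) + F.l1_loss(heads.box_head(q2), boxes1)
    # (ii) Multi-View Semantic Discrimination: InfoNCE between corresponding
    # region embeddings, so each region is its own positive pair.
    z1 = F.normalize(heads.proj_head(q1), dim=-1)
    z2 = F.normalize(heads.proj_head(q2), dim=-1)
    logits = z1 @ z2.T / tau
    disc = F.cross_entropy(logits, torch.arange(len(q1)))
    return loc + disc

heads = ToySiameseHeads()
q1, q2 = torch.randn(8, 256), torch.randn(8, 256)
b1, b2 = torch.rand(8, 4), torch.rand(8, 4)
print(siamese_pretrain_loss(heads, q1, q2, b1, b2))
```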
Related papers
- TransY-Net: Learning Fully Transformer Networks for Change Detection of
Remote Sensing Images [64.63004710817239]
We propose a novel Transformer-based learning framework named TransY-Net for remote sensing image change detection (CD).
It improves feature extraction from a global view and combines multi-level visual features in a pyramid manner.
Our proposed method achieves a new state-of-the-art performance on four optical and two SAR image CD benchmarks.
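As a rough illustration of the "pyramid manner" of combining multi-level features, a generic FPN-style top-down fusion (not TransY-Net's actual modules; all dimensions are made up) might look like:
```python
# Generic pyramid-style fusion of multi-level features; illustrative only.
import torch
import torch.nn.functional as F
from torch import nn

class PyramidFuse(nn.Module):
    def __init__(self, dims=(64, 128, 256), out=64):
        super().__init__()
        # 1x1 lateral convs project every level to a common width
        self.lateral = nn.ModuleList(nn.Conv2d(d, out, 1) for d in dims)

    def forward(self, feats):  # feats ordered high-res -> low-res
        outs = [l(f) for l, f in zip(self.lateral, feats)]
        for i in range(len(outs) - 2, -1, -1):  # top-down accumulation
            outs[i] = outs[i] + F.interpolate(outs[i + 1], size=outs[i].shape[-2:])
        return outs[0]  # fused high-resolution map

fuse = PyramidFuse()
feats = [torch.randn(1, 64, 64, 64), torch.randn(1, 128, 32, 32), torch.randn(1, 256, 16, 16)]
print(fuse(feats).shape)  # (1, 64, 64, 64)
```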
arXiv Detail & Related papers (2023-10-22T07:42:19Z)
- USER: Unified Semantic Enhancement with Momentum Contrast for Image-Text
Retrieval [115.28586222748478]
Image-Text Retrieval (ITR) aims at searching, given a query in one modality, for the semantically relevant target instances in the other modality.
Existing approaches typically suffer from two major limitations.
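The momentum-contrast ingredient named in the title is a standard recipe; a generic sketch of the momentum (EMA) key-encoder update, which is not specific to USER's architecture, is:
```python
# MoCo-style momentum update, shown only to illustrate "momentum contrast";
# the encoder here is a stand-in, not USER's actual model.
import copy
import torch
from torch import nn

query_encoder = nn.Linear(512, 128)         # stand-in for the real encoder
key_encoder = copy.deepcopy(query_encoder)  # momentum (EMA) copy
for p in key_encoder.parameters():
    p.requires_grad = False                 # keys are never updated by SGD

@torch.no_grad()
def momentum_update(m=0.999):
    # k = m*k + (1-m)*q, called after each optimizer step so the key
    # encoder trails the query encoder smoothly.
    for q, k in zip(query_encoder.parameters(), key_encoder.parameters()):
        k.mul_(m).add_(q, alpha=1 - m)

momentum_update()
```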
arXiv Detail & Related papers (2023-01-17T12:42:58Z)
- Representation Separation for Semantic Segmentation with Vision
Transformers [11.431694321563322]
Vision transformers (ViTs), which encode an image as a sequence of patches, bring a new paradigm to semantic segmentation.
We present an efficient framework of representation separation at the local-patch and global-region levels for semantic segmentation with ViTs.
arXiv Detail & Related papers (2022-12-28T09:54:52Z)
- An Empirical Study Of Self-supervised Learning Approaches For Object
Detection With Transformers [0.0]
We explore self-supervised methods based on image reconstruction, masked image modeling, and jigsaw puzzle solving.
Preliminary experiments on the iSAID dataset demonstrate faster convergence of DETR in the initial epochs, in both pretraining and multi-task learning settings.
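As an illustration of one of the pretext tasks compared, here is a generic masked-image-modeling loss on flattened patches; the mask ratio, shapes, and zero mask token are illustrative rather than the paper's settings:
```python
# Generic masked-image-modeling pretext loss; hypothetical shapes.
import torch
import torch.nn.functional as F
from torch import nn

def mim_loss(model, patches, mask_ratio=0.75):
    """patches: (N, D) flattened image patches. Zero out a random subset
    and train the model to reconstruct the original values there."""
    mask = torch.rand(patches.shape[0]) < mask_ratio
    corrupted = patches.clone()
    corrupted[mask] = 0.0                 # stand-in for a learned mask token
    pred = model(corrupted)               # (N, D) reconstruction
    return F.mse_loss(pred[mask], patches[mask])

model = nn.Sequential(nn.Linear(768, 768), nn.GELU(), nn.Linear(768, 768))
print(mim_loss(model, torch.randn(196, 768)))
```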
arXiv Detail & Related papers (2022-05-11T14:39:27Z)
- Continual Object Detection via Prototypical Task Correlation Guided
Gating Mechanism [120.1998866178014]
We present a flexible framework for continual object detection via a pRotOtypical taSk corrElaTion guided gaTing mechAnism (ROSETTA).
Concretely, a unified framework is shared by all tasks while task-aware gates are introduced to automatically select sub-models for specific tasks.
Experiments on COCO-VOC, KITTI-Kitchen, class-incremental detection on VOC and sequential learning of four tasks show that ROSETTA yields state-of-the-art performance.
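A generic sketch of the task-aware gating idea, where each task learns a gate over a shared layer so tasks select sub-models from shared parameters (illustrative only, not ROSETTA's code):
```python
# Task-aware gating over a shared layer; dimensions are hypothetical.
import torch
from torch import nn

class GatedLayer(nn.Module):
    def __init__(self, dim=256, num_tasks=4):
        super().__init__()
        self.shared = nn.Linear(dim, dim)                       # shared by all tasks
        self.gates = nn.Parameter(torch.ones(num_tasks, dim))   # one gate per task

    def forward(self, x, task_id):
        g = torch.sigmoid(self.gates[task_id])  # soft channel selection
        return self.shared(x) * g               # task-specific sub-model

layer = GatedLayer()
print(layer(torch.randn(2, 256), task_id=1).shape)  # (2, 256)
```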
arXiv Detail & Related papers (2022-05-06T07:31:28Z)
- Self-Promoted Supervision for Few-Shot Transformer [178.52948452353834]
Self-promoted sUpervisioN (SUN) is a few-shot learning framework for vision transformers (ViTs).
SUN pretrains the ViT on the few-shot learning dataset and then uses it to generate individual location-specific supervision for guiding each patch token.
Experiments show that SUN with ViTs significantly surpasses other few-shot learning frameworks built on ViTs, and is the first to outperform state-of-the-art CNN counterparts.
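The location-specific supervision can be pictured as a per-patch distillation loss: a teacher produces a soft target for every patch token and the student matches it. The sketch below is a generic stand-in with hypothetical shapes, not SUN's exact pipeline:
```python
# Generic per-patch ("location-specific") distillation loss; illustrative.
import torch
import torch.nn.functional as F

def patch_supervision_loss(student_tokens, teacher_tokens, tau=0.5):
    """student_tokens, teacher_tokens: (N_patches, C) per-token logits."""
    t = F.softmax(teacher_tokens / tau, dim=-1)      # soft per-patch targets
    s = F.log_softmax(student_tokens / tau, dim=-1)  # student log-probs
    return F.kl_div(s, t, reduction="batchmean")

print(patch_supervision_loss(torch.randn(196, 64), torch.randn(196, 64)))
```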
arXiv Detail & Related papers (2022-03-14T12:53:27Z)
- Activation Modulation and Recalibration Scheme for Weakly Supervised
Semantic Segmentation [24.08326440298189]
We propose a novel activation modulation and recalibration scheme for weakly supervised semantic segmentation.
We show that AMR establishes a new state-of-the-art performance on the PASCAL VOC 2012 dataset.
Experiments also reveal that our scheme is plug-and-play and can be incorporated with other approaches to boost their performance.
arXiv Detail & Related papers (2021-12-16T16:26:14Z)
- Unsupervised Pretraining for Object Detection by Patch Reidentification [72.75287435882798]
Unsupervised representation learning achieves promising performance in pre-training representations for object detectors.
This work proposes a simple yet effective representation learning method for object detection, named patch re-identification (Re-ID).
Our method significantly outperforms its counterparts on COCO in all settings, such as different training iterations and data percentages.
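The patch re-identification idea can be sketched as a contrastive matching problem: a patch from one augmented view must be re-identified among candidate patches from the other view. The formulation below is illustrative, not the paper's exact loss:
```python
# Illustrative contrastive patch-matching loss; hypothetical shapes.
import torch
import torch.nn.functional as F

def patch_reid_loss(anchor, candidates, pos_idx, tau=0.07):
    """anchor: (D,) embedding of a patch from view 1; candidates: (K, D)
    patch embeddings from view 2, where candidates[pos_idx] is the same
    physical patch seen in the other view."""
    a = F.normalize(anchor, dim=-1)
    c = F.normalize(candidates, dim=-1)
    logits = (c @ a) / tau  # (K,) cosine similarities, temperature-scaled
    return F.cross_entropy(logits.unsqueeze(0), torch.tensor([pos_idx]))

print(patch_reid_loss(torch.randn(128), torch.randn(10, 128), pos_idx=3))
```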
arXiv Detail & Related papers (2021-03-08T15:13:59Z)
- TransReID: Transformer-based Object Re-Identification [20.02035310635418]
This work explores the Vision Transformer (ViT), a pure transformer-based model, for the object re-identification (ReID) task.
With several adaptations, a strong baseline ViT-BoT is constructed with ViT as the backbone.
The authors propose a pure-transformer framework dubbed TransReID, which is the first work to use a pure Transformer for ReID research.
arXiv Detail & Related papers (2021-02-08T17:33:59Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed content (including all information) and is not responsible for any consequences of its use.