Towards Data-Efficient Detection Transformers
- URL: http://arxiv.org/abs/2203.09507v2
- Date: Mon, 21 Mar 2022 16:25:54 GMT
- Title: Towards Data-Efficient Detection Transformers
- Authors: Wen Wang, Jing Zhang, Yang Cao, Yongliang Shen, Dacheng Tao
- Abstract summary: We show most detection transformers suffer from significant performance drops on small-size datasets.
We empirically analyze the factors that affect data efficiency, through a step-by-step transition from a data-efficient RCNN variant to the representative DETR.
We introduce a simple yet effective label augmentation method to provide richer supervision and improve data efficiency.
- Score: 77.43470797296906
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Detection Transformers have achieved competitive performance on the
sample-rich COCO dataset. However, we show most of them suffer from significant
performance drops on small-size datasets, like Cityscapes. In other words, the
detection transformers are generally data-hungry. To tackle this problem, we
empirically analyze the factors that affect data efficiency, through a
step-by-step transition from a data-efficient RCNN variant to the
representative DETR. The empirical results suggest that sparse feature sampling
from local image areas holds the key. Based on this observation, we alleviate
the data-hungry issue of existing detection transformers by simply altering
how key and value sequences are constructed in the cross-attention layer, with
minimum modifications to the original models. Besides, we introduce a simple
yet effective label augmentation method to provide richer supervision and
improve data efficiency. Experiments show that our method can be readily
applied to different detection transformers and improve their performance on
both small-size and sample-rich datasets. Code will be made publicly available
at \url{https://github.com/encounter1997/DE-DETRs}.
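The decisive change identified above is how the decoder's cross-attention builds its key and value sequences. The sketch below (PyTorch) illustrates one way to realize sparse feature sampling from local image areas: each object query attends only to features pooled from its own reference box instead of the full global feature map. This is a minimal illustration, not the authors' code; the module name `LocalCrossAttention`, the use of torchvision's `roi_align`, and the 7x7 sampling grid are assumptions, so consult the repository linked above for the actual implementation.

```python
# Minimal sketch (illustrative, not the authors' implementation): a cross-attention
# layer whose key/value sequences are sampled from local image regions rather than
# taken from the whole encoder feature map.
import torch
import torch.nn as nn
from torchvision.ops import roi_align


class LocalCrossAttention(nn.Module):
    def __init__(self, d_model: int = 256, n_heads: int = 8, roi_size: int = 7):
        super().__init__()
        self.attn = nn.MultiheadAttention(d_model, n_heads, batch_first=True)
        self.roi_size = roi_size

    def forward(self, queries, feat_map, boxes, spatial_scale=1.0):
        # queries:  (B, Q, D)   object queries from the decoder
        # feat_map: (B, D, H, W) backbone/encoder feature map
        # boxes:    (B, Q, 4)   per-query reference boxes (x1, y1, x2, y2) in image coords
        b, q, d = queries.shape

        # Flatten boxes into the (batch_idx, x1, y1, x2, y2) layout expected by roi_align.
        batch_idx = torch.arange(b, device=boxes.device, dtype=boxes.dtype)
        batch_idx = batch_idx.repeat_interleave(q).unsqueeze(1)
        rois = torch.cat([batch_idx, boxes.reshape(b * q, 4)], dim=1)

        # Sparse feature sampling: pool a small grid of features inside each box.
        local = roi_align(feat_map, rois, output_size=self.roi_size,
                          spatial_scale=spatial_scale, aligned=True)  # (B*Q, D, r, r)
        local = local.flatten(2).transpose(1, 2)                      # (B*Q, r*r, D)

        # Each query attends only to the features sampled from its own local region.
        out, _ = self.attn(queries.reshape(b * q, 1, d), local, local)
        return out.reshape(b, q, d)
```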
Related papers
- Challenging Gradient Boosted Decision Trees with Tabular Transformers for Fraud Detection at Booking.com [1.6702285371066043]
Transformer-based neural networks, empowered by Self-Supervised Learning (SSL), have demonstrated unprecedented performance across various domains.
In this paper, we aim to challenge GBDTs with tabular Transformers on a typical task faced in e-commerce, namely fraud detection.
Our methodology leverages the capabilities of Transformers to learn transferable representations using all available data by means of SSL.
The proposed approach outperforms heavily tuned GBDTs by a considerable margin in Average Precision (AP) score.
arXiv Detail & Related papers (2024-05-22T14:38:48Z)
- Bridging Sensor Gaps via Attention Gated Tuning for Hyperspectral Image Classification [9.82907639745345]
HSI classification methods require high-quality labeled HSIs, which are often costly to obtain.
We propose a novel Attention-Gated Tuning (AGT) strategy and a triplet-structured transformer model, Tri-Former, to address this issue.
arXiv Detail & Related papers (2023-09-22T13:39:24Z)
- Leveraging the Power of Data Augmentation for Transformer-based Tracking [64.46371987827312]
We propose two data augmentation methods customized for tracking.
First, we optimize existing random cropping via a dynamic search radius mechanism and simulation for boundary samples.
Second, we propose a token-level feature mixing augmentation strategy, which improves the model's robustness to challenges like background interference.
arXiv Detail & Related papers (2023-09-15T09:18:54Z)
- TransFace: Calibrating Transformer Training for Face Recognition from a Data-Centric Perspective [40.521854111639094]
Vision Transformers (ViTs) have demonstrated powerful representation ability in various visual tasks thanks to their intrinsic data-hungry nature.
However, we unexpectedly find that ViTs are vulnerable when applied to face recognition (FR) scenarios with extremely large datasets.
This paper proposes a superior FR model called TransFace, which employs a patch-level data augmentation strategy named DPAP and a hard sample mining strategy named EHSM.
arXiv Detail & Related papers (2023-08-20T02:02:16Z)
- Remote Sensing Change Detection With Transformers Trained from Scratch [62.96911491252686]
Transformer-based change detection (CD) approaches either employ a model pre-trained on the large-scale ImageNet classification dataset or rely on first pre-training on another CD dataset and then fine-tuning on the target benchmark.
We develop an end-to-end CD approach with transformers that is trained from scratch and yet achieves state-of-the-art performance on four public benchmarks.
arXiv Detail & Related papers (2023-04-13T17:57:54Z)
- ScoreMix: A Scalable Augmentation Strategy for Training GANs with Limited Data [93.06336507035486]
Generative Adversarial Networks (GANs) typically suffer from overfitting when limited training data is available.
We present ScoreMix, a novel and scalable data augmentation approach for various image synthesis tasks.
arXiv Detail & Related papers (2022-10-27T02:55:15Z)
- Efficient Decoder-free Object Detection with Transformers [75.00499377197475]
Vision transformers (ViTs) are changing the landscape of object detection approaches.
We propose a decoder-free fully transformer-based (DFFT) object detector.
DFFT_SMALL achieves high efficiency in both training and inference stages.
arXiv Detail & Related papers (2022-06-14T13:22:19Z)
- Iwin: Human-Object Interaction Detection via Transformer with Irregular Windows [57.00864538284686]
Iwin Transformer is a hierarchical Transformer which progressively performs token representation learning and token agglomeration within irregular windows.
The effectiveness and efficiency of Iwin Transformer are verified on the two standard HOI detection benchmark datasets.
arXiv Detail & Related papers (2022-03-20T12:04:50Z)
- Change Detection from Synthetic Aperture Radar Images via Graph-Based Knowledge Supplement Network [36.41983596642354]
We propose a Graph-based Knowledge Supplement Network (GKSNet) for image change detection.
To be more specific, we extract discriminative information from the existing labeled dataset as additional knowledge.
To validate the proposed method, we conducted extensive experiments on four SAR datasets.
arXiv Detail & Related papers (2022-01-22T02:50:50Z)
- Efficient Two-Stage Detection of Human-Object Interactions with a Novel Unary-Pairwise Transformer [41.44769642537572]
Unary-Pairwise Transformer is a two-stage detector that exploits unary and pairwise representations for HOIs.
We evaluate our method on the HICO-DET and V-COCO datasets, and significantly outperform state-of-the-art approaches.
arXiv Detail & Related papers (2021-12-03T10:52:06Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of its content (including all information) and is not responsible for any consequences.