RT-DETRv2: Improved Baseline with Bag-of-Freebies for Real-Time Detection Transformer
- URL: http://arxiv.org/abs/2407.17140v1
- Date: Wed, 24 Jul 2024 10:20:19 GMT
- Title: RT-DETRv2: Improved Baseline with Bag-of-Freebies for Real-Time Detection Transformer
- Authors: Wenyu Lv, Yian Zhao, Qinyao Chang, Kui Huang, Guanzhong Wang, Yi Liu,
- Abstract summary: RT-DETRv2 builds upon the previous state-of-the-art real-time detector, RT-DETR.
To improve the flexibility, we suggest setting a distinct number of sampling points for features at different scales.
To enhance practicality, we propose an optional discrete sampling operator to replace the grid_sample operator.
- Score: 2.1186155813156926
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: In this report, we present RT-DETRv2, an improved Real-Time DEtection TRansformer (RT-DETR). RT-DETRv2 builds upon the previous state-of-the-art real-time detector, RT-DETR, opens up a set of bag-of-freebies for flexibility and practicality, and optimizes the training strategy to achieve enhanced performance. To improve flexibility, we suggest setting a distinct number of sampling points for features at different scales in the deformable attention, so that the decoder can extract multi-scale features selectively. To enhance practicality, we propose an optional discrete sampling operator to replace the grid_sample operator, which RT-DETR requires but YOLOs do not. This removes the deployment constraints typically associated with DETRs. For the training strategy, we propose dynamic data augmentation and scale-adaptive hyperparameter customization to improve performance without loss of speed. Source code and pre-trained models will be available at https://github.com/lyuwenyu/RT-DETR.
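The two sampling ideas in the abstract can be illustrated with a small pure-Python sketch. This is not the authors' implementation: the helper names (`bilinear_sample`, `discrete_sample`, `sample_multiscale`), the use of pixel coordinates instead of normalized ones, and the round-to-nearest rule for the discrete operator are all assumptions made for illustration. The sketch contrasts an interpolating lookup (the behaviour grid_sample provides) with a plain index lookup, and lets each feature level contribute its own number of sampling points.

```python
import math


def bilinear_sample(feat, x, y):
    """Interpolating lookup at continuous pixel coords (x, y); zero outside the map."""
    h, w = len(feat), len(feat[0])
    x0, y0 = math.floor(x), math.floor(y)
    fx, fy = x - x0, y - y0

    def at(r, c):
        return feat[r][c] if 0 <= r < h and 0 <= c < w else 0.0

    top = (1 - fx) * at(y0, x0) + fx * at(y0, x0 + 1)
    bottom = (1 - fx) * at(y0 + 1, x0) + fx * at(y0 + 1, x0 + 1)
    return (1 - fy) * top + fy * bottom


def discrete_sample(feat, x, y):
    """Assumed discrete variant: round to the nearest pixel, clamp, and index."""
    h, w = len(feat), len(feat[0])
    c = min(max(math.floor(x + 0.5), 0), w - 1)
    r = min(max(math.floor(y + 0.5), 0), h - 1)
    return feat[r][c]


def sample_multiscale(feats, points_per_level, locations, weights, sampler):
    """Weighted sum of samples; level l contributes points_per_level[l] points."""
    out, k = 0.0, 0
    for feat, n_points in zip(feats, points_per_level):
        for _ in range(n_points):
            x, y = locations[k]
            out += weights[k] * sampler(feat, x, y)
            k += 1
    return out


# Toy two-level pyramid (hypothetical values): a 2x2 map and a 1x1 map.
fine = [[0.0, 1.0], [2.0, 3.0]]
coarse = [[5.0]]

mid_bilinear = bilinear_sample(fine, 0.5, 0.5)   # blends all four pixels
mid_discrete = discrete_sample(fine, 0.5, 0.5)   # picks a single pixel

# Two points on the fine level, one on the coarse level.
mixed = sample_multiscale(
    [fine, coarse],
    points_per_level=[2, 1],
    locations=[(0, 0), (1, 1), (0, 0)],
    weights=[0.25, 0.25, 0.5],
    sampler=discrete_sample,
)
```

The discrete variant needs only rounding, clamping, and indexing, which is why an operator of this kind is easier to export to inference runtimes than an interpolating one; the trade-off is that nearby fractional locations collapse onto the same pixel.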
Related papers
- ALoRE: Efficient Visual Adaptation via Aggregating Low Rank Experts [71.91042186338163]
ALoRE is a novel PETL method that reuses the hypercomplex parameterized space constructed by Kronecker product to Aggregate Low Rank Experts.
Thanks to the artful design, ALoRE maintains negligible extra parameters and can be effortlessly merged into the frozen backbone.
arXiv Detail & Related papers (2024-12-11T12:31:30Z)
- UL-VIO: Ultra-lightweight Visual-Inertial Odometry with Noise Robust Test-time Adaptation [12.511829774226113]
We propose an ultra-lightweight (1M) visual-inertial odometry (VIO) network capable of test-time adaptation (TTA) based on visual-inertial consistency.
Its network is 36X smaller than the state of the art, with only a minute (1%) increase in error on the KITTI dataset.
arXiv Detail & Related papers (2024-09-19T22:24:14Z)
- RT-DETRv3: Real-time End-to-End Object Detection with Hierarchical Dense Positive Supervision [7.721101317599364]
We propose a hierarchical dense positive supervision method based on RT-DETR, named RT-DETRv3.
To address insufficient decoder training, we propose a novel learning strategy involving self-attention perturbation.
RT-DETRv3 significantly outperforms existing real-time detectors, including the RT-DETR series and the YOLO series.
arXiv Detail & Related papers (2024-09-13T02:02:07Z)
- Cascaded Temporal Updating Network for Efficient Video Super-Resolution [47.63267159007611]
Key components in recurrent-based VSR networks significantly impact model efficiency.
We propose a cascaded temporal updating network (CTUN) for efficient VSR.
CTUN achieves a favorable trade-off between efficiency and performance compared to existing methods.
arXiv Detail & Related papers (2024-08-26T12:59:32Z)
- LW-DETR: A Transformer Replacement to YOLO for Real-Time Detection [63.780355815743135]
We present a light-weight detection transformer, LW-DETR, which outperforms YOLOs for real-time object detection.
The architecture is a simple stack of a ViT encoder, a projector, and a shallow DETR decoder.
arXiv Detail & Related papers (2024-06-05T17:07:24Z)
- VTR: An Optimized Vision Transformer for SAR ATR Acceleration on FPGA [2.8595179027282907]
Vision Transformers (ViTs) are the current state-of-the-art in various computer vision applications.
We develop a lightweight ViT model that can be trained directly on small datasets without any pre-training.
We evaluate our proposed model, which we call VTR (ViT for SAR ATR), on three widely used SAR datasets.
arXiv Detail & Related papers (2024-04-06T06:49:55Z)
- Dynamic Tuning Towards Parameter and Inference Efficiency for ViT Adaptation [67.13876021157887]
Dynamic Tuning (DyT) is a novel approach to improve both parameter and inference efficiency for ViT adaptation.
DyT achieves superior performance compared to existing PEFT methods while evoking only 71% of their FLOPs on the VTAB-1K benchmark.
arXiv Detail & Related papers (2024-03-18T14:05:52Z)
- Dynamic PlenOctree for Adaptive Sampling Refinement in Explicit NeRF [6.135925201075925]
We propose the dynamic PlenOctree DOT, which adaptively refines the sample distribution to adjust to changing scene complexity.
Compared with POT, our DOT improves visual quality, reduces parameters by over 55.15%/68.84%, and provides 1.7/1.9 times the FPS on NeRF-synthetic and Tanks & Temples, respectively.
arXiv Detail & Related papers (2023-07-28T06:21:42Z)
- RTFormer: Efficient Design for Real-Time Semantic Segmentation with Transformer [63.25665813125223]
We propose RTFormer, an efficient dual-resolution transformer for real-time semantic segmentation.
It achieves a better trade-off between performance and efficiency than CNN-based models.
Experiments on mainstream benchmarks demonstrate the effectiveness of our proposed RTFormer.
arXiv Detail & Related papers (2022-10-13T16:03:53Z)
- Recurrent Glimpse-based Decoder for Detection with Transformer [85.64521612986456]
We introduce a novel REcurrent Glimpse-based decOder (REGO) in this paper.
In particular, the REGO employs a multi-stage recurrent processing structure to help the attention of DETR gradually focus on foreground objects.
REGO consistently boosts the performance of different DETR detectors by up to 7% relative gain at the same setting of 50 training epochs.
arXiv Detail & Related papers (2021-12-09T00:29:19Z)
This list is automatically generated from the titles and abstracts of the papers in this site.