D^2ETR: Decoder-Only DETR with Computationally Efficient Cross-Scale
Attention
- URL: http://arxiv.org/abs/2203.00860v1
- Date: Wed, 2 Mar 2022 04:21:12 GMT
- Title: D^2ETR: Decoder-Only DETR with Computationally Efficient Cross-Scale
Attention
- Authors: Junyu Lin, Xiaofeng Mao, Yuefeng Chen, Lei Xu, Yuan He, Hui Xue
- Abstract summary: We propose a decoder-only detector called D^2ETR.
In the absence of an encoder, the decoder directly attends to the fine-fused feature maps generated by the Transformer backbone.
D^2ETR demonstrates low computational complexity and high detection accuracy in evaluations on the COCO benchmark.
- Score: 27.354159713970322
- License: http://creativecommons.org/licenses/by-nc-sa/4.0/
- Abstract: DETR is the first fully end-to-end detector that predicts a final set of
predictions without post-processing. However, it suffers from problems such as
low performance and slow convergence. A series of works tackle these
issues in different ways, but the computational cost remains high due to
the sophisticated encoder-decoder architecture. To alleviate this issue, we
propose a decoder-only detector called D^2ETR. In the absence of an encoder, the
decoder directly attends to the fine-fused feature maps generated by the
Transformer backbone with a novel computationally efficient cross-scale
attention module. D^2ETR demonstrates low computational complexity and high
detection accuracy in evaluations on the COCO benchmark, outperforming DETR and
its variants.
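The encoder-free design described above can be sketched in a few lines: the object queries attend directly to the flattened, concatenated multi-scale feature tokens produced by the backbone, with no intermediate encoder stack. The following NumPy sketch is only illustrative; the function name, shapes, and single-head attention are assumptions, not the paper's actual cross-scale attention module.

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def decoder_cross_attention(queries, feature_maps):
    """Object queries attend directly to flattened multi-scale backbone features.

    queries:      (num_queries, d) decoder object queries.
    feature_maps: list of (H_i, W_i, d) arrays from the backbone pyramid.
    Returns (num_queries, d) attended features.
    """
    d = queries.shape[-1]
    # Flatten every scale to tokens and concatenate: the decoder sees the
    # fused feature pyramid directly, with no encoder in between.
    tokens = np.concatenate([f.reshape(-1, d) for f in feature_maps], axis=0)
    scores = queries @ tokens.T / np.sqrt(d)   # (num_queries, num_tokens)
    return softmax(scores, axis=-1) @ tokens   # (num_queries, d)

rng = np.random.default_rng(0)
d = 16
queries = rng.standard_normal((5, d))
pyramid = [rng.standard_normal((8, 8, d)), rng.standard_normal((4, 4, d))]
out = decoder_cross_attention(queries, pyramid)
```

Because the decoder attends over all scales at once, the cost scales with the total number of pyramid tokens rather than with a separate encoder's self-attention over them.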
Related papers
- Cross Resolution Encoding-Decoding For Detection Transformers [33.248031676529635]
Cross-Resolution Encoding-Decoding (CRED) is designed to fuse multiscale detection mechanisms.
CRED delivers accuracy similar to the high-resolution DETR counterpart with roughly 50% fewer FLOPs.
We plan to release pretrained CRED-DETRs for use by the community.
arXiv Detail & Related papers (2024-10-05T09:01:59Z) - Decoder-Only or Encoder-Decoder? Interpreting Language Model as a
Regularized Encoder-Decoder [75.03283861464365]
The seq2seq task aims at generating the target sequence based on the given input source sequence.
Traditionally, most seq2seq tasks are resolved with an encoder that encodes the source sequence and a decoder that generates the target text.
Recently, a number of new approaches have emerged that apply decoder-only language models directly to the seq2seq task.
arXiv Detail & Related papers (2023-04-08T15:44:29Z) - Efficient Nearest Neighbor Search for Cross-Encoder Models using Matrix
Factorization [60.91600465922932]
We present an approach that avoids the use of a dual-encoder for retrieval, relying solely on the cross-encoder.
Our approach provides test-time recall-vs-computational cost trade-offs superior to the current widely-used methods.
arXiv Detail & Related papers (2022-10-23T00:32:04Z) - Efficient Decoder-free Object Detection with Transformers [75.00499377197475]
Vision transformers (ViTs) are changing the landscape of object detection approaches.
We propose a decoder-free fully transformer-based (DFFT) object detector.
DFFT_SMALL achieves high efficiency in both training and inference stages.
arXiv Detail & Related papers (2022-06-14T13:22:19Z) - Sparse DETR: Efficient End-to-End Object Detection with Learnable
Sparsity [10.098578160958946]
We show that Sparse DETR achieves better performance than Deformable DETR even with only 10% encoder tokens on the COCO dataset.
Although only the encoder tokens are sparsified, the total computation cost decreases by 38% and the frames per second (FPS) increases by 42% compared to Deformable DETR.
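The sparsification idea behind Sparse DETR can be illustrated independently of its learned scoring network: keep only the top-scoring fraction of flattened encoder tokens and run attention over those. A minimal NumPy sketch, where `select_encoder_tokens` and the random scores are assumptions (Sparse DETR learns token saliency from decoder cross-attention statistics):

```python
import numpy as np

def select_encoder_tokens(tokens, scores, keep_ratio=0.1):
    """Keep only the top-scoring fraction of encoder tokens.

    tokens: (N, d) flattened feature tokens; scores: (N,) saliency scores
    (learned in Sparse DETR, random here for illustration).
    Returns the kept tokens and their indices into the original array.
    """
    k = max(1, int(len(tokens) * keep_ratio))
    idx = np.argsort(scores)[-k:]  # indices of the k highest-scoring tokens
    return tokens[idx], idx

rng = np.random.default_rng(1)
tokens = rng.standard_normal((100, 8))
scores = rng.standard_normal(100)
kept, idx = select_encoder_tokens(tokens, scores, keep_ratio=0.1)
```

Since encoder self-attention is quadratic in the token count, keeping 10% of the tokens is where the bulk of the reported compute savings comes from.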
arXiv Detail & Related papers (2021-11-29T05:22:46Z) - Fast-MD: Fast Multi-Decoder End-to-End Speech Translation with
Non-Autoregressive Hidden Intermediates [59.678108707409606]
We propose Fast-MD, a fast MD model that generates HI by non-autoregressive decoding based on connectionist temporal classification (CTC) outputs followed by an ASR decoder.
Fast-MD achieved about 2x and 4x faster decoding than the naïve MD model on GPU and CPU, respectively, with comparable translation quality.
arXiv Detail & Related papers (2021-09-27T05:21:30Z) - Non-Autoregressive Transformer ASR with CTC-Enhanced Decoder Input [54.82369261350497]
We propose a CTC-enhanced NAR transformer, which generates target sequence by refining predictions of the CTC module.
Experimental results show that our method outperforms all previous NAR counterparts and achieves 50x faster decoding speed than a strong AR baseline with only a 0.0-0.3 absolute CER degradation on the Aishell-1 and Aishell-2 datasets.
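The CTC outputs that the two speech papers above build on come from the standard CTC collapse rule: take the per-frame argmax sequence, merge repeated symbols, then drop blanks; a non-autoregressive decoder can then refine this rough hypothesis instead of generating tokens one by one. A minimal sketch (the function name and blank id of 0 are assumptions):

```python
def ctc_greedy_decode(frame_ids, blank=0):
    """Collapse a per-frame label sequence into a CTC output:
    merge consecutive repeats, then remove blank symbols."""
    out, prev = [], None
    for t in frame_ids:
        if t != prev and t != blank:
            out.append(t)
        prev = t
    return out

# Frames 1,1,-,2,-,2,2 (with 0 as blank) collapse to the labels [1, 2, 2]:
# the blank between the last two 2s keeps them as separate labels.
assert ctc_greedy_decode([1, 1, 0, 2, 0, 2, 2]) == [1, 2, 2]
```

Because every frame is decoded independently, this step is trivially parallel, which is what makes CTC a natural starting point for fast non-autoregressive models.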
arXiv Detail & Related papers (2020-10-28T15:00:09Z) - End-to-End Object Detection with Transformers [88.06357745922716]
We present a new method that views object detection as a direct set prediction problem.
Our approach streamlines the detection pipeline, effectively removing the need for many hand-designed components.
The main ingredients of the new framework, called DEtection TRansformer or DETR, are a set-based global loss that forces unique predictions via bipartite matching, and a transformer encoder-decoder architecture.
arXiv Detail & Related papers (2020-05-26T17:06:38Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information it contains and is not responsible for any consequences of its use.