CF-DETR: Coarse-to-Fine Transformer for Real-Time Object Detection
- URL: http://arxiv.org/abs/2505.23317v1
- Date: Thu, 29 May 2025 10:23:37 GMT
- Title: CF-DETR: Coarse-to-Fine Transformer for Real-Time Object Detection
- Authors: Woojin Shin, Donghwa Kang, Byeongyun Park, Brent Byunghoon Kang, Jinkyu Lee, Hyeongboo Baek
- Abstract summary: CF-DETR is an integrated system featuring a novel coarse-to-fine Transformer architecture and a dedicated real-time scheduling framework, NPFP**. It partitions each DETR task into a safety-critical coarse subtask for guaranteed critical object detection within its deadline (R1) and an optional fine subtask for enhanced overall accuracy (R2). Under an NPFP** policy, CF-DETR meets strict timing guarantees for critical operations and achieves significantly higher overall and critical object detection accuracy than existing baselines across diverse AV workloads.
- Score: 3.002784388635573
- License: http://creativecommons.org/licenses/by-nc-nd/4.0/
- Abstract: Detection Transformers (DETR) are increasingly adopted in autonomous vehicle (AV) perception systems due to their superior accuracy over convolutional networks. However, concurrently executing multiple DETR tasks presents significant challenges in meeting firm real-time deadlines (R1) and high accuracy requirements (R2), particularly for safety-critical objects, while navigating the inherent latency-accuracy trade-off under resource constraints. Existing real-time DNN scheduling approaches often treat models generically, failing to leverage Transformer-specific properties for efficient resource allocation. To address these challenges, we propose CF-DETR, an integrated system featuring a novel coarse-to-fine Transformer architecture and a dedicated real-time scheduling framework NPFP**. CF-DETR employs three key strategies (A1: coarse-to-fine inference, A2: selective fine inference, A3: multi-level batch inference) that exploit Transformer properties to dynamically adjust patch granularity and attention scope based on object criticality, aiming to satisfy R2. The NPFP** scheduling framework (A4) orchestrates these adaptive mechanisms A1-A3. It partitions each DETR task into a safety-critical coarse subtask for guaranteed critical object detection within its deadline (ensuring R1), and an optional fine subtask for enhanced overall accuracy (R2), while managing individual and batched execution. Our extensive evaluations on server, GPU-enabled embedded platforms, and actual AV platforms demonstrate that CF-DETR, under an NPFP** policy, successfully meets strict timing guarantees for critical operations and achieves significantly higher overall and critical object detection accuracy compared to existing baselines across diverse AV workloads.
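The abstract fixes the coarse/fine subtask split but not its control flow. As a rough illustration of the A1/A4 idea only (not CF-DETR's actual scheduler), a minimal Python sketch might look like the following, where `coarse_pass`, `fine_pass`, and the fine subtask's worst-case cost are hypothetical stand-ins:

```python
import time

def detect(image, deadline_s, coarse_pass, fine_pass, fine_wcet_s):
    """Sketch of deadline-aware coarse-to-fine inference (assumptions,
    not the paper's implementation).

    coarse_pass: cheap DETR pass over coarse patches; always runs, so
                 critical objects are detected within the deadline (R1).
    fine_pass:   optional refinement of criticality-selected regions for
                 higher overall accuracy (R2).
    fine_wcet_s: assumed worst-case execution time of the fine subtask.
    """
    start = time.monotonic()
    detections = coarse_pass(image)                 # safety-critical subtask
    slack = deadline_s - (time.monotonic() - start)
    if slack > fine_wcet_s:                         # run fine subtask only if it fits
        detections = fine_pass(image, detections)
    return detections
```

CF-DETR's NPFP** framework additionally manages batched execution across concurrent DETR tasks (A3), which a single-task slack check like this cannot capture.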
Related papers
- MRC-DETR: An Adaptive Multi-Residual Coupled Transformer for Bare Board PCB Defect Detection [11.16242420187823]
We propose MRC-DETR, a novel and efficient detection framework for bare PCB defect inspection. To enhance feature representation capability, we design a Multi-Residual Directional Coupled Block (MRDCB). To reduce computational redundancy caused by inefficient cross-layer information fusion, we introduce an Adaptive Screening Pyramid Network (ASPN).
arXiv Detail & Related papers (2025-07-04T08:42:38Z) - DriveTransformer: Unified Transformer for Scalable End-to-End Autonomous Driving [62.62464518137153]
DriveTransformer is a simplified E2E-AD framework designed for ease of scaling up. It is composed of three unified operations: task self-attention, sensor cross-attention, and temporal cross-attention. It achieves state-of-the-art performance on both the simulated closed-loop benchmark Bench2Drive and the real-world open-loop benchmark nuScenes at high FPS.
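Only the three operation names are given above; a speculative PyTorch sketch of how such a block could compose them (dimensions, ordering, and residual wiring are all assumptions, not the paper's design) is:

```python
import torch.nn as nn

class UnifiedBlock(nn.Module):
    """Hypothetical block composing the three attentions the summary names."""

    def __init__(self, d_model=256, n_heads=8):
        super().__init__()
        self.task_self = nn.MultiheadAttention(d_model, n_heads, batch_first=True)
        self.sensor_cross = nn.MultiheadAttention(d_model, n_heads, batch_first=True)
        self.temporal_cross = nn.MultiheadAttention(d_model, n_heads, batch_first=True)

    def forward(self, task_q, sensor_tokens, history_tokens):
        # Task queries first exchange information among themselves...
        x = task_q + self.task_self(task_q, task_q, task_q)[0]
        # ...then attend to the current frame's sensor features...
        x = x + self.sensor_cross(x, sensor_tokens, sensor_tokens)[0]
        # ...and finally to tokens cached from past frames.
        return x + self.temporal_cross(x, history_tokens, history_tokens)[0]
```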
arXiv Detail & Related papers (2025-03-07T11:41:18Z) - Fractional Correspondence Framework in Detection Transformer [13.388933240897492]
The Detection Transformer (DETR) has significantly simplified the matching process in object detection tasks. Its matching algorithm facilitates optimal one-to-one matching of predicted bounding boxes to ground-truth annotations during training. We propose a flexible matching strategy that captures the cost of aligning predictions with ground truths to find the most accurate correspondences.
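The one-to-one matching DETR relies on is the classic assignment problem; a small sketch (using SciPy, with an arbitrary toy cost matrix, and omitting this paper's relaxed fractional cost) shows the baseline being generalized:

```python
import numpy as np
from scipy.optimize import linear_sum_assignment

def match(cost):
    """Optimal one-to-one assignment of predictions (rows) to ground
    truths (columns) given a cost matrix, as in DETR training. Methods
    differ in how the cost is built, which is what this paper relaxes."""
    pred_idx, gt_idx = linear_sum_assignment(cost)
    return list(zip(pred_idx.tolist(), gt_idx.tolist()))

# Toy example: 3 predictions, 2 ground-truth boxes.
cost = np.array([[0.9, 0.1],
                 [0.4, 0.6],
                 [0.2, 0.8]])
print(match(cost))  # -> [(0, 1), (2, 0)], the lowest-total-cost pairing
```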
arXiv Detail & Related papers (2025-03-06T05:29:20Z) - Distribution-Aware Continual Test-Time Adaptation for Semantic Segmentation [33.75630514826721]
We propose a distribution-aware tuning (DAT) method to make semantic segmentation CTTA efficient and practical in real-world applications.
DAT adaptively selects and updates two small groups of trainable parameters based on data distribution during the continual adaptation process.
We conduct experiments on two widely-used semantic segmentation CTTA benchmarks, achieving promising performance compared to previous state-of-the-art methods.
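The summary gives the selection criterion only as "based on data distribution"; as a hedged sketch of the general pattern (the scoring input and single-group selection here are invented placeholders, not DAT's two-group scheme), updating only a small parameter subset looks like:

```python
import torch.nn as nn

def tune_small_subset(model: nn.Module, shift_scores: dict, k: int):
    """Freeze the whole model, then unfreeze the k parameter tensors with
    the highest (hypothetical) distribution-shift scores. DAT's actual
    scoring and its two-group split are not specified in the summary."""
    for p in model.parameters():
        p.requires_grad_(False)
    chosen = sorted(shift_scores, key=shift_scores.get, reverse=True)[:k]
    for name, p in model.named_parameters():
        if name in chosen:
            p.requires_grad_(True)   # only these receive CTTA updates
    return chosen
```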
arXiv Detail & Related papers (2023-09-24T10:48:20Z) - RegFormer: An Efficient Projection-Aware Transformer Network for Large-Scale Point Cloud Registration [73.69415797389195]
We propose an end-to-end transformer network (RegFormer) for large-scale point cloud alignment.
Specifically, a projection-aware hierarchical transformer is proposed to capture long-range dependencies and filter outliers.
Our transformer has linear complexity, which guarantees high efficiency even for large-scale scenes.
arXiv Detail & Related papers (2023-03-22T08:47:37Z) - Towards Long-Term Time-Series Forecasting: Feature, Pattern, and Distribution [57.71199089609161]
Long-term time-series forecasting (LTTF) has become a pressing demand in many applications, such as wind power supply planning.
Transformer models have been adopted for their high prediction capacity, though the self-attention mechanism is computationally expensive.
We propose an efficient Transformer-based model, named Conformer, which differentiates itself from existing LTTF methods in three aspects.
arXiv Detail & Related papers (2023-01-05T13:59:29Z) - Miti-DETR: Object Detection based on Transformers with Mitigatory Self-Attention Convergence [17.854940064699985]
We propose a transformer architecture with a mitigatory self-attention mechanism.
Miti-DETR forwards the inputs of each attention layer to that layer's outputs, so that "non-attention" information participates in attention propagation.
Miti-DETR significantly improves average detection precision and convergence speed relative to existing DETR-based models.
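Read literally, carrying each attention layer's input into its output is a residual path that bypasses attention; the nearest standard construct (a guess at the formulation, not Miti-DETR's published equations) is:

```python
import torch.nn as nn

class MitigatoryAttentionLayer(nn.Module):
    """Attention layer whose raw input is carried into its output, so
    'non-attention' information keeps propagating (an approximation of
    the summary's description, not the paper's exact wiring)."""

    def __init__(self, d_model=256, n_heads=8):
        super().__init__()
        self.attn = nn.MultiheadAttention(d_model, n_heads, batch_first=True)
        self.norm = nn.LayerNorm(d_model)

    def forward(self, x):
        attended, _ = self.attn(x, x, x)
        return self.norm(attended + x)   # input reserved alongside attention output
```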
arXiv Detail & Related papers (2021-12-26T03:23:59Z) - Adaptive Anomaly Detection for Internet of Things in Hierarchical Edge Computing: A Contextual-Bandit Approach [81.5261621619557]
We propose an adaptive anomaly detection scheme with hierarchical edge computing (HEC).
We first construct multiple anomaly detection DNN models with increasing complexity, and associate each of them to a corresponding HEC layer.
Then, we design an adaptive model selection scheme that is formulated as a contextual-bandit problem and solved by using a reinforcement learning policy network.
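The summary names the formulation (a contextual bandit) but the paper solves it with an RL policy network; a much simpler epsilon-greedy linear bandit, sketched here with invented names and an assumed reward signal, illustrates the same selection loop:

```python
import numpy as np

class ModelSelector:
    """Epsilon-greedy linear contextual bandit choosing among K detection
    models of increasing complexity (a simple stand-in for the paper's
    policy-network learner)."""

    def __init__(self, n_models, ctx_dim, eps=0.1, lr=0.01):
        self.W = np.zeros((n_models, ctx_dim))   # per-model reward estimator
        self.eps, self.lr = eps, lr

    def choose(self, ctx):
        if np.random.rand() < self.eps:
            return int(np.random.randint(len(self.W)))   # explore
        return int(np.argmax(self.W @ ctx))              # exploit

    def update(self, model_idx, ctx, reward):
        # One SGD step on the chosen model's squared reward-prediction error;
        # the reward design (e.g., accuracy vs. latency) is an assumption here.
        err = reward - self.W[model_idx] @ ctx
        self.W[model_idx] += self.lr * err * ctx
```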
arXiv Detail & Related papers (2021-08-09T08:45:47Z) - Higher Performance Visual Tracking with Dual-Modal Localization [106.91097443275035]
Visual Object Tracking (VOT) has synchronous needs for both robustness and accuracy.
We propose a dual-modal framework for target localization, consisting of robust localization that suppresses distractors via ONR and accurate localization that attends precisely to the target center via OFC.
arXiv Detail & Related papers (2021-03-18T08:47:56Z) - Rethinking Transformer-based Set Prediction for Object Detection [57.7208561353529]
Experimental results show that the proposed methods not only converge much faster than the original DETR, but also significantly outperform DETR and other baselines in terms of detection accuracy.
arXiv Detail & Related papers (2020-11-21T21:59:42Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the information presented and is not responsible for any consequences of its use.