CorrDiff: Adaptive Delay-aware Detector with Temporal Cue Inputs for Real-time Object Detection
- URL: http://arxiv.org/abs/2501.05132v1
- Date: Thu, 09 Jan 2025 10:34:25 GMT
- Title: CorrDiff: Adaptive Delay-aware Detector with Temporal Cue Inputs for Real-time Object Detection
- Authors: Xiang Zhang, Chenchen Fu, Yufei Cui, Lan Yi, Yuyang Sun, Weiwei Wu, Xue Liu
- Abstract summary: CorrDiff is designed to tackle the challenge of delays in real-time detection systems.
It is able to utilize runtime-estimated temporal cues to predict objects' locations for multiple future frames.
It meets the stringent real-time processing requirements on all kinds of devices.
- Score: 11.714072240331518
- Abstract: Real-time object detection plays an essential part in the decision-making process of numerous real-world applications, including collision avoidance and path planning in autonomous driving systems. This paper presents a novel real-time streaming perception method named CorrDiff, designed to tackle the challenge of delays in real-time detection systems. The main contribution of CorrDiff lies in its adaptive delay-aware detector, which is able to utilize runtime-estimated temporal cues to predict objects' locations for multiple future frames, and selectively produce predictions that match real-world time, effectively compensating for any communication and computational delays. The proposed model outperforms current state-of-the-art methods by leveraging motion estimation and feature enhancement, both for 1) single-frame detection for the current or next frame, in terms of the metric mAP, and 2) the prediction of (multiple) future frame(s), in terms of the metric sAP (sAP evaluates object detection algorithms in streaming scenarios, factoring in both latency and accuracy). It demonstrates robust performance across a range of devices, from the powerful Tesla V100 to the modest RTX 2080Ti, achieving the highest level of perceptual accuracy on all platforms. Unlike most state-of-the-art methods, which struggle to complete computation within a single frame interval on less powerful devices, CorrDiff meets the stringent real-time processing requirements of all kinds of devices. The experimental results emphasize the system's adaptability and its potential to significantly improve the safety and reliability of many real-world systems, such as autonomous driving. Our code is completely open-sourced and is available at https://anonymous.4open.science/r/CorrDiff.
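The selection mechanism described above (forecasting several future frames, then emitting the forecast that lines up with real-world time) can be illustrated with a minimal sketch. This is a hypothetical reconstruction under assumed names (`DelayAwareSelector`, a running-average delay estimate), not CorrDiff's actual code or algorithm:

```python
# Hypothetical sketch of delay-aware output selection; not CorrDiff's API.
from collections import deque

class DelayAwareSelector:
    """Choose which future-frame forecast to emit, given estimated delay."""

    def __init__(self, frame_interval_s: float = 1 / 30, history: int = 20):
        self.frame_interval_s = frame_interval_s
        # Runtime-estimated temporal cue: recent end-to-end delays (seconds).
        self.recent_delays = deque(maxlen=history)

    def record_delay(self, delay_s: float) -> None:
        self.recent_delays.append(delay_s)

    def select(self, future_predictions: list):
        """future_predictions[k] holds detections forecast for frame t+k+1."""
        if not self.recent_delays:
            return future_predictions[0]
        expected = sum(self.recent_delays) / len(self.recent_delays)
        # Emit the forecast whose target frame matches wall-clock time on arrival.
        k = round(expected / self.frame_interval_s)
        k = max(1, min(k, len(future_predictions)))  # clamp to forecast horizon
        return future_predictions[k - 1]
```

After each inference, the caller would record the measured delay with `record_delay`, so that on a slower device the selector automatically drifts toward farther-ahead forecasts.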
Related papers
- Fast-COS: A Fast One-Stage Object Detector Based on Reparameterized Attention Vision Transformer for Autonomous Driving [3.617580194719686]
This paper introduces Fast-COS, a novel single-stage object detection framework crafted specifically for driving scenes.
RAViT achieves 81.4% Top-1 accuracy on the ImageNet-1K dataset.
It surpasses leading models in efficiency, delivering up to 75.9% faster GPU inference and 1.38× higher throughput on edge devices.
arXiv Detail & Related papers (2025-02-11T09:54:09Z)
- Event-Based Tracking Any Point with Motion-Augmented Temporal Consistency [58.719310295870024]
This paper presents an event-based framework for tracking any point.
It tackles the challenges posed by spatial sparsity and motion sensitivity in events.
It achieves 150% faster processing with competitive model parameters.
arXiv Detail & Related papers (2024-12-02T09:13:29Z)
- Transtreaming: Adaptive Delay-aware Transformer for Real-time Streaming Perception [18.403242474776764]
This work presents an innovative real-time streaming perception method, Transtreaming, which addresses the challenge of real-time object detection with dynamic computational delay.
The proposed model outperforms the existing state-of-the-art methods, even in single-frame detection scenarios.
Transtreaming meets the stringent real-time processing requirements on all kinds of devices.
arXiv Detail & Related papers (2024-09-10T15:26:38Z)
- Global Context Aggregation Network for Lightweight Saliency Detection of Surface Defects [70.48554424894728]
We develop a Global Context Aggregation Network (GCANet) for lightweight saliency detection of surface defects on the encoder-decoder structure.
First, we introduce a novel transformer encoder on the top layer of the lightweight backbone, which captures global context information through a novel Depth-wise Self-Attention (DSA) module.
The experimental results on three public defect datasets demonstrate that the proposed network achieves a better trade-off between accuracy and running efficiency than 17 other state-of-the-art methods.
arXiv Detail & Related papers (2023-09-22T06:19:11Z)
- Leveraging the Edge and Cloud for V2X-Based Real-Time Object Detection in Autonomous Driving [0.0]
Environmental perception is a key element of autonomous driving.
In this paper, we investigate the best trade-off between detection quality and latency for real-time perception in autonomous vehicles.
We show that models with adequate compression can be run in real-time on the cloud while outperforming local detection performance.
arXiv Detail & Related papers (2023-08-09T21:39:10Z)
- Rethinking Voxelization and Classification for 3D Object Detection [68.8204255655161]
The main challenge in 3D object detection from LiDAR point clouds is achieving real-time performance without affecting the reliability of the network.
We present a solution that improves network inference speed and precision at the same time by implementing a fast dynamic voxelizer; a sketch of the general idea follows this entry.
In addition, we propose a lightweight detection sub-head model for classifying predicted objects and filtering out falsely detected objects.
arXiv Detail & Related papers (2023-01-10T16:22:04Z)
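The "fast dynamic voxelizer" mentioned above is not spelled out in the summary; the sketch below shows generic dynamic voxelization (every LiDAR point is hashed to its voxel, with no fixed-capacity buffers and no dropped points) as one plausible reading, with hypothetical names:

```python
# Generic dynamic voxelization sketch; illustrative, not the paper's code.
from collections import defaultdict
import numpy as np

def dynamic_voxelize(points: np.ndarray, voxel_size=(0.2, 0.2, 0.2)):
    """points: (N, 3+) array of x, y, z (+ features). Returns voxel -> point indices."""
    coords = np.floor(points[:, :3] / np.asarray(voxel_size)).astype(np.int64)
    voxel_to_points = defaultdict(list)
    for idx, key in enumerate(map(tuple, coords)):
        voxel_to_points[key].append(idx)  # no per-voxel capacity limit
    return voxel_to_points
```

Because no buffer is preallocated per voxel, memory scales with the number of occupied voxels, which is what makes this formulation attractive for real-time LiDAR pipelines.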
- StreamYOLO: Real-time Object Detection for Streaming Perception [84.2559631820007]
We endow the models with the capacity of predicting the future, significantly improving the results for streaming perception.
We consider driving scenes with multiple velocities and propose Velocity-awared streaming AP (VsAP) to jointly evaluate the accuracy across them.
Our simple method achieves state-of-the-art performance on the Argoverse-HD dataset, improving sAP and VsAP by 4.7% and 8.2%, respectively.
arXiv Detail & Related papers (2022-07-21T12:03:02Z)
- Real-time Object Detection for Streaming Perception [84.2559631820007]
Streaming perception is proposed to jointly evaluate latency and accuracy as a single metric for online video perception.
We build a simple and effective framework for streaming perception.
Our method achieves competitive performance on the Argoverse-HD dataset and improves the AP by 4.9% compared to a strong baseline.
arXiv Detail & Related papers (2022-03-23T11:33:27Z)
- ACDnet: An action detection network for real-time edge computing based on flow-guided feature approximation and memory aggregation [8.013823319651395]
ACDnet is a compact action detection network targeting real-time edge computing.
It exploits the temporal coherence between successive video frames to approximate CNN features rather than naively extracting them; a sketch of this idea follows the entry.
It can robustly achieve detection well above real-time speed (75 FPS).
arXiv Detail & Related papers (2021-02-26T14:06:31Z)
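ACDnet's summary above says features are approximated from temporal coherence rather than recomputed per frame. A minimal sketch of that general idea follows: run the heavy backbone only on key frames and warp its features to other frames with estimated optical flow. The function names and the nearest-neighbour warp are assumptions, not ACDnet's implementation:

```python
# Sketch of flow-guided feature approximation; illustrative names and logic.
import numpy as np

def warp_features(feat: np.ndarray, flow: np.ndarray) -> np.ndarray:
    """feat: (H, W, C) key-frame features; flow: (H, W, 2) pixel offsets."""
    h, w = feat.shape[:2]
    ys, xs = np.mgrid[0:h, 0:w]
    src_x = np.clip(np.round(xs - flow[..., 0]).astype(int), 0, w - 1)
    src_y = np.clip(np.round(ys - flow[..., 1]).astype(int), 0, h - 1)
    return feat[src_y, src_x]  # nearest-neighbour warp; real systems interpolate

def features_for_frame(frame_idx, frame, key_feat, flow_to_key, backbone,
                       key_interval=10):
    if frame_idx % key_interval == 0:
        return backbone(frame)  # expensive CNN pass on key frames only
    return warp_features(key_feat, flow_to_key)  # cheap approximation elsewhere
```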
- RMOPP: Robust Multi-Objective Post-Processing for Effective Object Detection [0.0]
RMOPP is a statistically driven post-processing algorithm that allows for simultaneous optimization of precision and recall; a generic illustration of this trade-off follows the entry.
We provide a compelling test case on YOLOv2 using the MS-COCO dataset.
arXiv Detail & Related papers (2021-02-09T00:02:38Z)
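RMOPP's actual algorithm is statistical and is not detailed in the summary above; as a generic stand-in for post-processing that trades off precision against recall, the sketch below picks the confidence threshold that maximizes F1 on a validation set (all names hypothetical):

```python
# Generic stand-in, not RMOPP: choose the score threshold maximizing F1.
def pick_threshold(scores, is_true_positive, total_gt, candidates=None):
    """scores[i]: detection confidence; is_true_positive[i]: bool; total_gt: #GT boxes."""
    candidates = candidates if candidates is not None else sorted(set(scores))
    best_t, best_f1 = 0.0, -1.0
    for t in candidates:
        kept = [tp for s, tp in zip(scores, is_true_positive) if s >= t]
        if not kept:
            continue
        precision = sum(kept) / len(kept)  # True counts as 1
        recall = sum(kept) / total_gt
        denom = precision + recall
        f1 = 2 * precision * recall / denom if denom else 0.0
        if f1 > best_f1:
            best_t, best_f1 = t, f1
    return best_t
```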
- Towards Streaming Perception [70.68520310095155]
We present an approach that coherently integrates latency and accuracy into a single metric for real-time online perception.
The key insight behind this metric is to jointly evaluate the output of the entire perception stack at every time instant; a sketch of this pairing follows the entry.
We focus on the illustrative tasks of object detection and instance segmentation in urban video streams, and contribute a novel dataset with high-quality and temporally-dense annotations.
arXiv Detail & Related papers (2020-05-21T01:51:35Z)
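The streaming metric that several entries above rely on (sAP) scores whatever prediction was actually available at each ground-truth timestamp, so latency directly hurts accuracy. A minimal sketch of that pairing step, with hypothetical names:

```python
# Minimal sketch of the streaming-evaluation pairing behind sAP-style metrics.
def pair_for_streaming_eval(gt_stream, pred_stream):
    """gt_stream: [(t, gt)]; pred_stream: [(t_done, pred)]; both sorted by time.
    Returns (gt, pred) pairs; a slow detector is judged on stale predictions."""
    pairs, j, latest = [], 0, None
    for t, gt in gt_stream:
        while j < len(pred_stream) and pred_stream[j][0] <= t:
            latest = pred_stream[j][1]  # most recent finished prediction
            j += 1
        pairs.append((gt, latest))  # may lag several frames on slow hardware
    return pairs
```

The paired lists are then fed to a standard AP computation, which is why a method like CorrDiff that forecasts ahead can recover accuracy otherwise lost to delay.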
This list is automatically generated from the titles and abstracts of the papers on this site.