OccLinker: Deflickering Occupancy Networks through Lightweight Spatio-Temporal Correlation
- URL: http://arxiv.org/abs/2502.15438v2
- Date: Mon, 10 Mar 2025 13:57:27 GMT
- Title: OccLinker: Deflickering Occupancy Networks through Lightweight Spatio-Temporal Correlation
- Authors: Fengcheng Yu, Haoran Xu, Canming Xia, Ziyang Zong, Guang Tan,
- Abstract summary: Vision-based occupancy networks (VONs) provide an end-to-end solution for reconstructing 3D environments in autonomous driving.<n>Recent approaches have incorporated historical data to mitigate the issue, but they often incur high computational costs and introduce noisy information that interferes with object detection.<n>We propose OccLinker, a novel plugin framework designed to seamlessly integrate with existing VONs for boosting performance.
- Score: 15.726401007342087
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Vision-based occupancy networks (VONs) provide an end-to-end solution for reconstructing 3D environments in autonomous driving. However, existing methods often suffer from temporal inconsistencies, manifesting as flickering effects that compromise visual experience and adversely affect decision-making. While recent approaches have incorporated historical data to mitigate the issue, they often incur high computational costs and introduce noisy information that interferes with object detection. We propose OccLinker, a novel plugin framework designed to seamlessly integrate with existing VONs for boosting performance. Our method employs a three-stage architecture that consolidates historical static and motion cues, correlates them with current features through a Motion-Static Integration (MSI) mechanism, and generates correction occupancy to refine base network predictions. Extensive experiments on two benchmarks demonstrate the efficiency and effectiveness of our method, outperforming the latest baseline models. The source code are available in the supplementary material.
Related papers
- Mitigating Trade-off: Stream and Query-guided Aggregation for Efficient and Effective 3D Occupancy Prediction [12.064509280163502]
3D occupancy prediction has emerged as a key perception task for autonomous driving.
Recent studies focus on integrating information obtained from past observations to improve prediction accuracy.
We propose StreamOcc, a framework that aggregates past-temporal information in a stream-based manner.
Experiments on the Occ3D-nus dataset show that StreamOcc achieves state-of-the-art performance in real-time settings, while reducing memory usage by more than 50% compared to previous methods.
arXiv Detail & Related papers (2025-03-28T02:05:53Z) - FLARES: Fast and Accurate LiDAR Multi-Range Semantic Segmentation [52.89847760590189]
3D scene understanding is a critical yet challenging task in autonomous driving.<n>Recent methods leverage the range-view representation to improve processing efficiency.<n>We re-design the workflow for range-view-based LiDAR semantic segmentation.
arXiv Detail & Related papers (2025-02-13T12:39:26Z) - Can foundation models actively gather information in interactive environments to test hypotheses? [56.651636971591536]
We introduce a framework in which a model must determine the factors influencing a hidden reward function.
We investigate whether approaches such as self- throughput and increased inference time improve information gathering efficiency.
arXiv Detail & Related papers (2024-12-09T12:27:21Z) - Spatiotemporal Decoupling for Efficient Vision-Based Occupancy Forecasting [18.933451243989452]
Existing 3D occupancy forecasting approaches struggle to predict plausible spatial details for movable objects.
We propose a novel vision-based paradigm to explicitly tackle the bias and achieve both effective 3D OCF.
We develop an efficient multi-head network network EfficientOCF to achieve 3D OCF with our devisedtemporally decoupled representation.
arXiv Detail & Related papers (2024-11-21T14:27:15Z) - Task-Oriented Real-time Visual Inference for IoVT Systems: A Co-design Framework of Neural Networks and Edge Deployment [61.20689382879937]
Task-oriented edge computing addresses this by shifting data analysis to the edge.
Existing methods struggle to balance high model performance with low resource consumption.
We propose a novel co-design framework to optimize neural network architecture.
arXiv Detail & Related papers (2024-10-29T19:02:54Z) - Distribution Discrepancy and Feature Heterogeneity for Active 3D Object Detection [18.285299184361598]
LiDAR-based 3D object detection is a critical technology for the development of autonomous driving and robotics.
We propose a novel and effective active learning (AL) method called Distribution Discrepancy and Feature Heterogeneity (DDFH)
It simultaneously considers geometric features and model embeddings, assessing information from both the instance-level and frame-level perspectives.
arXiv Detail & Related papers (2024-09-09T08:26:11Z) - Efficiently Expanding Receptive Fields: Local Split Attention and Parallel Aggregation for Enhanced Large-scale Point Cloud Semantic Segmentation [7.199090922071512]
We propose the Local Split Attention Pooling (LSAP) mechanism to effectively expand the receptive field through a series of local split operations.
We put forth a novel framework, designated as LSNet, for large-scale point cloud semantic segmentation.
LSNet demonstrated superior performance compared to state-of-the-art semantic segmentation networks on three benchmark datasets.
arXiv Detail & Related papers (2024-09-03T07:10:20Z) - PVAFN: Point-Voxel Attention Fusion Network with Multi-Pooling Enhancing for 3D Object Detection [59.355022416218624]
integration of point and voxel representations is becoming more common in LiDAR-based 3D object detection.
We propose a novel two-stage 3D object detector, called Point-Voxel Attention Fusion Network (PVAFN)
PVAFN uses a multi-pooling strategy to integrate both multi-scale and region-specific information effectively.
arXiv Detail & Related papers (2024-08-26T19:43:01Z) - SIGMA:Sinkhorn-Guided Masked Video Modeling [69.31715194419091]
Sinkhorn-guided Masked Video Modelling ( SIGMA) is a novel video pretraining method.
We distribute features of space-time tubes evenly across a limited number of learnable clusters.
Experimental results on ten datasets validate the effectiveness of SIGMA in learning more performant, temporally-aware, and robust video representations.
arXiv Detail & Related papers (2024-07-22T08:04:09Z) - ELA: Efficient Local Attention for Deep Convolutional Neural Networks [15.976475674061287]
This paper introduces an Efficient Local Attention (ELA) method that achieves substantial performance improvements with a simple structure.
To overcome these challenges, we propose the incorporation of 1D convolution and Group Normalization feature enhancement techniques.
ELA can be seamlessly integrated into deep CNN networks such as ResNet, MobileNet, and DeepLab.
arXiv Detail & Related papers (2024-03-02T08:06:18Z) - Mutual Information-driven Triple Interaction Network for Efficient Image
Dehazing [54.168567276280505]
We propose a novel Mutual Information-driven Triple interaction Network (MITNet) for image dehazing.
The first stage, named amplitude-guided haze removal, aims to recover the amplitude spectrum of the hazy images for haze removal.
The second stage, named phase-guided structure refined, devotes to learning the transformation and refinement of the phase spectrum.
arXiv Detail & Related papers (2023-08-14T08:23:58Z) - Uncertainty Guided Adaptive Warping for Robust and Efficient Stereo
Matching [77.133400999703]
Correlation based stereo matching has achieved outstanding performance.
Current methods with a fixed model do not work uniformly well across various datasets.
This paper proposes a new perspective to dynamically calculate correlation for robust stereo matching.
arXiv Detail & Related papers (2023-07-26T09:47:37Z) - Practical Collaborative Perception: A Framework for Asynchronous and
Multi-Agent 3D Object Detection [9.967263440745432]
Occlusion is a major challenge for LiDAR-based object detection methods.
State-of-the-art V2X methods resolve the performance-bandwidth tradeoff using a mid-collaboration approach.
We devise a simple yet effective collaboration method that achieves a better bandwidth-performance tradeoff than prior methods.
arXiv Detail & Related papers (2023-07-04T03:49:42Z) - Modeling Continuous Motion for 3D Point Cloud Object Tracking [54.48716096286417]
This paper presents a novel approach that views each tracklet as a continuous stream.
At each timestamp, only the current frame is fed into the network to interact with multi-frame historical features stored in a memory bank.
To enhance the utilization of multi-frame features for robust tracking, a contrastive sequence enhancement strategy is proposed.
arXiv Detail & Related papers (2023-03-14T02:58:27Z) - Ret3D: Rethinking Object Relations for Efficient 3D Object Detection in
Driving Scenes [82.4186966781934]
We introduce a simple, efficient, and effective two-stage detector, termed as Ret3D.
At the core of Ret3D is the utilization of novel intra-frame and inter-frame relation modules.
With negligible extra overhead, Ret3D achieves the state-of-the-art performance.
arXiv Detail & Related papers (2022-08-18T03:48:58Z) - Real-time Object Detection for Streaming Perception [84.2559631820007]
Streaming perception is proposed to jointly evaluate the latency and accuracy into a single metric for video online perception.
We build a simple and effective framework for streaming perception.
Our method achieves competitive performance on Argoverse-HD dataset and improves the AP by 4.9% compared to the strong baseline.
arXiv Detail & Related papers (2022-03-23T11:33:27Z) - Correlation-Aware Deep Tracking [83.51092789908677]
We propose a novel target-dependent feature network inspired by the self-/cross-attention scheme.
Our network deeply embeds cross-image feature correlation in multiple layers of the feature network.
Our model can be flexibly pre-trained on abundant unpaired images, leading to notably faster convergence than the existing methods.
arXiv Detail & Related papers (2022-03-03T11:53:54Z) - Scan-based Semantic Segmentation of LiDAR Point Clouds: An Experimental
Study [2.6205925938720833]
State of the art methods use deep neural networks to predict semantic classes for each point in a LiDAR scan.
A powerful and efficient way to process LiDAR measurements is to use two-dimensional, image-like projections.
We demonstrate various techniques to boost the performance and to improve runtime as well as memory constraints.
arXiv Detail & Related papers (2020-04-06T11:08:12Z) - Spatial-Spectral Residual Network for Hyperspectral Image
Super-Resolution [82.1739023587565]
We propose a novel spectral-spatial residual network for hyperspectral image super-resolution (SSRNet)
Our method can effectively explore spatial-spectral information by using 3D convolution instead of 2D convolution, which enables the network to better extract potential information.
In each unit, we employ spatial and temporal separable 3D convolution to extract spatial and spectral information, which not only reduces unaffordable memory usage and high computational cost, but also makes the network easier to train.
arXiv Detail & Related papers (2020-01-14T03:34:55Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.