Stream Query Denoising for Vectorized HD Map Construction
- URL: http://arxiv.org/abs/2401.09112v2
- Date: Thu, 18 Jan 2024 03:19:53 GMT
- Title: Stream Query Denoising for Vectorized HD Map Construction
- Authors: Shuo Wang, Fan Jia, Yingfei Liu, Yucheng Zhao, Zehui Chen, Tiancai Wang, Chi Zhang, Xiangyu Zhang, Feng Zhao
- Abstract summary: This paper introduces the Stream Query Denoising (SQD) strategy as a novel approach for temporal modeling in high-definition map (HD-map) construction.
The methodology involves denoising the queries that have been perturbed by the addition of noise to the ground-truth information from the preceding frame.
This denoising process aims to reconstruct the ground-truth information for the current frame, thereby simulating the prediction process inherent in stream queries.
- Score: 32.91824536697469
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: To enhance perception performance in complex and extensive scenarios within
the realm of autonomous driving, there has been a noteworthy focus on temporal
modeling, with a particular emphasis on streaming methods. The prevailing trend
in streaming models involves the utilization of stream queries for the
propagation of temporal information. Despite the prevalence of this approach,
the direct application of the streaming paradigm to the construction of
vectorized high-definition maps (HD-maps) fails to fully harness the inherent
potential of temporal information. This paper introduces the Stream Query
Denoising (SQD) strategy as a novel approach for temporal modeling in
high-definition map (HD-map) construction. SQD is designed to facilitate the
learning of temporal consistency among map elements within the streaming model.
The methodology involves denoising the queries that have been perturbed by the
addition of noise to the ground-truth information from the preceding frame.
This denoising process aims to reconstruct the ground-truth information for the
current frame, thereby simulating the prediction process inherent in stream
queries. The SQD strategy can be applied to those streaming methods (e.g.,
StreamMapNet) to enhance the temporal modeling. The proposed SQD-MapNet is the
StreamMapNet equipped with SQD. Extensive experiments on nuScenes and
Argoverse2 show that our method is remarkably superior to other existing
methods across all settings of close range and long range. The code will be
available soon.
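The denoising idea described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the function names, the uniform noise model, the normalized [0, 1] coordinates, and the L1 loss are all assumptions chosen for illustration. It only shows the data flow: previous-frame ground truth is perturbed to form noisy queries, and the denoising target is the current-frame ground truth.

```python
import random


def make_noisy_queries(prev_gt, noise_scale=0.1, seed=None):
    """Perturb previous-frame ground-truth polylines with uniform noise.

    prev_gt: list of polylines, each a list of (x, y) points in [0, 1].
    The uniform noise model and clamping are assumptions of this sketch.
    """
    rng = random.Random(seed)
    return [
        [
            tuple(
                min(1.0, max(0.0, c + rng.uniform(-noise_scale, noise_scale)))
                for c in point
            )
            for point in polyline
        ]
        for polyline in prev_gt
    ]


def denoising_loss(pred, target):
    """Mean L1 distance between predicted and target polylines (assumed loss)."""
    diffs = [
        abs(p - t)
        for pred_line, target_line in zip(pred, target)
        for pred_pt, target_pt in zip(pred_line, target_line)
        for p, t in zip(pred_pt, target_pt)
    ]
    return sum(diffs) / len(diffs)


# Toy data: one map element with three points.
prev_gt = [[(0.20, 0.50), (0.40, 0.50), (0.60, 0.50)]]
# Hypothetical shift of the same element in the current frame.
curr_gt = [[(x + 0.05, y) for (x, y) in line] for line in prev_gt]

noisy = make_noisy_queries(prev_gt, noise_scale=0.1, seed=0)
# A trained decoder would map `noisy` toward `curr_gt`. Here the noisy
# queries are scored directly, only to show what the loss compares.
loss = denoising_loss(noisy, curr_gt)
```

In a real streaming model the noisy queries would pass through the decoder alongside the stream queries, and the loss would be computed on the decoder output rather than on the noisy queries themselves.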
Related papers
- Temporal Residual Guided Diffusion Framework for Event-Driven Video Reconstruction [23.493813870675197]
Event-based video reconstruction has garnered increasing attention due to its advantages, such as high dynamic range and rapid motion capture capabilities.
Current methods often prioritize the extraction of temporal information from continuous event flow, leading to an overemphasis on low-frequency texture features in the scene.
We introduce a novel approach, the Temporal Residual Guided Diffusion Framework, which effectively leverages both temporal and frequency-based event priors.
arXiv Detail & Related papers (2024-07-15T11:48:57Z)
- StreamMapNet: Streaming Mapping Network for Vectorized Online HD Map Construction [36.1596833523566]
We present StreamMapNet, a novel online mapping pipeline adept at long-sequence temporal modeling of videos.
StreamMapNet employs multi-point attention and temporal information which empowers the construction of large-range local HD maps with high stability.
arXiv Detail & Related papers (2023-08-24T05:22:43Z)
- DiffSED: Sound Event Detection with Denoising Diffusion [70.18051526555512]
We reformulate the SED problem by taking a generative learning perspective.
Specifically, we aim to generate sound temporal boundaries from noisy proposals in a denoising diffusion process.
During training, our model learns to reverse the noising process by converting noisy latent queries to the groundtruth versions.
arXiv Detail & Related papers (2023-08-14T17:29:41Z)
- A Fast and Map-Free Model for Trajectory Prediction in Traffics [2.435517936694533]
This paper proposes an efficient trajectory prediction model that is not dependent on traffic maps.
By comprehensively utilizing attention mechanism, LSTM, graph convolution network and temporal transformer, our model is able to learn rich dynamic and interaction information of all agents.
Our model achieves the highest performance among existing map-free methods and also exceeds most map-based state-of-the-art methods on the Argoverse dataset.
arXiv Detail & Related papers (2023-07-19T08:36:31Z)
- Boundary-Denoising for Video Activity Localization [57.9973253014712]
We study the video activity localization problem from a denoising perspective.
Specifically, we propose an encoder-decoder model named DenoiseLoc.
Experiments show that DenoiseLoc advances several video activity understanding tasks.
arXiv Detail & Related papers (2023-04-06T08:48:01Z)
- DiffTAD: Temporal Action Detection with Proposal Denoising Diffusion [137.8749239614528]
We propose a new formulation of temporal action detection (TAD) with denoising diffusion, DiffTAD.
Taking as input random temporal proposals, it can yield action proposals accurately given an untrimmed long video.
arXiv Detail & Related papers (2023-03-27T00:40:52Z)
- Motion-aware Memory Network for Fast Video Salient Object Detection [15.967509480432266]
We design a space-time memory (STM)-based network, which extracts useful temporal information of the current frame from adjacent frames as the temporal branch of VSOD.
In the encoding stage, we generate high-level temporal features by using high-level features from the current and its adjacent frames.
In the decoding stage, we propose an effective fusion strategy for spatial and temporal branches.
The proposed model does not require optical flow or other preprocessing, and can reach a speed of nearly 100 FPS during inference.
arXiv Detail & Related papers (2022-08-01T15:56:19Z)
- Streaming End-to-End ASR based on Blockwise Non-Autoregressive Models [57.20432226304683]
Non-autoregressive (NAR) modeling has gained more and more attention in speech processing.
We propose a novel end-to-end streaming NAR speech recognition system.
We show that the proposed method improves online ASR recognition in low latency conditions.
arXiv Detail & Related papers (2021-07-20T11:42:26Z)
- Progressive Temporal Feature Alignment Network for Video Inpainting [51.26380898255555]
Video inpainting aims to fill spatio-temporal "corrupted regions" with plausible content.
Current methods achieve this goal through attention, flow-based warping, or 3D temporal convolution.
We propose 'Progressive Temporal Feature Alignment Network', which progressively enriches features extracted from the current frame with the warped feature from neighbouring frames.
arXiv Detail & Related papers (2021-04-08T04:50:33Z)
- A Plug-and-play Scheme to Adapt Image Saliency Deep Model for Video Data [54.198279280967185]
This paper proposes a novel plug-and-play scheme to weakly retrain a pretrained image saliency deep model for video data.
Our method is simple yet effective for adapting any off-the-shelf pre-trained image saliency deep model to obtain high-quality video saliency detection.
arXiv Detail & Related papers (2020-08-02T13:23:14Z)
This list is automatically generated from the titles and abstracts of the papers in this site.