TQD-Track: Temporal Query Denoising for 3D Multi-Object Tracking
- URL: http://arxiv.org/abs/2504.03258v1
- Date: Fri, 04 Apr 2025 08:18:48 GMT
- Title: TQD-Track: Temporal Query Denoising for 3D Multi-Object Tracking
- Authors: Shuxiao Ding, Yutong Yang, Julian Wiederer, Markus Braun, Peizheng Li, Juergen Gall, Bin Yang
- Abstract summary: Existing approaches integrate query denoising within the tracking-by-attention paradigm. We propose TQD-Track, which introduces Temporal Query Denoising (TQD) tailored for MOT. We analyze TQD across different tracking paradigms and find that paradigms with an explicit learned data association module benefit from TQD by a larger margin.
- Score: 13.004539088540188
- License: http://creativecommons.org/licenses/by-nc-nd/4.0/
- Abstract: Query denoising has become a standard training strategy for DETR-based detectors because it addresses their slow convergence. Query denoising can also increase the diversity of training samples for modeling complex scenarios, which is critical for Multi-Object Tracking (MOT) and shows its potential for MOT applications. Existing approaches integrate query denoising within the tracking-by-attention paradigm. However, because the denoising process happens only within a single frame, it does not help the tracker learn temporal information. In addition, the attention mask in query denoising prevents information exchange between denoising and object queries, limiting its potential to improve association through self-attention. To address these issues, we propose TQD-Track, which introduces Temporal Query Denoising (TQD) tailored for MOT, enabling denoising queries to carry temporal information and instance-specific feature representations. We apply diverse noise types to the denoising queries to simulate real-world challenges in MOT. We analyze our proposed TQD for different tracking paradigms and find that paradigms with an explicit learned data association module, e.g. tracking-by-detection or alternating detection and association, benefit from TQD by a larger margin. For these paradigms, we further design an association mask in the association module to ensure that track and detection queries interact in the same way as during inference. Extensive experiments on the nuScenes dataset demonstrate that our approach consistently enhances different tracking methods by changing only the training process, especially the paradigms with an explicit association module.
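The two ingredients the abstract refers to, noised ground-truth boxes used as denoising queries and an attention mask that keeps object queries from seeing them, can be sketched as follows. This is a minimal illustration of conventional single-frame query denoising in the DN-DETR style, not the TQD-Track implementation. The function names, the (x, y, z, w, l, h) box layout, the noise scales, and the query ordering (denoising groups first, then object queries) are all assumptions made for the example.

```python
import random


def add_box_noise(box, center_scale=0.4, size_scale=0.4, rng=random):
    """Jitter a hypothetical (x, y, z, w, l, h) box to form a denoising query."""
    x, y, z, w, l, h = box
    # Shift the center by at most center_scale * half the box extent per axis.
    x += rng.uniform(-1, 1) * center_scale * w / 2
    y += rng.uniform(-1, 1) * center_scale * l / 2
    z += rng.uniform(-1, 1) * center_scale * h / 2
    # Rescale each dimension by a factor in [1 - size_scale, 1 + size_scale].
    w *= 1 + rng.uniform(-1, 1) * size_scale
    l *= 1 + rng.uniform(-1, 1) * size_scale
    h *= 1 + rng.uniform(-1, 1) * size_scale
    return (x, y, z, w, l, h)


def build_dn_attention_mask(num_groups, group_size, num_object_queries):
    """Build a self-attention mask where True means "attention blocked".

    Assumed query layout: [dn group 0, dn group 1, ..., object queries].
    """
    num_dn = num_groups * group_size
    total = num_dn + num_object_queries
    mask = [[False] * total for _ in range(total)]
    for q in range(total):
        for k in range(num_dn):
            if q >= num_dn:
                # Object queries must not see denoising queries, which
                # carry ground-truth information.
                mask[q][k] = True
            elif q // group_size != k // group_size:
                # Denoising groups are isolated from one another.
                mask[q][k] = True
    return mask


if __name__ == "__main__":
    rng = random.Random(0)
    gt_box = (10.0, 5.0, 0.5, 2.0, 4.0, 1.5)
    print(add_box_noise(gt_box, rng=rng))
    for row in build_dn_attention_mask(2, 2, 3):
        print("".join("x" if blocked else "." for blocked in row))
```

In this layout every object-query row is blocked on all denoising columns, which is the isolation the abstract identifies as limiting association learning through self-attention.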
Related papers
- DeTrack: In-model Latent Denoising Learning for Visual Object Tracking [24.993508502786998]
We propose a new paradigm to formulate the visual object tracking problem as a denoising learning process. Inspired by the diffusion model, denoising learning enhances the model's robustness to unseen data. We introduce noise to bounding boxes, generating noisy boxes for training, thus enhancing model robustness on testing data.
arXiv Detail & Related papers (2025-01-05T07:28:50Z)
- Coarse-to-Fine Proposal Refinement Framework for Audio Temporal Forgery Detection and Localization [60.899082019130766]
We introduce a frame-level detection network (FDN) and a proposal refinement network (PRN) for audio temporal forgery detection and localization.
FDN aims to mine informative inconsistency cues between real and fake frames to obtain discriminative features that are beneficial for roughly indicating forgery regions.
PRN is responsible for predicting confidence scores and regression offsets to refine the coarse-grained proposals derived from the FDN.
arXiv Detail & Related papers (2024-07-23T15:07:52Z)
- Fast Window-Based Event Denoising with Spatiotemporal Correlation Enhancement [85.66867277156089]
We propose window-based event denoising, which simultaneously deals with a stack of events.
In spatial domain, we choose maximum a posteriori (MAP) to discriminate real-world event and noise.
Our algorithm can remove event noise effectively and efficiently and improve the performance of downstream tasks.
arXiv Detail & Related papers (2024-02-14T15:56:42Z)
- iKUN: Speak to Trackers without Retraining [21.555469501789577]
We propose an insertable Knowledge Unification Network, termed iKUN, to enable communication with off-the-shelf trackers.
To improve the localization accuracy, we present a neural version of Kalman filter (NKF) to dynamically adjust process noise.
We also contribute a more challenging dataset, Refer-Dance, by extending public DanceTrack dataset with motion and dressing descriptions.
arXiv Detail & Related papers (2023-12-25T11:48:55Z)
- DeNoising-MOT: Towards Multiple Object Tracking with Severe Occlusions [52.63323657077447]
We propose DNMOT, an end-to-end trainable DeNoising Transformer for multiple object tracking.
Specifically, we augment the trajectory with noises during training and make our model learn the denoising process in an encoder-decoder architecture.
We conduct extensive experiments on the MOT17, MOT20, and DanceTrack datasets, and the experimental results show that our method outperforms previous state-of-the-art methods by a clear margin.
arXiv Detail & Related papers (2023-09-09T04:40:01Z)
- DiffSED: Sound Event Detection with Denoising Diffusion [70.18051526555512]
We reformulate the SED problem by taking a generative learning perspective.
Specifically, we aim to generate sound temporal boundaries from noisy proposals in a denoising diffusion process.
During training, our model learns to reverse the noising process by converting noisy latent queries to the groundtruth versions.
arXiv Detail & Related papers (2023-08-14T17:29:41Z)
- DINF: Dynamic Instance Noise Filter for Occluded Pedestrian Detection [0.0]
RCNN-based pedestrian detectors use rectangle regions to extract instance features.
The number of severely overlapping objects and the number of slightly overlapping objects are unbalanced.
An iterable dynamic instance noise filter (DINF) is proposed for the RCNN-based pedestrian detectors to improve the signal-noise ratio of the instance feature.
arXiv Detail & Related papers (2023-01-13T14:12:36Z)
- Robust Semantic Communications with Masked VQ-VAE Enabled Codebook [56.63571713657059]
We propose a framework for the robust end-to-end semantic communication systems to combat the semantic noise.
To combat the semantic noise, the adversarial training with weight is developed to incorporate the samples with semantic noise in the training dataset.
We develop a feature importance module (FIM) to suppress the noise-related and task-unrelated features.
arXiv Detail & Related papers (2022-06-08T16:58:47Z)
- A Free Lunch to Person Re-identification: Learning from Automatically Generated Noisy Tracklets [52.30547023041587]
Unsupervised video-based re-identification (re-ID) methods have been proposed to solve the problem of the high labor cost required to annotate re-ID datasets.
But their performance is still far lower than that of their supervised counterparts.
In this paper, we propose to tackle this problem by learning re-ID models from automatically generated person tracklets.
arXiv Detail & Related papers (2022-04-02T16:18:13Z)
This list is automatically generated from the titles and abstracts of the papers on this site.