DeNoising-MOT: Towards Multiple Object Tracking with Severe Occlusions
- URL: http://arxiv.org/abs/2309.04682v1
- Date: Sat, 9 Sep 2023 04:40:01 GMT
- Title: DeNoising-MOT: Towards Multiple Object Tracking with Severe Occlusions
- Authors: Teng Fu, Xiaocong Wang, Haiyang Yu, Ke Niu, Bin Li, Xiangyang Xue
- Abstract summary: We propose DNMOT, an end-to-end trainable DeNoising Transformer for multiple object tracking.
Specifically, we augment the trajectories with noise during training and make our model learn the denoising process in an encoder-decoder architecture.
We conduct extensive experiments on the MOT17, MOT20, and DanceTrack datasets, and the experimental results show that our method outperforms previous state-of-the-art methods by a clear margin.
- Score: 52.63323657077447
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Multiple object tracking (MOT) tends to become more challenging when severe
occlusions occur. In this paper, we analyze the limitations of traditional
Convolutional Neural Network-based methods and Transformer-based methods in
handling occlusions and propose DNMOT, an end-to-end trainable DeNoising
Transformer for MOT. To address the challenge of occlusions, we explicitly
simulate the scenarios in which occlusions occur. Specifically, we augment the
trajectories with noise during training and make our model learn the denoising
process in an encoder-decoder architecture, so that our model exhibits
strong robustness and performs well in crowded scenes. Additionally, we
propose a Cascaded Mask strategy to better coordinate the interaction between
different types of queries in the decoder and to prevent mutual suppression
between neighboring trajectories in crowded scenes. Notably, the proposed
method requires no additional modules, such as a matching strategy or motion
state estimation, at inference. We conduct extensive experiments on the MOT17, MOT20,
and DanceTrack datasets, and the experimental results show that our method
outperforms previous state-of-the-art methods by a clear margin.
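The abstract does not spell out DNMOT's noise model or the exact rules of its Cascaded Mask. As a stand-in, the sketch below uses the box-jittering and group-wise attention-mask scheme popularized by DN-DETR to show what "augmenting trajectories with noise" and "masking between query types" can look like in practice. Everything here is an assumption for illustration: the (cx, cy, w, h) box format, the `scale` parameter, and the hypothetical functions `jitter_box`, `build_denoising_queries`, and `build_attention_mask`. It is not the authors' implementation.

```python
import random
from typing import List, Tuple

# Box format (assumption): (cx, cy, w, h), normalized to [0, 1].
Box = Tuple[float, float, float, float]


def _clamp(v: float) -> float:
    return min(max(v, 0.0), 1.0)


def jitter_box(box: Box, scale: float, rng: random.Random) -> Box:
    """DN-DETR-style noise (hypothetical helper): shift the center by up to
    scale * (w/2, h/2) and rescale width/height by a factor in [1 - scale, 1 + scale]."""
    cx, cy, w, h = box
    cx += rng.uniform(-1.0, 1.0) * scale * w / 2
    cy += rng.uniform(-1.0, 1.0) * scale * h / 2
    w *= 1.0 + rng.uniform(-1.0, 1.0) * scale
    h *= 1.0 + rng.uniform(-1.0, 1.0) * scale
    return (_clamp(cx), _clamp(cy), _clamp(w), _clamp(h))


def build_denoising_queries(
    gt_boxes: List[Box], num_groups: int, scale: float, seed: int = 0
) -> Tuple[List[Box], List[Box]]:
    """Return (noisy boxes, regression targets); group g occupies
    indices [g * len(gt_boxes), (g + 1) * len(gt_boxes))."""
    rng = random.Random(seed)
    noisy: List[Box] = []
    targets: List[Box] = []
    for _ in range(num_groups):
        for box in gt_boxes:
            noisy.append(jitter_box(box, scale, rng))
            targets.append(box)  # the decoder is trained to recover the clean box
    return noisy, targets


def build_attention_mask(num_matching: int, num_gt: int, num_groups: int) -> List[List[bool]]:
    """mask[i][j] is True when query i must NOT attend to query j.
    Layout: [denoising group 0 | ... | group G-1 | matching queries]."""
    num_dn = num_gt * num_groups
    total = num_dn + num_matching
    mask = [[False] * total for _ in range(total)]
    for i in range(total):
        for j in range(total):
            i_dn, j_dn = i < num_dn, j < num_dn
            if not i_dn and j_dn:
                mask[i][j] = True  # matching queries must not see noised ground truth
            elif i_dn and j_dn and i // num_gt != j // num_gt:
                mask[i][j] = True  # denoising groups stay independent of each other
    return mask


# Two heavily overlapping pedestrians, as in a crowded scene.
gt = [(0.30, 0.40, 0.10, 0.25), (0.34, 0.42, 0.10, 0.25)]
noisy, targets = build_denoising_queries(gt, num_groups=2, scale=0.4)
mask = build_attention_mask(num_matching=3, num_gt=len(gt), num_groups=2)
print(len(noisy), len(mask))  # 4 7
```

The mask keeps the ordinary (matching) queries from attending to noised ground-truth boxes, which would otherwise leak the answer during training, while each denoising group sees only itself and the matching queries. How DNMOT's Cascaded Mask coordinates its own query types to avoid mutual suppression between neighbors is not reproduced here.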
Related papers
- ConsistencyTrack: A Robust Multi-Object Tracker with a Generation Strategy of Consistency Model [20.259334882471574]
Multi-object tracking (MOT) is a critical technology in computer vision, designed to detect multiple targets in video sequences and assign each target a unique ID that is kept consistent across frames.
Existing MOT methods excel at tracking multiple objects accurately and in real time across various scenarios.
We propose ConsistencyTrack, a novel joint detection and tracking (JDT) framework that formulates detection and association as a denoising diffusion process on bounding boxes.
(2024-08-28T05:53:30Z)
- Layout Sequence Prediction From Noisy Mobile Modality [53.49649231056857]
Trajectory prediction plays a vital role in understanding pedestrian movement for applications such as autonomous driving and robotics.
Current trajectory prediction models depend on long, complete, and accurately observed sequences from visual modalities.
We propose LTrajDiff, a novel approach that treats objects obstructed or out of sight as equally important as those with fully visible trajectories.
(2023-10-09T20:32:49Z)
- DiffusionTrack: Diffusion Model For Multi-Object Tracking [15.025051933538043]
Multi-object tracking (MOT) is a challenging vision task that aims to detect individual objects within a single frame and associate them across multiple frames.
Recent MOT approaches can be categorized into two-stage tracking-by-detection (TBD) methods and one-stage joint detection and tracking (JDT) methods.
We propose a simple but robust framework that formulates object detection and association jointly as a consistent denoising diffusion process.
(2023-08-19T04:48:41Z)
- DiffSED: Sound Event Detection with Denoising Diffusion [70.18051526555512]
We reformulate the sound event detection (SED) problem from a generative learning perspective.
Specifically, we aim to generate sound temporal boundaries from noisy proposals in a denoising diffusion process.
During training, our model learns to reverse the noising process by converting noisy latent queries into their ground-truth versions.
(2023-08-14T17:29:41Z)
- CamoDiffusion: Camouflaged Object Detection via Conditional Diffusion Models [72.93652777646233]
Camouflaged Object Detection (COD) is a challenging task in computer vision due to the high similarity between camouflaged objects and their surroundings.
We propose a new paradigm that treats COD as a conditional mask-generation task leveraging diffusion models.
Our method, dubbed CamoDiffusion, employs the denoising process of diffusion models to iteratively reduce the noise of the mask.
(2023-05-29T07:49:44Z)
- Disentangling Object Motion and Occlusion for Unsupervised Multi-frame Monocular Depth [37.021579239596164]
Existing dynamic-object-focused methods only partially solve the mismatch problem caused by moving objects, and only at the training-loss level.
We propose a novel multi-frame monocular depth prediction method that addresses the mismatch and occlusion problems at both the prediction and supervision-loss levels.
Our method, called DynamicDepth, is a new framework trained via a self-supervised cycle consistent learning scheme.
(2022-03-29T01:36:11Z)
- Robust Unsupervised Multi-Object Tracking in Noisy Environments [5.409476600348953]
We introduce a robust unsupervised multi-object tracking (MOT) model: AttU-Net.
The proposed single-head attention model helps limit the negative impact of noise by learning visual representations at different segment scales.
We evaluate our method on the MNIST and Atari game video benchmarks.
(2021-05-20T19:38:03Z)
- Learning to Generate Noise for Multi-Attack Robustness [126.23656251512762]
Adversarial learning has emerged as one of the successful techniques to counter the susceptibility of existing methods to adversarial perturbations.
However, most existing defenses are tailored to a single category of adversarial perturbation; in safety-critical applications, this makes them inadequate, as the attacker can adopt diverse adversaries to deceive the system.
We propose a novel meta-learning framework that explicitly learns to generate noise to improve the model's robustness against multiple types of attacks.
(2020-06-22T10:44:05Z)
- Perturbing Across the Feature Hierarchy to Improve Standard and Strict Blackbox Attack Transferability [100.91186458516941]
We consider the blackbox transfer-based targeted adversarial attack threat model in the realm of deep neural network (DNN) image classifiers.
We design a flexible attack framework that allows for multi-layer perturbations and demonstrates state-of-the-art targeted transfer performance.
We analyze why the proposed methods outperform existing attack strategies and show an extension of the method in the case when limited queries to the blackbox model are allowed.
(2020-04-29T16:00:13Z)
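Several of the related entries (ConsistencyTrack, DiffusionTrack, DiffSED, CamoDiffusion) frame their task as denoising diffusion, where training corrupts the targets (boxes, temporal boundaries, or masks) with a scheduled amount of Gaussian noise. The sketch below shows only the standard DDPM forward step applied to a single bounding box, assuming a linear beta schedule from 1e-4 to 0.02 over 1000 steps (the original DDPM defaults) and a box rescaled to [-1, 1]. None of the listed papers is claimed to use these exact settings, and `linear_alpha_bar` and `q_sample` are hypothetical names.

```python
import math
import random
from typing import List, Sequence, Tuple


def linear_alpha_bar(num_steps: int, beta_start: float = 1e-4, beta_end: float = 0.02) -> List[float]:
    """alpha_bar[t] = prod_{s<=t} (1 - beta_s) for a linear beta schedule."""
    alpha_bar: List[float] = []
    prod = 1.0
    for t in range(num_steps):
        beta = beta_start + (beta_end - beta_start) * t / max(num_steps - 1, 1)
        prod *= 1.0 - beta
        alpha_bar.append(prod)
    return alpha_bar


def q_sample(x0: Sequence[float], t: int, alpha_bar: List[float], rng: random.Random) -> Tuple[float, ...]:
    """Forward diffusion: x_t = sqrt(alpha_bar_t) * x0 + sqrt(1 - alpha_bar_t) * eps, eps ~ N(0, I)."""
    a = alpha_bar[t]
    return tuple(math.sqrt(a) * x + math.sqrt(1.0 - a) * rng.gauss(0.0, 1.0) for x in x0)


alpha_bar = linear_alpha_bar(1000)
rng = random.Random(0)
box = (0.30, 0.40, 0.10, 0.25)               # (cx, cy, w, h) in [0, 1] (assumed format)
signal = tuple(2.0 * v - 1.0 for v in box)   # rescale to [-1, 1] before adding noise
x_early = q_sample(signal, 10, alpha_bar, rng)   # still close to the clean box
x_late = q_sample(signal, 999, alpha_bar, rng)   # almost pure Gaussian noise
```

A model in this family is trained to map `x_t` back toward the clean box; DNMOT, by contrast, describes injecting trajectory noise only during training and needing no extra modules at inference.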