Robust Unsupervised Multi-Object Tracking in Noisy Environments
- URL: http://arxiv.org/abs/2105.10005v1
- Date: Thu, 20 May 2021 19:38:03 GMT
- Title: Robust Unsupervised Multi-Object Tracking in Noisy Environments
- Authors: C.-H. Huck Yang, Mohit Chhabra, Y.-C. Liu, Quan Kong, Tomoaki
Yoshinaga, Tomokazu Murakami
- Abstract summary: We introduce a robust unsupervised multi-object tracking (MOT) model: AttU-Net.
The proposed single-head attention model helps limit the negative impact of noise by learning visual representations at different segment scales.
We evaluate our method on the MNIST and the Atari game video benchmarks.
- Score: 5.409476600348953
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Camera movement and unpredictable environmental conditions like dust and wind
induce noise into video feeds. We observe that popular unsupervised MOT methods
are dependent on noise-free conditions. We show that the addition of a small
amount of artificial random noise causes a sharp degradation in model
performance on benchmark metrics. We resolve this problem by introducing a
robust unsupervised multi-object tracking (MOT) model: AttU-Net. The proposed
single-head attention model helps limit the negative impact of noise by
learning visual representations at different segment scales. AttU-Net shows
better unsupervised MOT tracking performance over variational inference-based
state-of-the-art baselines. We evaluate our method in the MNIST and the Atari
game video benchmark. We also provide two extended video datasets consisting of
complex visual patterns that include Kuzushiji characters and fashion images to
validate the effectiveness of the proposed method.
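The paper describes AttU-Net as a single-head attention model that limits the impact of noise by learning representations at different scales. The exact architecture is not reproduced here; the following is a minimal numpy sketch of a single-head additive attention gate on one U-Net skip connection, in the spirit of attention U-Nets. The projection matrices `W_s`, `W_g` and the scoring vector `psi` are hypothetical names, not taken from the paper.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def attention_gate(skip, gate, W_s, W_g, psi):
    """Single-head additive attention gate on a U-Net skip connection.

    skip: encoder features at one scale, shape (C, H, W)
    gate: decoder (gating) features at the same scale, shape (C, H, W)
    W_s, W_g: 1x1 projections of shape (C_int, C); psi: (C_int,) scoring vector
    Returns the skip features reweighted by a per-pixel attention mask.
    """
    # project both feature maps to an intermediate channel dimension
    s = np.tensordot(W_s, skip, axes=1)      # (C_int, H, W)
    g = np.tensordot(W_g, gate, axes=1)      # (C_int, H, W)
    # additive attention: score each spatial location after a ReLU
    scores = np.tensordot(psi, np.maximum(s + g, 0.0), axes=1)  # (H, W)
    alpha = sigmoid(scores)                  # attention mask in (0, 1)
    return skip * alpha                      # broadcast over channels
```

Because the mask is in (0, 1), noisy spatial locations can be attenuated before the skip features reach the decoder; applying such gates at several scales matches the multi-scale denoising intuition in the abstract.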
Related papers
- MULDE: Multiscale Log-Density Estimation via Denoising Score Matching for Video Anomaly Detection [15.72443573134312]
We treat feature vectors extracted from videos as realizations of a random variable with a fixed distribution.
We train our video anomaly detector using a modification of denoising score matching.
Our experiments on five popular video anomaly detection benchmarks demonstrate state-of-the-art performance.
arXiv Detail & Related papers (2024-03-21T15:46:19Z) - Robust Tiny Object Detection in Aerial Images amidst Label Noise [50.257696872021164]
This study addresses the issue of tiny object detection under noisy label supervision.
We propose a DeNoising Tiny Object Detector (DN-TOD), which incorporates a Class-aware Label Correction scheme.
Our method can be seamlessly integrated into both one-stage and two-stage object detection pipelines.
arXiv Detail & Related papers (2024-01-16T02:14:33Z) - DeNoising-MOT: Towards Multiple Object Tracking with Severe Occlusions [52.63323657077447]
We propose DNMOT, an end-to-end trainable DeNoising Transformer for multiple object tracking.
Specifically, we augment the trajectory with noises during training and make our model learn the denoising process in an encoder-decoder architecture.
We conduct extensive experiments on the MOT17, MOT20, and DanceTrack datasets, and the experimental results show that our method outperforms previous state-of-the-art methods by a clear margin.
arXiv Detail & Related papers (2023-09-09T04:40:01Z) - No-frills Temporal Video Grounding: Multi-Scale Neighboring Attention
and Zoom-in Boundary Detection [52.03562682785128]
Temporal video grounding aims to retrieve the time interval of a language query from an untrimmed video.
A significant challenge in TVG is the low "Semantic Noise Ratio" (SNR) of untrimmed videos: the lower the SNR, the worse the grounding performance.
We propose a no-frills TVG model that consists of two core modules, namely multi-scale neighboring attention and zoom-in boundary detection.
arXiv Detail & Related papers (2023-07-20T04:12:10Z) - DINF: Dynamic Instance Noise Filter for Occluded Pedestrian Detection [0.0]
RCNN-based pedestrian detectors use rectangle regions to extract instance features.
The numbers of severely overlapping and slightly overlapping objects are imbalanced.
An iterable dynamic instance noise filter (DINF) is proposed for the RCNN-based pedestrian detectors to improve the signal-noise ratio of the instance feature.
arXiv Detail & Related papers (2023-01-13T14:12:36Z) - MANet: Improving Video Denoising with a Multi-Alignment Network [72.93429911044903]
We present a multi-alignment network, which generates multiple flow proposals followed by attention-based averaging.
Experiments on a large-scale video dataset demonstrate that our method improves the denoising baseline model by 0.2dB.
arXiv Detail & Related papers (2022-02-20T00:52:07Z) - Speech Prediction in Silent Videos using Variational Autoencoders [29.423462898526605]
We present a model for generating speech in a silent video.
The proposed model combines recurrent neural networks and variational deep generative models to learn the conditional distribution of the auditory features.
We demonstrate the performance of our model on the GRID dataset based on standard benchmarks.
arXiv Detail & Related papers (2020-11-14T17:09:03Z) - Robust Unsupervised Video Anomaly Detection by Multi-Path Frame
Prediction [61.17654438176999]
We propose a novel and robust unsupervised video anomaly detection method by frame prediction with proper design.
Our proposed method obtains the frame-level AUROC score of 88.3% on the CUHK Avenue dataset.
arXiv Detail & Related papers (2020-11-05T11:34:12Z) - A Flow Base Bi-path Network for Cross-scene Video Crowd Understanding in
Aerial View [93.23947591795897]
In this paper, we strive to tackle the challenges and automatically understand the crowd from the visual data collected from drones.
To alleviate the background noise generated in cross-scene testing, a double-stream crowd counting model is proposed.
To tackle the crowd density estimation problem under extreme dark environments, we introduce synthetic data generated by the game Grand Theft Auto V (GTA V).
arXiv Detail & Related papers (2020-09-29T01:48:24Z)
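Several of the related papers above (e.g. MULDE and DNMOT) share a common training recipe: perturb clean data with noise and train the model to undo or score that perturbation. As a hedged illustration only, the following is a minimal sketch of the standard denoising score matching objective for one noise level; MULDE's specific modification of it is not reproduced, and `score_fn` is a hypothetical model.

```python
import numpy as np

def dsm_loss(score_fn, x, sigma, rng):
    """Denoising score matching objective for one noise level.

    Perturb clean data x with Gaussian noise of scale sigma; the model's
    score estimate at the noisy point is regressed onto the score of the
    perturbation kernel, which is (x - x_noisy) / sigma**2.
    """
    noise = rng.normal(size=x.shape)
    x_noisy = x + sigma * noise
    target = (x - x_noisy) / sigma**2   # score of N(x_noisy; x, sigma^2 I)
    pred = score_fn(x_noisy, sigma)
    return float(np.mean((pred - target) ** 2))
```

A trained score model then assigns low loss (and high log-density) to normal data, which is why the objective is usable for anomaly detection as in MULDE.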
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences.