Multi-Object Tracking with Deep Learning Ensemble for Unmanned Aerial
System Applications
- URL: http://arxiv.org/abs/2110.02044v1
- Date: Tue, 5 Oct 2021 13:50:38 GMT
- Title: Multi-Object Tracking with Deep Learning Ensemble for Unmanned Aerial
System Applications
- Authors: Wanlin Xie, Jaime Ide, Daniel Izadi, Sean Banger, Thayne Walker, Ryan
Ceresani, Dylan Spagnuolo, Christopher Guagliano, Henry Diaz, Jason Twedt
- Abstract summary: Multi-object tracking (MOT) is a crucial component of situational awareness in military defense applications.
We present a robust object tracking architecture designed to accommodate the noise in real-time situations.
We propose a kinematic prediction model, called Deep Extended Kalman Filter (DeepEKF), in which a sequence-to-sequence architecture is used to predict entity trajectories in latent space.
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Multi-object tracking (MOT) is a crucial component of situational awareness
in military defense applications. With the growing use of unmanned aerial
systems (UASs), MOT methods for aerial surveillance are in high demand.
Application of MOT in UAS presents specific challenges such as a moving sensor,
changing zoom levels, dynamic background, illumination changes, obscurations
and small objects. In this work, we present a robust object tracking
architecture designed to accommodate the noise in real-time situations. We
propose a kinematic prediction model, called Deep Extended Kalman Filter
(DeepEKF), in which a sequence-to-sequence architecture is used to predict
entity trajectories in latent space. DeepEKF utilizes a learned image embedding
along with an attention mechanism trained to weight the importance of areas in
an image to predict future states. For the visual scoring, we experiment with
different similarity measures to calculate distance based on entity
appearances, including a convolutional neural network (CNN) encoder,
pre-trained using Siamese networks. In initial evaluation experiments, we show
that our method, combining the scoring structures of the kinematic and visual models
within a multiple hypothesis tracking (MHT) framework, improves performance, especially in edge cases where
entity motion is unpredictable, or the data presents frames with significant
gaps.
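The abstract describes scoring candidate associations with both a kinematic model and a visual similarity model. The following is a minimal sketch of that idea only, not the authors' implementation: an isotropic Gaussian log-likelihood stands in for the DeepEKF prediction, cosine similarity stands in for the learned appearance distance, and the names `kinematic_score`, `visual_score`, `combined_score`, the `sigma` value and the weights `w_kin` and `w_vis` are all hypothetical. It scores a single track-to-detection pairing and does not implement MHT.

```python
import math


def kinematic_score(predicted, observed, sigma=5.0):
    """Log-likelihood of an observed (x, y) position under an isotropic
    2D Gaussian centred on the predicted position.
    sigma is a hypothetical noise scale in pixels."""
    dx = observed[0] - predicted[0]
    dy = observed[1] - predicted[1]
    sq_dist = dx * dx + dy * dy
    return -sq_dist / (2.0 * sigma ** 2) - math.log(2.0 * math.pi * sigma ** 2)


def visual_score(emb_a, emb_b):
    """Cosine similarity between two appearance embeddings."""
    dot = sum(a * b for a, b in zip(emb_a, emb_b))
    norm_a = math.sqrt(sum(a * a for a in emb_a))
    norm_b = math.sqrt(sum(b * b for b in emb_b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0  # undefined similarity; treat as "no evidence"
    return dot / (norm_a * norm_b)


def combined_score(predicted, observed, emb_track, emb_det,
                   w_kin=0.5, w_vis=0.5):
    """Weighted sum of the kinematic and visual scores.
    The weights are assumptions, not values from the paper."""
    return (w_kin * kinematic_score(predicted, observed)
            + w_vis * visual_score(emb_track, emb_det))


# Usage: one track, two candidate detections (all values are made up).
track_prediction = (10.0, 10.0)
track_embedding = [1.0, 0.0, 0.5]
detections = [
    {"pos": (11.0, 10.5), "emb": [0.9, 0.1, 0.4]},   # near, similar look
    {"pos": (40.0, 35.0), "emb": [0.0, 1.0, 0.0]},   # far, different look
]
scores = [
    combined_score(track_prediction, d["pos"], track_embedding, d["emb"])
    for d in detections
]
best_index = max(range(len(scores)), key=lambda i: scores[i])
```

In a hypothesis-based tracker, a score like this would be accumulated along each candidate track rather than used for a single greedy choice as in the usage example.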
Related papers
- STCMOT: Spatio-Temporal Cohesion Learning for UAV-Based Multiple Object Tracking [13.269416985959404]
Multiple object tracking (MOT) in Unmanned Aerial Vehicle (UAV) videos is important for diverse applications in computer vision.
We propose a novel Spatio-Temporal Cohesion Multiple Object Tracking framework (STCMOT).
We use historical embedding features to model the representation of ReID and detection features in a sequential order.
Our framework sets a new state-of-the-art performance in MOTA and IDF1 metrics.
arXiv Detail & Related papers (2024-09-17T14:34:18Z)
- TK-Planes: Tiered K-Planes with High Dimensional Feature Vectors for Dynamic UAV-based Scenes [58.180556221044235]
We present a new approach to bridge the domain gap between synthetic and real-world data for unmanned aerial vehicle (UAV)-based perception.
Our formulation is designed for dynamic scenes, consisting of small moving objects or human actions.
We evaluate its performance on challenging datasets, including Okutama Action and UG2.
arXiv Detail & Related papers (2024-05-04T21:55:33Z)
- PhyOT: Physics-informed object tracking in surveillance cameras [0.2633434651741688]
We consider the case of object tracking, and evaluate a hybrid model (PhyOT) that conceptualizes deep neural networks as "sensors".
Our experiments combine three neural networks, performing position, indirect velocity and acceleration estimation, respectively, and evaluate such a formulation on two benchmark datasets.
Results suggest that PhyOT can track objects in extreme conditions where state-of-the-art deep neural networks fail.
arXiv Detail & Related papers (2023-12-14T04:15:55Z)
- Dyna-DepthFormer: Multi-frame Transformer for Self-Supervised Depth Estimation in Dynamic Scenes [19.810725397641406]
We propose a novel Dyna-Depthformer framework, which predicts scene depth and 3D motion field jointly.
Our contributions are two-fold. First, we leverage multi-view correlation through a series of self- and cross-attention layers in order to obtain enhanced depth feature representation.
Second, we propose a warping-based Motion Network to estimate the motion field of dynamic objects without using semantic prior.
arXiv Detail & Related papers (2023-01-14T09:43:23Z)
- Differentiable Frequency-based Disentanglement for Aerial Video Action Recognition [56.91538445510214]
We present a learning algorithm for human activity recognition in videos.
Our approach is designed for UAV videos, which are mainly acquired from obliquely placed dynamic cameras.
We conduct extensive experiments on the UAV Human dataset and the NEC Drone dataset.
arXiv Detail & Related papers (2022-09-15T22:16:52Z)
- Motion-aware Memory Network for Fast Video Salient Object Detection [15.967509480432266]
We design a space-time memory (STM)-based network, which extracts useful temporal information of the current frame from adjacent frames as the temporal branch of VSOD.
In the encoding stage, we generate high-level temporal features by using high-level features from the current frame and its adjacent frames.
In the decoding stage, we propose an effective fusion strategy for spatial and temporal branches.
The proposed model does not require optical flow or other preprocessing, and can reach a speed of nearly 100 FPS during inference.
arXiv Detail & Related papers (2022-08-01T15:56:19Z)
- Space Non-cooperative Object Active Tracking with Deep Reinforcement Learning [1.212848031108815]
We propose an end-to-end active visual tracking method based on the DQN algorithm, named DRLAVT.
It can guide the chasing spacecraft to approach an arbitrary non-cooperative space target relying merely on color or RGBD images.
It significantly outperforms a position-based visual servoing baseline algorithm that adopts the state-of-the-art 2D monocular tracker SiamRPN.
arXiv Detail & Related papers (2021-12-18T06:12:24Z)
- Learning Monocular Depth in Dynamic Scenes via Instance-Aware Projection Consistency [114.02182755620784]
We present an end-to-end joint training framework that explicitly models 6-DoF motion of multiple dynamic objects, ego-motion and depth in a monocular camera setup without supervision.
Our framework is shown to outperform the state-of-the-art depth and motion estimation methods.
arXiv Detail & Related papers (2021-02-04T14:26:42Z)
- DS-Net: Dynamic Spatiotemporal Network for Video Salient Object Detection [78.04869214450963]
We propose a novel dynamic spatiotemporal network (DSNet) for more effective fusion of temporal and spatial information.
We show that the proposed method outperforms state-of-the-art algorithms.
arXiv Detail & Related papers (2020-12-09T06:42:30Z)
- Unsupervised Monocular Depth Learning with Integrated Intrinsics and Spatio-Temporal Constraints [61.46323213702369]
This work presents an unsupervised learning framework that is able to predict at-scale depth maps and egomotion.
Our results demonstrate strong performance when compared to the current state-of-the-art on multiple sequences of the KITTI driving dataset.
arXiv Detail & Related papers (2020-11-02T22:26:58Z)
- Perceiving Traffic from Aerial Images [86.994032967469]
We propose an object detection method called Butterfly Detector that is tailored to detect objects in aerial images.
We evaluate our Butterfly Detector on two publicly available UAV datasets (UAVDT and VisDrone 2019) and show that it outperforms previous state-of-the-art methods while remaining real-time.
arXiv Detail & Related papers (2020-09-16T11:37:43Z)