Object Tracking through Residual and Dense LSTMs
- URL: http://arxiv.org/abs/2006.12061v1
- Date: Mon, 22 Jun 2020 08:20:17 GMT
- Title: Object Tracking through Residual and Dense LSTMs
- Authors: Fabio Garcea and Alessandro Cucco and Lia Morra and Fabrizio Lamberti
- Abstract summary: Deep learning-based trackers based on LSTMs (Long Short-Term Memory) recurrent neural networks have emerged as a powerful alternative.
DenseLSTMs outperform Residual and regular LSTMs, and offer higher resilience to nuisances.
Our case study supports the adoption of residual-based RNNs for enhancing the robustness of other trackers.
- Score: 67.98948222599849
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: The visual object tracking task is constantly gaining importance in
several fields of application, such as traffic monitoring, robotics, and
surveillance, to name a few. Dealing with changes in the appearance of the
tracked object is paramount to achieving high tracking accuracy, and is
usually handled by continually learning features. Recently, deep learning-based trackers based on
LSTMs (Long Short-Term Memory) recurrent neural networks have emerged as a
powerful alternative, bypassing the need to retrain the feature extraction in
an online fashion. Inspired by the success of residual and dense networks in
image recognition, we propose here to enhance the capabilities of hybrid
trackers using residual and/or dense LSTMs. By introducing skip connections, it
is possible to increase the depth of the architecture while ensuring a fast
convergence. Experimental results on the Re3 tracker show that DenseLSTMs
outperform Residual and regular LSTMs, and offer higher resilience to
nuisances such as occlusions and out-of-view objects. Our case study supports
the adoption of residual-based RNNs for enhancing the robustness of other
trackers.
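To make the skip-connection idea concrete, here is a minimal sketch in PyTorch of residual and dense LSTM stacks; layer sizes and module names are illustrative assumptions, not the Re3 implementation.

```python
import torch
import torch.nn as nn

class ResidualLSTMStack(nn.Module):
    """Stack of LSTM layers where each layer's input is added to its output
    (residual skip connection), easing gradient flow through depth."""
    def __init__(self, input_size, hidden_size, num_layers):
        super().__init__()
        self.proj = nn.Linear(input_size, hidden_size)  # match dims for the first skip
        self.layers = nn.ModuleList(
            nn.LSTM(hidden_size, hidden_size, batch_first=True) for _ in range(num_layers)
        )

    def forward(self, x):            # x: (batch, time, input_size)
        h = self.proj(x)
        for lstm in self.layers:
            out, _ = lstm(h)
            h = h + out              # residual connection
        return h

class DenseLSTMStack(nn.Module):
    """Stack of LSTM layers where each layer receives the concatenation of the
    input and all previous layers' outputs (dense connectivity, as in DenseNet)."""
    def __init__(self, input_size, hidden_size, num_layers):
        super().__init__()
        self.layers = nn.ModuleList(
            nn.LSTM(input_size + i * hidden_size, hidden_size, batch_first=True)
            for i in range(num_layers)
        )

    def forward(self, x):
        features = [x]
        for lstm in self.layers:
            out, _ = lstm(torch.cat(features, dim=-1))
            features.append(out)
        return features[-1]

stack = DenseLSTMStack(input_size=64, hidden_size=128, num_layers=4)
y = stack(torch.randn(2, 10, 64))   # (2, 10, 128)
```

In both variants the skip connections shorten gradient paths, which is what allows the stack to be made deeper while still converging quickly.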
Related papers
- Temporal Correlation Meets Embedding: Towards a 2nd Generation of JDE-based Real-Time Multi-Object Tracking [52.04679257903805]
Joint Detection and Embedding (JDE) trackers have demonstrated excellent performance in Multi-Object Tracking (MOT) tasks.
Our tracker, named TCBTrack, achieves state-of-the-art performance on multiple public benchmarks.
arXiv Detail & Related papers (2024-07-19T07:48:45Z)
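As a rough illustration of the JDE idea (a PyTorch sketch; the head shapes and channel counts are hypothetical, not the TCBTrack architecture), one shared backbone feature map feeds both a detection head and an appearance-embedding head used for association:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class JDEHead(nn.Module):
    """Joint Detection and Embedding: a shared feature map feeds both a
    detection head (boxes + scores) and an appearance-embedding head."""
    def __init__(self, in_channels=256, num_anchors=4, embed_dim=128):
        super().__init__()
        self.det = nn.Conv2d(in_channels, num_anchors * 5, 1)  # 4 box coords + 1 score per anchor
        self.emb = nn.Conv2d(in_channels, embed_dim, 1)        # one embedding per location

    def forward(self, feats):                    # feats: (B, C, H, W)
        boxes = self.det(feats)                  # used for detection
        embeds = F.normalize(self.emb(feats), dim=1)  # used for re-ID association
        return boxes, embeds

head = JDEHead()
boxes, embeds = head(torch.randn(1, 256, 32, 32))
```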
- PNAS-MOT: Multi-Modal Object Tracking with Pareto Neural Architecture Search [64.28335667655129]
Multiple object tracking is a critical task in autonomous driving.
As tracking accuracy improves, neural networks become increasingly complex, which poses challenges for their practical application in real driving scenarios due to high latency.
In this paper, we explore the use of the neural architecture search (NAS) methods to search for efficient architectures for tracking, aiming for low real-time latency while maintaining relatively high accuracy.
arXiv Detail & Related papers (2024-03-23T04:18:49Z)
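The Pareto criterion driving such a search can be illustrated in a few lines (plain Python; candidate names and numbers are made up): keep only architectures that no other candidate beats on both latency and accuracy.

```python
def pareto_front(candidates):
    """Return candidates not dominated by any other: no other candidate is
    both faster (lower latency) and more accurate."""
    front = []
    for name, lat, acc in candidates:
        dominated = any(l <= lat and a >= acc and (l, a) != (lat, acc)
                        for _, l, a in candidates)
        if not dominated:
            front.append((name, lat, acc))
    return front

# Hypothetical (latency in ms, accuracy score) pairs for searched architectures.
archs = [("A", 12.0, 61.5), ("B", 20.0, 63.0), ("C", 25.0, 62.0), ("D", 9.0, 58.0)]
print(pareto_front(archs))  # C is dropped: B is both faster and more accurate
```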
- LEAP-VO: Long-term Effective Any Point Tracking for Visual Odometry [52.131996528655094]
We present the Long-term Effective Any Point Tracking (LEAP) module.
LEAP innovatively combines visual, inter-track, and temporal cues with mindfully selected anchors for dynamic track estimation.
Based on these traits, we develop LEAP-VO, a robust visual odometry system adept at handling occlusions and dynamic scenes.
arXiv Detail & Related papers (2024-01-03T18:57:27Z)
- Towards Energy-Efficient, Low-Latency and Accurate Spiking LSTMs [1.7969777786551424]
Spiking Neural Networks (SNNs) have emerged as an attractive spatio-temporal computing paradigm for complex vision tasks.
We propose an optimized spiking long short-term memory (LSTM) network training framework that involves a novel ANN-to-SNN conversion framework, followed by SNN training.
We evaluate our framework on sequential learning tasks including the temporal MNIST, Google Speech Commands (GSC), and UCI Smartphone datasets on different LSTM architectures.
arXiv Detail & Related papers (2022-10-23T04:10:27Z)
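A toy sketch of the basic idea behind ANN-to-SNN conversion (PyTorch; the neuron model and threshold are illustrative, not the paper's framework): an integrate-and-fire neuron whose average spike rate approximates a ReLU activation.

```python
import torch

def if_neuron_rate(analog_input, threshold=1.0, steps=100):
    """Integrate-and-fire simulation: accumulate input each step, emit a spike
    and reset (by subtraction) when the membrane crosses the threshold.
    The average spike rate approximates ReLU(analog_input) / threshold."""
    mem = torch.zeros_like(analog_input)
    spikes = torch.zeros_like(analog_input)
    for _ in range(steps):
        mem = mem + analog_input
        fired = (mem >= threshold).float()
        spikes += fired
        mem = mem - fired * threshold   # soft reset preserves residual charge
    return spikes / steps

x = torch.tensor([-0.5, 0.2, 0.7])
print(if_neuron_rate(x))   # ~[0.0, 0.2, 0.7]: matches ReLU for sub-threshold inputs
```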
- Correlation-Aware Deep Tracking [83.51092789908677]
We propose a novel target-dependent feature network inspired by the self-/cross-attention scheme.
Our network deeply embeds cross-image feature correlation in multiple layers of the feature network.
Our model can be flexibly pre-trained on abundant unpaired images, leading to notably faster convergence than the existing methods.
arXiv Detail & Related papers (2022-03-03T11:53:54Z)
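One way to picture "deeply embedding cross-image feature correlation" (a PyTorch sketch under assumed dimensions, not the paper's network): search-region features attend to template features, making the extracted representation target-dependent.

```python
import torch
import torch.nn as nn

class CrossCorrelationBlock(nn.Module):
    """Cross-attention between template (target) and search-region features,
    so the search features are modulated by what the target looks like."""
    def __init__(self, dim=256, heads=8):
        super().__init__()
        self.attn = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.norm = nn.LayerNorm(dim)

    def forward(self, search_tokens, template_tokens):
        # Query: search region; Key/Value: template. Output keeps the search shape.
        corr, _ = self.attn(search_tokens, template_tokens, template_tokens)
        return self.norm(search_tokens + corr)

block = CrossCorrelationBlock()
search = torch.randn(1, 31 * 31, 256)     # flattened search-region tokens
template = torch.randn(1, 15 * 15, 256)   # flattened template tokens
out = block(search, template)             # (1, 961, 256), target-dependent features
```

Stacking several such blocks is what makes the correlation "deep" rather than a single matching step at the end.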
- Network Level Spatial Temporal Traffic State Forecasting with Hierarchical Attention LSTM (HierAttnLSTM) [0.0]
This paper leverages diverse traffic state datasets from the Caltrans Performance Measurement System (PeMS) hosted on the open benchmark.
We integrate cell and hidden states from low-level to high-level Long Short-Term Memory (LSTM) networks with an attention pooling mechanism.
The developed hierarchical structure is designed to account for dependencies across different time scales, capturing the spatial-temporal correlations of network-level traffic states.
arXiv Detail & Related papers (2022-01-15T05:25:03Z)
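A minimal sketch of attention pooling over LSTM hidden states (PyTorch; sizes are illustrative, not the HierAttnLSTM code): a learned score weights each time step, and the weighted sum feeds the next level of the hierarchy.

```python
import torch
import torch.nn as nn

class AttentionPool(nn.Module):
    """Learn a score per time step and return the weighted sum of hidden states."""
    def __init__(self, hidden_size):
        super().__init__()
        self.score = nn.Linear(hidden_size, 1)

    def forward(self, states):                              # states: (B, T, H)
        weights = torch.softmax(self.score(states), dim=1)  # (B, T, 1)
        return (weights * states).sum(dim=1)                # (B, H)

low = nn.LSTM(input_size=8, hidden_size=32, batch_first=True)
pool = AttentionPool(32)
x = torch.randn(4, 24, 8)          # e.g., 24 five-minute traffic readings per sensor
states, _ = low(x)
summary = pool(states)             # (4, 32): pooled low-level context for a higher-level LSTM
```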
- MOTS R-CNN: Cosine-margin-triplet loss for multi-object tracking [2.8935588665357077]
One of the central tasks of multi-object tracking involves learning a distance metric consistent with the semantic similarities of objects.
In this paper, we propose cosine-margin-contrastive (CMC) and cosine-margin-triplet (CMT) loss by reformulating both contrastive and triplet loss functions.
We then propose the MOTS R-CNN framework for joint multi-object tracking and segmentation, particularly targeted at improving the tracking performance.
arXiv Detail & Related papers (2021-02-06T05:03:29Z)
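One plausible reading of a cosine-margin triplet objective (PyTorch sketch; the margin value and exact formulation are assumptions, not necessarily the paper's definition): enforce the margin on cosine similarities instead of Euclidean distances.

```python
import torch
import torch.nn.functional as F

def cosine_margin_triplet(anchor, positive, negative, margin=0.3):
    """Triplet loss on cosine similarity: the anchor should be more similar to
    the positive than to the negative by at least `margin`."""
    sim_pos = F.cosine_similarity(anchor, positive, dim=-1)
    sim_neg = F.cosine_similarity(anchor, negative, dim=-1)
    return F.relu(sim_neg - sim_pos + margin).mean()

a, p, n = (torch.randn(16, 128) for _ in range(3))
loss = cosine_margin_triplet(a, p, n)
```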
- A journey in ESN and LSTM visualisations on a language task [77.34726150561087]
We trained ESNs and LSTMs on a Cross-Situational Learning (CSL) task.
The results are of three kinds: performance comparison, internal dynamics analyses and visualization of latent space.
arXiv Detail & Related papers (2020-12-03T08:32:01Z)
- Object-Adaptive LSTM Network for Real-time Visual Tracking with Adversarial Data Augmentation [31.842910084312265]
We propose a novel real-time visual tracking method, which adopts an object-adaptive LSTM network to effectively capture the video sequential dependencies and adaptively learn the object appearance variations.
Experiments on four visual tracking benchmarks demonstrate the state-of-the-art performance of our method in terms of both tracking accuracy and speed.
arXiv Detail & Related papers (2020-02-07T03:06:07Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the information presented and is not responsible for any consequences arising from its use.