MOT FCG++: Enhanced Representation of Spatio-temporal Motion and Appearance Features
- URL: http://arxiv.org/abs/2411.10028v2
- Date: Thu, 21 Nov 2024 07:06:53 GMT
- Title: MOT FCG++: Enhanced Representation of Spatio-temporal Motion and Appearance Features
- Authors: Yanzhao Fang,
- Abstract summary: We propose a novel approach for appearance and spatial-temporal motion feature representation, improving upon the hierarchical clustering method MOT FCG.
For spatial-temporal motion features, we first propose Diagonal Modulated GIoU, which more accurately represents the relationship between the position and shape of the objects.
For appearance features, we utilize a dynamic appearance representation that incorporates confidence information, enabling the trajectory appearance features to be more robust and global.
- Abstract: The goal of multi-object tracking (MOT) is to detect and track all objects in a scene across frames, while maintaining a unique identity for each object. Most existing methods rely on the spatial-temporal motion features and appearance embedding features of the detected objects in consecutive frames. Effectively and robustly representing the spatial and appearance features of long trajectories has become a critical factor affecting the performance of MOT. We propose a novel approach for appearance and spatial-temporal motion feature representation, improving upon the hierarchical clustering association method MOT FCG. For spatial-temporal motion features, we first propose Diagonal Modulated GIoU, which more accurately represents the relationship between the position and shape of the objects. Second, Mean Constant Velocity Modeling is proposed to reduce the effect of observation noise on target motion state estimation. For appearance features, we utilize a dynamic appearance representation that incorporates confidence information, enabling the trajectory appearance features to be more robust and global. Building on the baseline model MOT FCG, we realize further performance improvements. We achieve 63.1 HOTA, 76.9 MOTA and 78.2 IDF1 on the MOT17 test set, and also achieve competitive performance on the MOT20 and DanceTrack sets.
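For readers unfamiliar with the metric that Diagonal Modulated GIoU builds on, the following is a minimal sketch of the standard Generalized IoU (GIoU) between two axis-aligned boxes. It does not implement the paper's diagonal modulation, which the abstract does not define. The function name `giou` and the `(x1, y1, x2, y2)` box layout are assumptions chosen for illustration.

```python
def giou(box_a, box_b):
    """Standard GIoU for two boxes given as (x1, y1, x2, y2).

    Illustrative only: this is the plain GIoU, not the Diagonal
    Modulated variant proposed in the paper.
    """
    ax1, ay1, ax2, ay2 = box_a
    bx1, by1, bx2, by2 = box_b

    area_a = (ax2 - ax1) * (ay2 - ay1)
    area_b = (bx2 - bx1) * (by2 - by1)

    # Intersection rectangle (zero area if the boxes do not overlap).
    inter_w = max(0.0, min(ax2, bx2) - max(ax1, bx1))
    inter_h = max(0.0, min(ay2, by2) - max(ay1, by1))
    inter = inter_w * inter_h
    union = area_a + area_b - inter

    # Smallest axis-aligned box enclosing both inputs.
    hull_w = max(ax2, bx2) - min(ax1, bx1)
    hull_h = max(ay2, by2) - min(ay1, by1)
    hull = hull_w * hull_h

    if union <= 0 or hull <= 0:
        return 0.0  # degenerate boxes: assumed fallback value

    iou = inter / union
    # Penalize the empty part of the enclosing box.
    return iou - (hull - union) / hull
```

Unlike plain IoU, GIoU stays informative for non-overlapping boxes: it ranges over (-1, 1] and decreases as the boxes move apart, which is why it is a common starting point for association costs in tracking.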
Related papers
- STCMOT: Spatio-Temporal Cohesion Learning for UAV-Based Multiple Object Tracking [13.269416985959404]
Multiple object tracking (MOT) in Unmanned Aerial Vehicle (UAV) videos is important for diverse applications in computer vision.
We propose a novel Spatio-Temporal Cohesion Multiple Object Tracking framework (STCMOT).
We use historical embedding features to model the representation of ReID and detection features in a sequential order.
Our framework sets a new state-of-the-art performance in MOTA and IDF1 metrics.
arXiv Detail & Related papers (2024-09-17T14:34:18Z) - Future Does Matter: Boosting 3D Object Detection with Temporal Motion Estimation in Point Cloud Sequences [25.74000325019015]
We introduce a novel LiDAR 3D object detection framework, namely LiSTM, to facilitate spatial-temporal feature learning with cross-frame motion forecasting information.
We have conducted experiments on the Waymo and nuScenes datasets to demonstrate that the proposed framework achieves superior 3D detection performance.
arXiv Detail & Related papers (2024-09-06T16:29:04Z) - MotionTrack: Learning Motion Predictor for Multiple Object Tracking [68.68339102749358]
We introduce a novel motion-based tracker, MotionTrack, centered around a learnable motion predictor.
Our experimental results demonstrate that MotionTrack yields state-of-the-art performance on datasets such as DanceTrack and SportsMOT.
arXiv Detail & Related papers (2023-06-05T04:24:11Z) - Rt-Track: Robust Tricks for Multi-Pedestrian Tracking [4.271127739716044]
We propose a novel direction consistency method for smooth trajectory prediction (STP-DC) to strengthen the modeling of motion information.
We also propose a hyper-grain feature embedding network (HG-FEN) to enhance appearance modeling.
To achieve state-of-the-art performance in MOT, we propose a robust tracker named Rt-Track, incorporating various tricks and techniques.
arXiv Detail & Related papers (2023-03-16T22:08:29Z) - Modeling Continuous Motion for 3D Point Cloud Object Tracking [54.48716096286417]
This paper presents a novel approach that views each tracklet as a continuous stream.
At each timestamp, only the current frame is fed into the network to interact with multi-frame historical features stored in a memory bank.
To enhance the utilization of multi-frame features for robust tracking, a contrastive sequence enhancement strategy is proposed.
arXiv Detail & Related papers (2023-03-14T02:58:27Z) - Hierarchical Feature Alignment Network for Unsupervised Video Object Segmentation [99.70336991366403]
We propose a concise, practical, and efficient architecture for appearance and motion feature alignment.
The proposed HFAN reaches a new state-of-the-art performance on DAVIS-16, achieving 88.7 $\mathcal{J}\&\mathcal{F}$ Mean, i.e., a relative improvement of 3.5% over the best published result.
arXiv Detail & Related papers (2022-07-18T10:10:14Z) - Joint Spatial-Temporal and Appearance Modeling with Transformer for Multiple Object Tracking [59.79252390626194]
We propose a novel solution named TransSTAM, which leverages Transformer to model both the appearance features of each object and the spatial-temporal relationships among objects.
The proposed method is evaluated on multiple public benchmarks including MOT16, MOT17, and MOT20, and it achieves a clear performance improvement in both IDF1 and HOTA.
arXiv Detail & Related papers (2022-05-31T01:19:18Z) - A Spatio-Temporal Multilayer Perceptron for Gesture Recognition [70.34489104710366]
We propose a multilayer state-weighted perceptron for gesture recognition in the context of autonomous vehicles.
An evaluation on the TCG and Drive&Act datasets is provided to showcase the promising performance of our approach.
We deploy our model to our autonomous vehicle to show its real-time capability and stable execution.
arXiv Detail & Related papers (2022-04-25T08:42:47Z) - DS-Net: Dynamic Spatiotemporal Network for Video Salient Object Detection [78.04869214450963]
We propose a novel dynamic spatiotemporal network (DS-Net) for more effective fusion of temporal and spatial information.
We show that the proposed method achieves performance superior to state-of-the-art algorithms.
arXiv Detail & Related papers (2020-12-09T06:42:30Z)