SiamMo: Siamese Motion-Centric 3D Object Tracking
- URL: http://arxiv.org/abs/2408.01688v2
- Date: Mon, 9 Sep 2024 05:24:27 GMT
- Title: SiamMo: Siamese Motion-Centric 3D Object Tracking
- Authors: Yuxiang Yang, Yingqi Deng, Jing Zhang, Hongjie Gu, Zhekang Dong
- Abstract summary: We introduce SiamMo, a novel and simple motion-centric tracking approach.
Unlike the traditional single-stream architecture, we employ Siamese feature extraction for motion-centric tracking.
SiamMo sets a new record on the KITTI tracking benchmark with 90.1% precision while maintaining a high inference speed of 108 FPS.
- Score: 12.68616041331354
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Current 3D single object tracking methods primarily rely on the Siamese matching-based paradigm, which struggles with textureless and incomplete LiDAR point clouds. Conversely, the motion-centric paradigm avoids appearance matching, thus overcoming these issues. However, its complex multi-stage pipeline and the limited temporal modeling capability of a single-stream architecture constrain its potential. In this paper, we introduce SiamMo, a novel and simple Siamese motion-centric tracking approach. Unlike the traditional single-stream architecture, we employ Siamese feature extraction for motion-centric tracking. This decouples feature extraction from temporal fusion, significantly enhancing tracking performance. Additionally, we design a Spatio-Temporal Feature Aggregation module to integrate Siamese features at multiple scales, capturing motion information effectively. We also introduce a Box-aware Feature Encoding module to encode object size priors into motion estimation. SiamMo is a purely motion-centric tracker that eliminates the need for additional processes like segmentation and box refinement. Without bells and whistles, SiamMo not only surpasses state-of-the-art methods across multiple benchmarks but also demonstrates exceptional robustness in challenging scenarios. SiamMo sets a new record on the KITTI tracking benchmark with 90.1% precision while maintaining a high inference speed of 108 FPS. The code will be released at https://github.com/HDU-VRLab/SiamMo.
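The abstract describes the architecture only at a high level. As a rough illustration of how such a Siamese motion-centric tracker might be wired, here is a minimal PyTorch-style sketch: a weight-shared backbone encodes two consecutive frames, a fusion module aggregates them temporally, a box-size prior conditions the features, and a head regresses the target's relative motion. All module names, shapes, the BEV-image input, and the single-scale fusion (the paper aggregates at multiple scales) are assumptions for illustration, not the released SiamMo code.

```python
import torch
import torch.nn as nn

class SiameseMotionTracker(nn.Module):
    """Hypothetical sketch of a Siamese motion-centric 3D SOT pipeline.

    Two consecutive point-cloud frames (rasterized to BEV images here)
    are encoded by a weight-shared backbone, fused temporally, and
    decoded into the target's relative motion between the frames.
    """

    def __init__(self, feat_dim: int = 128):
        super().__init__()
        # Weight-shared feature extractor applied to both frames.
        self.backbone = nn.Sequential(
            nn.Conv2d(1, feat_dim, 3, padding=1), nn.ReLU(),
            nn.Conv2d(feat_dim, feat_dim, 3, padding=1), nn.ReLU(),
        )
        # Temporal fusion of the two per-frame feature maps.
        self.st_fusion = nn.Conv2d(2 * feat_dim, feat_dim, 3, padding=1)
        # Box prior (w, l, h) injected into motion estimation.
        self.box_embed = nn.Linear(3, feat_dim)
        # Motion head: (dx, dy, dz, dyaw) of the target between frames.
        self.motion_head = nn.Sequential(
            nn.AdaptiveAvgPool2d(1), nn.Flatten(),
            nn.Linear(feat_dim, 4),
        )

    def forward(self, bev_prev, bev_curr, box_size):
        # Siamese extraction: the SAME backbone encodes each frame,
        # decoupling feature extraction from temporal fusion.
        f_prev = self.backbone(bev_prev)
        f_curr = self.backbone(bev_curr)
        fused = self.st_fusion(torch.cat([f_prev, f_curr], dim=1))
        # Box-aware conditioning on the target's size prior.
        fused = fused + self.box_embed(box_size)[:, :, None, None]
        return self.motion_head(fused)  # relative motion; no refinement stage
```

Because the head outputs the target's relative motion directly, the previous box can simply be propagated to the current frame, which is why no segmentation or box-refinement step is needed in a purely motion-centric tracker.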
Related papers
- Motion-to-Matching: A Mixed Paradigm for 3D Single Object Tracking [27.805298263103495]
We propose MTM-Tracker, which combines motion modeling with feature matching into a single network.
In the first stage, we exploit the continuous historical boxes as motion prior and propose an encoder-decoder structure to locate target coarsely.
In the second stage, we introduce a feature interaction module to extract motion-aware features from consecutive point clouds and match them to refine target movement.
arXiv Detail & Related papers (2023-08-23T02:40:51Z)
- Delving into Motion-Aware Matching for Monocular 3D Object Tracking [81.68608983602581]
We find that the motion cue of objects along different time frames is critical in 3D multi-object tracking.
We propose MoMA-M3T, a framework that mainly consists of three motion-aware components.
We conduct extensive experiments on the nuScenes and KITTI datasets to demonstrate our MoMA-M3T achieves competitive performance against state-of-the-art methods.
arXiv Detail & Related papers (2023-08-22T17:53:58Z)
- MotionTrack: Learning Motion Predictor for Multiple Object Tracking [68.68339102749358]
We introduce a novel motion-based tracker, MotionTrack, centered around a learnable motion predictor.
Our experimental results demonstrate that MotionTrack yields state-of-the-art performance on datasets such as DanceTrack and SportsMOT.
arXiv Detail & Related papers (2023-06-05T04:24:11Z)
- Event-Free Moving Object Segmentation from Moving Ego Vehicle [88.33470650615162]
Moving object segmentation (MOS) in dynamic scenes is an important, challenging, but under-explored research topic for autonomous driving.
Most segmentation methods leverage motion cues obtained from optical flow maps.
We propose to exploit event cameras, which provide rich motion cues without relying on optical flow, for better video understanding.
arXiv Detail & Related papers (2023-04-28T23:43:10Z)
- An Effective Motion-Centric Paradigm for 3D Single Object Tracking in Point Clouds [50.19288542498838]
3D single object tracking in LiDAR point clouds (LiDAR SOT) plays a crucial role in autonomous driving.
Current approaches all follow the Siamese paradigm based on appearance matching.
We introduce a motion-centric paradigm to handle LiDAR SOT from a new perspective.
arXiv Detail & Related papers (2023-03-21T17:28:44Z)
- InstMove: Instance Motion for Object-centric Video Segmentation [70.16915119724757]
In this work, we study instance-level motion and present InstMove, which stands for Instance Motion for Object-centric Video Segmentation.
In comparison to pixel-wise motion, InstMove mainly relies on instance-level motion information that is free from image feature embeddings.
With only a few lines of code, InstMove can be integrated into current SOTA methods for three different video segmentation tasks.
arXiv Detail & Related papers (2023-03-14T17:58:44Z)
- Beyond 3D Siamese Tracking: A Motion-Centric Paradigm for 3D Single Object Tracking in Point Clouds [39.41305358466479]
3D single object tracking in LiDAR point clouds plays a crucial role in autonomous driving.
Current approaches all follow the Siamese paradigm based on appearance matching.
We introduce a motion-centric paradigm to handle 3D SOT from a new perspective.
arXiv Detail & Related papers (2022-03-03T14:20:10Z)
- SiamMOT: Siamese Multi-Object Tracking [28.97401838563374]
We introduce a region-based Siamese Multi-Object Tracking network, which we name SiamMOT.
SiamMOT includes a motion model that estimates an instance's movement between two frames so that detected instances can be associated across frames.
SiamMOT is efficient, and it runs at 17 FPS for 720P videos on a single modern GPU.
arXiv Detail & Related papers (2021-05-25T01:09:26Z)
- Learning to Segment Rigid Motions from Two Frames [72.14906744113125]
We propose a modular network, motivated by a geometric analysis of what independent object motions can be recovered from an egomotion field.
It takes two consecutive frames as input and predicts segmentation masks for the background and multiple rigidly moving objects, which are then parameterized by 3D rigid transformations.
Our method achieves state-of-the-art performance for rigid motion segmentation on KITTI and Sintel.
arXiv Detail & Related papers (2021-01-11T04:20:30Z)
- MAT: Motion-Aware Multi-Object Tracking [9.098793914779161]
In this paper, we propose Motion-Aware Tracker (MAT), focusing more on various motion patterns of different objects.
Experiments on the challenging MOT16 and MOT17 benchmarks demonstrate that our MAT approach achieves superior performance by a large margin.
arXiv Detail & Related papers (2020-09-10T11:51:33Z)
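A recurring idea across the motion-centric entries above (SiamMo, M2-Track, MTM-Tracker) is that tracking reduces to estimating the target's relative motion between consecutive frames and propagating the previous box. As a minimal sketch of that box update, the snippet below applies an assumed (dx, dy, dz, dyaw) motion output to a previous 3D box; the local-frame translation convention is an illustrative assumption, not a detail taken from any of the papers.

```python
import math

def propagate_box(prev_box, motion):
    """Apply a predicted relative motion to the previous 3D box.

    prev_box: (x, y, z, w, l, h, yaw) in the global frame.
    motion:   (dx, dy, dz, dyaw) from a motion head; the translation
              is assumed to be expressed in the previous box's local
              frame (a common convention, assumed here for illustration).
    """
    x, y, z, w, l, h, yaw = prev_box
    dx, dy, dz, dyaw = motion
    # Rotate the local translation into the global frame.
    gx = x + dx * math.cos(yaw) - dy * math.sin(yaw)
    gy = y + dx * math.sin(yaw) + dy * math.cos(yaw)
    # Size (w, l, h) is a rigid prior and stays fixed; only pose changes.
    return (gx, gy, z + dz, w, l, h, yaw + dyaw)

# Example: a car at (10, 5) moves 1 m forward and turns 0.1 rad.
print(propagate_box((10.0, 5.0, -1.0, 1.8, 4.5, 1.6, 0.0),
                    (1.0, 0.0, 0.0, 0.1)))
```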
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the information presented and is not responsible for any consequences of its use.