Synchronize Feature Extracting and Matching: A Single Branch Framework for 3D Object Tracking
- URL: http://arxiv.org/abs/2308.12549v1
- Date: Thu, 24 Aug 2023 04:28:08 GMT
- Title: Synchronize Feature Extracting and Matching: A Single Branch Framework for 3D Object Tracking
- Authors: Teli Ma, Mengmeng Wang, Jimin Xiao, Huifeng Wu, Yong Liu
- Abstract summary: The Siamese network has been the de facto framework for 3D LiDAR object tracking.
We propose a novel single-branch framework, SyncTrack, which synchronizes feature extraction and matching.
Experiments on two benchmark datasets show that SyncTrack achieves state-of-the-art performance in real-time tracking.
- Score: 34.58431389376807
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: The Siamese network has been the de facto framework for 3D LiDAR
object tracking, with a shared-parameter encoder extracting features from the
template and the search region separately. This paradigm relies heavily on an
additional matching network to model the cross-correlation/similarity between
the template and the search region. In this paper, we depart from the
conventional Siamese paradigm and propose a novel single-branch framework,
SyncTrack, which synchronizes feature extraction and matching. This avoids
forwarding the encoder twice (once each for the template and the search
region) and removes the extra parameters of a matching network. The
synchronization mechanism is based on the dynamic affinity of the Transformer,
and we provide an in-depth theoretical analysis of this relevance. Moreover,
building on the synchronization, we introduce a novel Attentive
Points-Sampling strategy into the Transformer layers (APST), replacing random
or Farthest Point Sampling (FPS) with sampling supervised by the attentive
relations between the template and the search region. This couples point-wise
sampling with feature learning, helping to aggregate more distinctive
geometric features for tracking with sparse points. Extensive experiments on
two benchmark datasets (KITTI and NuScenes) show that SyncTrack achieves
state-of-the-art performance in real-time tracking.
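The paper itself includes no code, so here is a minimal PyTorch sketch of the
single-branch idea described in the abstract. Everything here is an
illustrative assumption rather than the authors' implementation: the class
name SyncBlock, the dimensions, and the layer layout are invented for the
example. The point it demonstrates is that once template and search tokens are
concatenated into one sequence, a single shared attention layer performs
feature extraction (intra-set attention) and matching (cross-set attention) in
the same forward pass, so no second encoder pass and no separate matching
network are needed.

```python
import torch
import torch.nn as nn

class SyncBlock(nn.Module):
    """Illustrative single-branch Transformer layer (not the official code).

    Template and search tokens are concatenated and attended to jointly, so
    the off-diagonal blocks of the attention map act as the template<->search
    affinity that a Siamese matcher would otherwise compute with extra
    parameters.
    """

    def __init__(self, d_model: int = 128, n_heads: int = 4):
        super().__init__()
        self.attn = nn.MultiheadAttention(d_model, n_heads, batch_first=True)
        self.norm1 = nn.LayerNorm(d_model)
        self.norm2 = nn.LayerNorm(d_model)
        self.ffn = nn.Sequential(
            nn.Linear(d_model, 4 * d_model),
            nn.ReLU(),
            nn.Linear(4 * d_model, d_model),
        )

    def forward(self, template: torch.Tensor, search: torch.Tensor):
        # template: (B, Nt, C), search: (B, Ns, C)
        nt = template.shape[1]
        x = torch.cat([template, search], dim=1)  # one joint token sequence
        h, attn_weights = self.attn(x, x, x, need_weights=True)
        x = self.norm1(x + h)
        x = self.norm2(x + self.ffn(x))
        # split back into the two point sets; attn_weights is (B, Nt+Ns, Nt+Ns)
        return x[:, :nt], x[:, nt:], attn_weights

# Toy usage: one forward pass yields updated features for both point sets
# and their affinities at the same time.
block = SyncBlock()
t = torch.randn(2, 64, 128)   # template point features
s = torch.randn(2, 256, 128)  # search-region point features
t_out, s_out, w = block(t, s)
print(t_out.shape, s_out.shape, w.shape)  # (2,64,128) (2,256,128) (2,320,320)
```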
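The abstract also describes APST, which replaces random/farthest point
sampling with sampling guided by template-search attention. The paper's exact
scoring rule is not given here, so the sketch below makes a simple assumption
(sum the attention each search point receives from all template tokens) and
keeps the top-k points; attentive_sample and its signature are hypothetical.

```python
import torch

def attentive_sample(search_feats: torch.Tensor,
                     attn_weights: torch.Tensor,
                     n_template: int,
                     k: int) -> torch.Tensor:
    """Keep the k search points the template attends to most (APST-style).

    search_feats: (B, Ns, C) search-region point features
    attn_weights: (B, Nt+Ns, Nt+Ns) joint attention map from SyncBlock
    """
    # cross-set block: attention paid by template tokens to search points
    t2s = attn_weights[:, :n_template, n_template:]   # (B, Nt, Ns)
    score = t2s.sum(dim=1)                            # (B, Ns) relevance score
    idx = score.topk(k, dim=1).indices                # (B, k) selected points
    idx = idx.unsqueeze(-1).expand(-1, -1, search_feats.shape[-1])
    return torch.gather(search_feats, 1, idx)         # (B, k, C)
```

Because the scores come from the same attention map that performs matching, a
stack of such layers would downsample the search region toward points that
resemble the target, rather than toward a purely geometric covering as FPS
does.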
Related papers
- A study on audio synchronous steganography detection and distributed guide inference model based on sliding spectral features and intelligent inference drive [3.5516803380598074]
This paper proposes a detection and distributed guidance reconstruction model based on short video "Yupan" samples released by China's South Sea Fleet on TikTok.
The proposed framework validates the effectiveness of sliding spectral features for synchronized steganography detection and builds an inference model for covert communication analysis and tactical guidance simulation on open platforms.
arXiv Detail & Related papers (2025-05-06T05:24:11Z)
- S^2Former-OR: Single-Stage Bi-Modal Transformer for Scene Graph Generation in OR [50.435592120607815]
Scene graph generation (SGG) of surgical procedures is crucial for enhancing holistic cognitive intelligence in the operating room (OR).
Previous works have primarily relied on multi-stage learning, where the generated semantic scene graphs depend on intermediate processes with pose estimation and object detection.
In this study, we introduce a novel single-stage bi-modal transformer framework for SGG in the OR, termed S2Former-OR.
arXiv Detail & Related papers (2024-02-22T11:40:49Z)
- Multi-Correlation Siamese Transformer Network with Dense Connection for 3D Single Object Tracking [14.47355191520578]
Point cloud-based 3D object tracking is an important task in autonomous driving.
It remains challenging to learn the correlation between the template and search branches effectively with sparse LiDAR point cloud data.
We present a multi-correlation Siamese Transformer network that has multiple stages and carries out feature correlation at the end of each stage.
arXiv Detail & Related papers (2023-12-18T09:33:49Z)
- Unified Single-Stage Transformer Network for Efficient RGB-T Tracking [47.88113335927079]
We propose a single-stage Transformer RGB-T tracking network, namely USTrack, which unifies three previously separate stages into a single ViT (Vision Transformer) backbone.
With this structure, the network can extract fusion features of the template and search region under the mutual interaction of modalities.
Experiments on three popular RGB-T tracking benchmarks demonstrate that our method achieves new state-of-the-art performance while maintaining the fastest inference speed of 84.2 FPS.
arXiv Detail & Related papers (2023-08-26T05:09:57Z)
- S.T.A.R.-Track: Latent Motion Models for End-to-End 3D Object Tracking with Adaptive Spatio-Temporal Appearance Representations [10.46571824050325]
Following the tracking-by-attention paradigm, this paper introduces an object-centric, transformer-based framework for tracking in 3D.
Building on this paradigm, we propose S.T.A.R.-Track, which uses a novel latent motion model (LMM) to adjust object queries to account for changes in viewing direction and lighting conditions directly in the latent space.
arXiv Detail & Related papers (2023-06-30T12:22:41Z)
- Modeling Continuous Motion for 3D Point Cloud Object Tracking [54.48716096286417]
This paper presents a novel approach that views each tracklet as a continuous stream.
At each timestamp, only the current frame is fed into the network to interact with multi-frame historical features stored in a memory bank.
To enhance the utilization of multi-frame features for robust tracking, a contrastive sequence enhancement strategy is proposed.
arXiv Detail & Related papers (2023-03-14T02:58:27Z)
- 3DMODT: Attention-Guided Affinities for Joint Detection & Tracking in 3D Point Clouds [95.54285993019843]
We propose a method for joint detection and tracking of multiple objects in 3D point clouds.
Our model exploits temporal information, employing multiple frames to detect objects and track them in a single network.
arXiv Detail & Related papers (2022-11-01T20:59:38Z)
- OST: Efficient One-stream Network for 3D Single Object Tracking in Point Clouds [6.661881950861012]
We propose a novel one-stream network with the strength of instance-level encoding, which avoids the correlation operations required by previous Siamese networks.
The proposed method achieves strong performance for both class-specific and class-agnostic tracking, with less computation and higher efficiency.
arXiv Detail & Related papers (2022-10-16T12:31:59Z)
- Joint Spatial-Temporal and Appearance Modeling with Transformer for Multiple Object Tracking [59.79252390626194]
We propose a novel solution named TransSTAM, which leverages Transformer to model both the appearance features of each object and the spatial-temporal relationships among objects.
The proposed method is evaluated on multiple public benchmarks including MOT16, MOT17, and MOT20, and it achieves a clear performance improvement in both IDF1 and HOTA.
arXiv Detail & Related papers (2022-05-31T01:19:18Z)
- Joint Feature Learning and Relation Modeling for Tracking: A One-Stream Framework [76.70603443624012]
We propose a novel one-stream tracking (OSTrack) framework that unifies feature learning and relation modeling.
In this way, discriminative target-oriented features can be dynamically extracted by mutual guidance.
OSTrack achieves state-of-the-art performance on multiple benchmarks, in particular, it shows impressive results on the one-shot tracking benchmark GOT-10k.
arXiv Detail & Related papers (2022-03-22T18:37:11Z)