Transformers for Multi-Object Tracking on Point Clouds
- URL: http://arxiv.org/abs/2205.15730v1
- Date: Tue, 31 May 2022 12:20:54 GMT
- Title: Transformers for Multi-Object Tracking on Point Clouds
- Authors: Felicia Ruppel, Florian Faion, Claudius Gläser and Klaus Dietmayer
- Abstract summary: We present TransMOT, a novel transformer-based end-to-end trainable online tracker and detector for point cloud data.
The model utilizes a cross- and a self-attention mechanism and is applicable to lidar data in an automotive context.
Results are presented on the challenging real-world dataset nuScenes, where the proposed model outperforms its Kalman filter-based tracking baseline.
- Score: 9.287964414592826
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: We present TransMOT, a novel transformer-based end-to-end trainable online tracker and detector for point cloud data. The model utilizes a cross- and a self-attention mechanism and is applicable to lidar data in an automotive context, as well as other data types, such as radar. Both track management and the detection of new tracks are performed by the same transformer decoder module, and the tracker state is encoded in feature space. With this approach, we make use of the rich latent space of the detector for tracking rather than relying on low-dimensional bounding boxes. Still, we are able to retain some of the desirable properties of traditional Kalman filter-based approaches, such as the ability to handle sensor input at arbitrary timesteps or to compensate for frame skips. This is possible due to a novel module that transforms the track information from one frame to the next at feature level and thereby fulfills a task similar to the prediction step of a Kalman filter. Results are presented on the challenging real-world dataset nuScenes, where the proposed model outperforms its Kalman filter-based tracking baseline.
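The abstract describes the architecture only at a high level; the following is a minimal PyTorch sketch of that design, not the authors' implementation. The module names, layer sizes, the seven-value box parametrization, and the MLP used for the time-conditioned prediction step are all assumptions. Only the overall structure follows the text: a single decoder serving both track continuation and new-track detection, a tracker state kept in feature space, and a learned frame-to-frame transform playing the role of a Kalman prediction step.

```python
import torch
import torch.nn as nn

class FeatureSpacePredictor(nn.Module):
    """Hypothetical analogue of the Kalman prediction step: advances track
    embeddings to the next frame, conditioned on the elapsed time, which is
    what allows arbitrary timesteps and frame skips to be handled."""
    def __init__(self, dim: int):
        super().__init__()
        self.mlp = nn.Sequential(
            nn.Linear(dim + 1, dim), nn.ReLU(), nn.Linear(dim, dim))

    def forward(self, tracks: torch.Tensor, dt: float) -> torch.Tensor:
        # tracks: (num_tracks, batch, dim); dt: seconds since the last frame
        dt_feat = torch.full_like(tracks[..., :1], dt)
        return tracks + self.mlp(torch.cat([tracks, dt_feat], dim=-1))

class TransMOTSketch(nn.Module):
    """One transformer decoder consumes both learned detection queries (to
    spawn new tracks) and propagated track embeddings (to continue existing
    ones), cross-attending to encoded point-cloud features."""
    def __init__(self, dim: int = 256, num_det_queries: int = 100):
        super().__init__()
        layer = nn.TransformerDecoderLayer(d_model=dim, nhead=8)
        self.decoder = nn.TransformerDecoder(layer, num_layers=6)
        self.det_queries = nn.Parameter(torch.randn(num_det_queries, 1, dim))
        self.predictor = FeatureSpacePredictor(dim)
        self.box_head = nn.Linear(dim, 7)    # x, y, z, l, w, h, yaw (assumed)
        self.score_head = nn.Linear(dim, 1)  # existence / track-alive score

    def forward(self, memory: torch.Tensor, track_state: torch.Tensor, dt: float):
        # memory: (num_features, batch, dim) encoded lidar features
        # track_state: (num_tracks, batch, dim) embeddings from the last frame
        tracks = self.predictor(track_state, dt)  # feature-level "prediction step"
        det = self.det_queries.expand(-1, memory.size(1), -1)
        out = self.decoder(torch.cat([tracks, det], dim=0), memory)
        # outputs above a score threshold would become next frame's track_state
        return self.box_head(out), self.score_head(out), out
```

Keeping the recurrent state as the decoder's output embeddings, rather than decoded boxes, is what the abstract means by using the detector's rich latent space for tracking.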
Related papers
- Detection Is Tracking: Point Cloud Multi-Sweep Deep Learning Models Revisited [0.0]
In autonomous driving, lidar measurements are usually passed through a "virtual sensor" realized by a deep learning model.
We argue in this paper that the multi-sweep input already contains temporal information, and therefore the virtual sensor output should also contain temporal information.
We present a deep learning model called MULti-Sweep PAired Detector (MULSPAD) that produces, for each detected object, a pair of bounding boxes at both the end time and the beginning time of the input buffer.
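Since the summary gives only the output format, here is a small illustration of why such pairs are useful: the two boxes span the input buffer in time, so a per-object motion estimate falls out of a finite difference. The PairedDetection container and its field names are hypothetical, not the paper's API.

```python
from dataclasses import dataclass

@dataclass
class PairedDetection:
    """Hypothetical container mirroring MULSPAD's output: one box at the
    start and one at the end of the multi-sweep input buffer."""
    box_t0: tuple[float, float, float]  # (x, y, yaw) at buffer start
    box_t1: tuple[float, float, float]  # (x, y, yaw) at buffer end
    dt: float                           # buffer duration in seconds

def velocity_from_pair(det: PairedDetection) -> tuple[float, float]:
    """The box pair directly encodes motion over the buffer window,
    so a finite difference yields a per-object velocity estimate."""
    vx = (det.box_t1[0] - det.box_t0[0]) / det.dt
    vy = (det.box_t1[1] - det.box_t0[1]) / det.dt
    return vx, vy
```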
arXiv Detail & Related papers (2024-02-24T08:07:48Z)
- UnLoc: A Universal Localization Method for Autonomous Vehicles using LiDAR, Radar and/or Camera Input [51.150605800173366]
UnLoc is a novel unified neural modeling approach for localization with multi-sensor input in all weather conditions.
Our method is extensively evaluated on Oxford Radar RobotCar, ApolloSouthBay and Perth-WA datasets.
arXiv Detail & Related papers (2023-07-03T04:10:55Z)
- TrajectoryFormer: 3D Object Tracking Transformer with Predictive Trajectory Hypotheses [51.60422927416087]
3D multi-object tracking (MOT) is vital for many applications including autonomous driving vehicles and service robots.
We present TrajectoryFormer, a novel point-cloud-based 3D MOT framework.
arXiv Detail & Related papers (2023-06-09T13:31:50Z)
- Transformers for Object Detection in Large Point Clouds [9.287964414592826]
We present TransLPC, a novel detection model for large point clouds based on a transformer architecture.
We propose a novel query refinement technique to improve detection accuracy, while retaining a memory-friendly number of transformer decoder queries.
This simple technique has a significant effect on detection accuracy; it is evaluated on the challenging nuScenes dataset with real-world lidar data.
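The summary does not say how the refinement works. As a generic stand-in (common in DETR-style detectors, and not necessarily TransLPC's actual technique), queries can be re-anchored after every decoder layer so that a small, fixed query budget still covers a large scene:

```python
import torch
import torch.nn as nn

class QueryRefinementSketch(nn.Module):
    """Generic stand-in (not the paper's method): after each decoder layer,
    shift each query's reference point by the offset it predicts, so a
    memory-friendly number of queries can still localize objects precisely."""
    def __init__(self, dim: int = 256, num_layers: int = 3):
        super().__init__()
        self.layers = nn.ModuleList(
            [nn.TransformerDecoderLayer(d_model=dim, nhead=8) for _ in range(num_layers)])
        self.offset_head = nn.Linear(dim, 2)  # predicted (dx, dy) per query

    def forward(self, queries, refs, memory):
        # queries: (Q, B, dim); refs: (Q, B, 2) BEV reference points; memory: (N, B, dim)
        for layer in self.layers:
            queries = layer(queries, memory)
            refs = refs + self.offset_head(queries)  # move queries toward objects
        return queries, refs
```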
arXiv Detail & Related papers (2022-09-30T06:35:43Z)
- Bag of Tricks for Domain Adaptive Multi-Object Tracking [4.084199842578325]
The proposed method was built from a pre-existing detector and tracker under the tracking-by-detection paradigm.
The tracker we used is an online tracker that merely links newly received detections with existing tracks.
Our method, SIA_Track, took first place on the MOTSynth2MOT17 track of the BMTT 2022 challenge.
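The linking step itself is not specified in the summary; a minimal, generic version of tracking-by-detection association (greedy IoU matching; the threshold and box format are assumptions) might look like this:

```python
import numpy as np

def iou_xyxy(a: np.ndarray, b: np.ndarray) -> float:
    """IoU of two axis-aligned boxes given as (x1, y1, x2, y2)."""
    x1, y1 = max(a[0], b[0]), max(a[1], b[1])
    x2, y2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, x2 - x1) * max(0.0, y2 - y1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    return inter / (area_a + area_b - inter + 1e-9)

def link_detections(tracks: dict[int, np.ndarray],
                    detections: list[np.ndarray],
                    iou_thresh: float = 0.3) -> dict[int, np.ndarray]:
    """Greedy online association: each new detection either extends the best
    overlapping track or starts a new one. Track aging/deletion is handled
    elsewhere; this sketch only shows the linking step itself."""
    next_id = max(tracks, default=-1) + 1
    used = set()
    for det in detections:
        best_id, best_iou = None, iou_thresh
        for tid, box in tracks.items():
            if tid in used:
                continue
            iou = iou_xyxy(box, det)
            if iou > best_iou:
                best_id, best_iou = tid, iou
        if best_id is None:  # no track overlaps enough: spawn a new one
            best_id, next_id = next_id, next_id + 1
        tracks[best_id] = det
        used.add(best_id)
    return tracks
```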
arXiv Detail & Related papers (2022-05-31T08:49:20Z)
- Joint Spatial-Temporal and Appearance Modeling with Transformer for Multiple Object Tracking [59.79252390626194]
We propose a novel solution named TransSTAM, which leverages Transformer to model both the appearance features of each object and the spatial-temporal relationships among objects.
The proposed method is evaluated on multiple public benchmarks including MOT16, MOT17, and MOT20, and it achieves a clear performance improvement in both IDF1 and HOTA.
arXiv Detail & Related papers (2022-05-31T01:19:18Z)
- MFGNet: Dynamic Modality-Aware Filter Generation for RGB-T Tracking [72.65494220685525]
We propose a new dynamic modality-aware filter generation module (named MFGNet) to boost message communication between visible and thermal data.
Dynamic modality-aware filters are generated by two independent networks; the visible and thermal filters are then used to conduct a dynamic convolutional operation on their respective input feature maps.
To address issues caused by heavy occlusion, fast motion, and out-of-view targets, we propose a joint local and global search that exploits a new direction-aware target-driven attention mechanism.
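As a rough sketch of the filter-generation idea for one modality (per the summary, the paper uses two such independent branches, visible and thermal; the kernel size, pooling, and generator architecture here are assumptions):

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class DynamicFilterBranch(nn.Module):
    """One modality branch: a small network predicts a per-sample convolution
    kernel from the feature map, which is then applied back to that same map
    (dynamic convolution). MFGNet would use one branch per modality."""
    def __init__(self, channels: int, k: int = 3):
        super().__init__()
        self.k = k
        self.filter_gen = nn.Sequential(
            nn.AdaptiveAvgPool2d(1),
            nn.Flatten(),
            nn.Linear(channels, channels * k * k))

    def forward(self, feat: torch.Tensor) -> torch.Tensor:
        b, c, h, w = feat.shape
        # one depthwise k x k kernel per channel, predicted per sample
        kernels = self.filter_gen(feat).view(b * c, 1, self.k, self.k)
        out = F.conv2d(feat.view(1, b * c, h, w), kernels,
                       padding=self.k // 2, groups=b * c)
        return out.view(b, c, h, w)
```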
arXiv Detail & Related papers (2021-07-22T03:10:51Z)
- TrTr: Visual Tracking with Transformer [29.415900191169587]
We propose a novel tracker network based on a powerful Transformer encoder-decoder architecture.
We design the classification and regression heads using the Transformer output to localize the target based on shape-agnostic anchors.
Our method performs favorably against state-of-the-art algorithms.
arXiv Detail & Related papers (2021-05-09T02:32:28Z)
- Learning Spatio-Temporal Transformer for Visual Tracking [108.11680070733598]
We present a new tracking architecture with an encoder-decoder transformer as the key component.
The whole method is end-to-end and does not need any postprocessing steps such as cosine windowing or bounding box smoothing.
The proposed tracker achieves state-of-the-art performance on five challenging short-term and long-term benchmarks while running at real-time speed, 6x faster than Siam R-CNN.
arXiv Detail & Related papers (2021-03-31T15:19:19Z)
- Transformer Tracking [76.96796612225295]
Correlation plays a critical role in the tracking field, especially in popular Siamese-based trackers.
This work presents a novel attention-based feature fusion network, which effectively combines the template and search region features solely using attention.
Experiments show that our TransT achieves very promising results on six challenging datasets.
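The core idea, replacing cross-correlation with attention over template and search-region features, can be sketched in a few lines; the layer counts, normalization placement, and dimensions here are illustrative, not TransT's exact design:

```python
import torch
import torch.nn as nn

class AttentionFusionSketch(nn.Module):
    """Correlation replaced by attention: search-region features attend to
    template features (cross-attention) instead of being cross-correlated."""
    def __init__(self, dim: int = 256, heads: int = 8):
        super().__init__()
        self.self_attn = nn.MultiheadAttention(dim, heads)
        self.cross_attn = nn.MultiheadAttention(dim, heads)
        self.norm1 = nn.LayerNorm(dim)
        self.norm2 = nn.LayerNorm(dim)

    def forward(self, search: torch.Tensor, template: torch.Tensor) -> torch.Tensor:
        # search: (S, B, dim) flattened search-region features
        # template: (T, B, dim) flattened template features
        s, _ = self.self_attn(search, search, search)
        search = self.norm1(search + s)
        f, _ = self.cross_attn(search, template, template)  # query=search, key/value=template
        return self.norm2(search + f)
```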
arXiv Detail & Related papers (2021-03-29T09:06:55Z)
- Trajectory saliency detection using consistency-oriented latent codes from a recurrent auto-encoder [0.0]
Trajectories represent the best way to support progressive dynamic saliency detection.
A trajectory is qualified as salient if it deviates from normal trajectories that share a common motion pattern related to a given context.
We show that our method outperforms existing methods on several scenarios drawn from the publicly available dataset of pedestrian trajectories acquired in a railway station.
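A minimal sketch of that deviation criterion, assuming trajectories have already been encoded to latent codes by the recurrent auto-encoder (the centroid prototype and L2 distance are illustrative choices, not necessarily the paper's exact measure):

```python
import numpy as np

def saliency_scores(latents: np.ndarray, normal_latents: np.ndarray) -> np.ndarray:
    """Deviation-from-normal in latent space: each trajectory's code is scored
    by its distance to the centroid of codes from the normal, context-specific
    motion pattern; larger scores mean more salient trajectories."""
    prototype = normal_latents.mean(axis=0)  # consistency-oriented "normal" code
    return np.linalg.norm(latents - prototype, axis=1)

# Usage: flag trajectories whose score exceeds a threshold fitted on normal data.
```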
arXiv Detail & Related papers (2020-12-17T13:29:11Z)