Transformers for Multi-Object Tracking on Point Clouds
- URL: http://arxiv.org/abs/2205.15730v1
- Date: Tue, 31 May 2022 12:20:54 GMT
- Title: Transformers for Multi-Object Tracking on Point Clouds
- Authors: Felicia Ruppel, Florian Faion, Claudius Gläser and Klaus Dietmayer
- Abstract summary: We present TransMOT, a novel transformer-based end-to-end trainable online tracker and detector for point cloud data.
The model utilizes a cross- and a self-attention mechanism and is applicable to lidar data in an automotive context.
Results are presented on the challenging real-world dataset nuScenes, where the proposed model outperforms its Kalman filter-based tracking baseline.
- Score: 9.287964414592826
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: We present TransMOT, a novel transformer-based end-to-end trainable online tracker and detector for point cloud data. The model utilizes a cross- and a self-attention mechanism and is applicable to lidar data in an automotive context, as well as other data types, such as radar. Both track management and the detection of new tracks are performed by the same transformer decoder module, and the tracker state is encoded in feature space. With this approach, we make use of the rich latent space of the detector for tracking rather than relying on low-dimensional bounding boxes. Still, we are able to retain some of the desirable properties of traditional Kalman filter-based approaches, such as the ability to handle sensor input at arbitrary timesteps or to compensate for frame skips. This is possible due to a novel module that transforms the track information from one frame to the next at feature level and thereby fulfills a task similar to the prediction step of a Kalman filter. Results are presented on the challenging real-world dataset nuScenes, where the proposed model outperforms its Kalman filter-based tracking baseline.
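The abstract describes the architecture only at a high level; the following is a minimal PyTorch sketch of that design, not the authors' implementation. The module names, layer sizes, the seven-value box parametrization, and the MLP used for the time-conditioned prediction step are all assumptions. Only the overall structure follows the text: a single decoder serving both track continuation and new-track detection, a tracker state kept in feature space, and a learned frame-to-frame transform playing the role of a Kalman prediction step.

```python
import torch
import torch.nn as nn

class FeatureSpacePredictor(nn.Module):
    """Hypothetical analogue of the Kalman prediction step: advances track
    embeddings to the next frame, conditioned on the elapsed time, which is
    what allows arbitrary timesteps and frame skips to be handled."""
    def __init__(self, dim: int):
        super().__init__()
        self.mlp = nn.Sequential(
            nn.Linear(dim + 1, dim), nn.ReLU(), nn.Linear(dim, dim))

    def forward(self, tracks: torch.Tensor, dt: float) -> torch.Tensor:
        # tracks: (num_tracks, batch, dim); dt: seconds since the last frame
        dt_feat = torch.full_like(tracks[..., :1], dt)
        return tracks + self.mlp(torch.cat([tracks, dt_feat], dim=-1))

class TransMOTSketch(nn.Module):
    """One transformer decoder consumes both learned detection queries (to
    spawn new tracks) and propagated track embeddings (to continue existing
    ones), cross-attending to encoded point-cloud features."""
    def __init__(self, dim: int = 256, num_det_queries: int = 100):
        super().__init__()
        layer = nn.TransformerDecoderLayer(d_model=dim, nhead=8)
        self.decoder = nn.TransformerDecoder(layer, num_layers=6)
        self.det_queries = nn.Parameter(torch.randn(num_det_queries, 1, dim))
        self.predictor = FeatureSpacePredictor(dim)
        self.box_head = nn.Linear(dim, 7)    # x, y, z, l, w, h, yaw (assumed)
        self.score_head = nn.Linear(dim, 1)  # existence / track-alive score

    def forward(self, memory: torch.Tensor, track_state: torch.Tensor, dt: float):
        # memory: (num_features, batch, dim) encoded lidar features
        # track_state: (num_tracks, batch, dim) embeddings from the last frame
        tracks = self.predictor(track_state, dt)  # feature-level "prediction step"
        det = self.det_queries.expand(-1, memory.size(1), -1)
        out = self.decoder(torch.cat([tracks, det], dim=0), memory)
        # outputs above a score threshold would become next frame's track_state
        return self.box_head(out), self.score_head(out), out
```

Keeping the recurrent state as the decoder's output embeddings, rather than decoded boxes, is what the abstract means by using the detector's rich latent space for tracking.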
Related papers
- Detection Is Tracking: Point Cloud Multi-Sweep Deep Learning Models Revisited [0.0]
In autonomous driving, lidar measurements are usually passed through a "virtual sensor" realized by a deep learning model.
We argue in this paper that the multi-sweep input already contains temporal information, and therefore the virtual sensor output should also contain temporal information.
We present a deep learning model called MULti-Sweep PAired Detector (MULSPAD) that produces, for each detected object, a pair of bounding boxes at both the end time and the beginning time of the input buffer.
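Since the summary gives only the output format, here is a small illustration of why such pairs are useful: the two boxes span the input buffer in time, so a per-object motion estimate falls out of a finite difference. The PairedDetection container and its field names are hypothetical, not the paper's API.

```python
from dataclasses import dataclass

@dataclass
class PairedDetection:
    """Hypothetical container mirroring MULSPAD's output: one box at the
    start and one at the end of the multi-sweep input buffer."""
    box_t0: tuple[float, float, float]  # (x, y, yaw) at buffer start
    box_t1: tuple[float, float, float]  # (x, y, yaw) at buffer end
    dt: float                           # buffer duration in seconds

def velocity_from_pair(det: PairedDetection) -> tuple[float, float]:
    """The box pair directly encodes motion over the buffer window,
    so a finite difference yields a per-object velocity estimate."""
    vx = (det.box_t1[0] - det.box_t0[0]) / det.dt
    vy = (det.box_t1[1] - det.box_t0[1]) / det.dt
    return vx, vy
```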
arXiv Detail & Related papers (2024-02-24T08:07:48Z)
- UnLoc: A Universal Localization Method for Autonomous Vehicles using LiDAR, Radar and/or Camera Input [51.150605800173366]
UnLoc is a novel unified neural modeling approach for localization with multi-sensor input in all weather conditions.
Our method is extensively evaluated on Oxford Radar RobotCar, ApolloSouthBay and Perth-WA datasets.
arXiv Detail & Related papers (2023-07-03T04:10:55Z)
- TrajectoryFormer: 3D Object Tracking Transformer with Predictive Trajectory Hypotheses [51.60422927416087]
3D multi-object tracking (MOT) is vital for many applications including autonomous driving vehicles and service robots.
We present TrajectoryFormer, a novel point-cloud-based 3D MOT framework.
arXiv Detail & Related papers (2023-06-09T13:31:50Z)
- Transformers for Object Detection in Large Point Clouds [9.287964414592826]
We present TransLPC, a novel detection model for large point clouds based on a transformer architecture.
We propose a novel query refinement technique to improve detection accuracy, while retaining a memory-friendly number of transformer decoder queries.
This simple technique has a significant effect on detection accuracy; it is evaluated on the challenging nuScenes dataset with real-world lidar data.
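The summary does not say how the refinement works. As a generic stand-in (common in DETR-style detectors, and not necessarily TransLPC's actual technique), queries can be re-anchored after every decoder layer so that a small, fixed query budget still covers a large scene:

```python
import torch
import torch.nn as nn

class QueryRefinementSketch(nn.Module):
    """Generic stand-in (not the paper's method): after each decoder layer,
    shift each query's reference point by the offset it predicts, so a
    memory-friendly number of queries can still localize objects precisely."""
    def __init__(self, dim: int = 256, num_layers: int = 3):
        super().__init__()
        self.layers = nn.ModuleList(
            [nn.TransformerDecoderLayer(d_model=dim, nhead=8) for _ in range(num_layers)])
        self.offset_head = nn.Linear(dim, 2)  # predicted (dx, dy) per query

    def forward(self, queries, refs, memory):
        # queries: (Q, B, dim); refs: (Q, B, 2) BEV reference points; memory: (N, B, dim)
        for layer in self.layers:
            queries = layer(queries, memory)
            refs = refs + self.offset_head(queries)  # move queries toward objects
        return queries, refs
```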
arXiv Detail & Related papers (2022-09-30T06:35:43Z)
- Bag of Tricks for Domain Adaptive Multi-Object Tracking [4.084199842578325]
The proposed method was built from a pre-existing detector and tracker under the tracking-by-detection paradigm.
The tracker we used is an online tracker that merely links newly received detections with existing tracks.
Our method, SIA_Track, took first place on the MOTSynth2MOT17 track of the BMTT 2022 challenge.
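The linking step itself is not specified in the summary; a minimal, generic version of tracking-by-detection association (greedy IoU matching; the threshold and box format are assumptions) might look like this:

```python
import numpy as np

def iou_xyxy(a: np.ndarray, b: np.ndarray) -> float:
    """IoU of two axis-aligned boxes given as (x1, y1, x2, y2)."""
    x1, y1 = max(a[0], b[0]), max(a[1], b[1])
    x2, y2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, x2 - x1) * max(0.0, y2 - y1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    return inter / (area_a + area_b - inter + 1e-9)

def link_detections(tracks: dict[int, np.ndarray],
                    detections: list[np.ndarray],
                    iou_thresh: float = 0.3) -> dict[int, np.ndarray]:
    """Greedy online association: each new detection either extends the best
    overlapping track or starts a new one. Track aging/deletion is handled
    elsewhere; this sketch only shows the linking step itself."""
    next_id = max(tracks, default=-1) + 1
    used = set()
    for det in detections:
        best_id, best_iou = None, iou_thresh
        for tid, box in tracks.items():
            if tid in used:
                continue
            iou = iou_xyxy(box, det)
            if iou > best_iou:
                best_id, best_iou = tid, iou
        if best_id is None:  # no track overlaps enough: spawn a new one
            best_id, next_id = next_id, next_id + 1
        tracks[best_id] = det
        used.add(best_id)
    return tracks
```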
arXiv Detail & Related papers (2022-05-31T08:49:20Z)
- Joint Spatial-Temporal and Appearance Modeling with Transformer for Multiple Object Tracking [59.79252390626194]
We propose a novel solution named TransSTAM, which leverages Transformer to model both the appearance features of each object and the spatial-temporal relationships among objects.
The proposed method is evaluated on multiple public benchmarks including MOT16, MOT17, and MOT20, and it achieves a clear performance improvement in both IDF1 and HOTA.
arXiv Detail & Related papers (2022-05-31T01:19:18Z)
- MFGNet: Dynamic Modality-Aware Filter Generation for RGB-T Tracking [72.65494220685525]
We propose a new dynamic modality-aware filter generation module (named MFGNet) to boost message communication between visible and thermal data.
Dynamic modality-aware filters are generated by two independent networks; the visible and thermal filters are then used to conduct a dynamic convolutional operation on their respective input feature maps.
To address issues caused by heavy occlusion, fast motion, and out-of-view targets, we propose a joint local and global search that exploits a new direction-aware target-driven attention mechanism.
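As a rough sketch of the filter-generation idea for one modality (per the summary, the paper uses two such independent branches, visible and thermal; the kernel size, pooling, and generator architecture here are assumptions):

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class DynamicFilterBranch(nn.Module):
    """One modality branch: a small network predicts a per-sample convolution
    kernel from the feature map, which is then applied back to that same map
    (dynamic convolution). MFGNet would use one branch per modality."""
    def __init__(self, channels: int, k: int = 3):
        super().__init__()
        self.k = k
        self.filter_gen = nn.Sequential(
            nn.AdaptiveAvgPool2d(1),
            nn.Flatten(),
            nn.Linear(channels, channels * k * k))

    def forward(self, feat: torch.Tensor) -> torch.Tensor:
        b, c, h, w = feat.shape
        # one depthwise k x k kernel per channel, predicted per sample
        kernels = self.filter_gen(feat).view(b * c, 1, self.k, self.k)
        out = F.conv2d(feat.view(1, b * c, h, w), kernels,
                       padding=self.k // 2, groups=b * c)
        return out.view(b, c, h, w)
```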
arXiv Detail & Related papers (2021-07-22T03:10:51Z)
- TrTr: Visual Tracking with Transformer [29.415900191169587]
We propose a novel tracker network based on a powerful Transformer encoder-decoder architecture.
We design the classification and regression heads using the Transformer output to localize the target based on shape-agnostic anchors.
Our method performs favorably against state-of-the-art algorithms.
arXiv Detail & Related papers (2021-05-09T02:32:28Z)
- Learning Spatio-Temporal Transformer for Visual Tracking [108.11680070733598]
We present a new tracking architecture with an encoder-decoder transformer as the key component.
The whole method is end-to-end and does not need any postprocessing steps such as cosine windowing or bounding box smoothing.
The proposed tracker achieves state-of-the-art performance on five challenging short-term and long-term benchmarks while running at real-time speed, 6x faster than Siam R-CNN.
arXiv Detail & Related papers (2021-03-31T15:19:19Z)
- Transformer Tracking [76.96796612225295]
Correlation plays a critical role in the tracking field, especially in popular Siamese-based trackers.
This work presents a novel attention-based feature fusion network, which effectively combines the template and search region features solely using attention.
Experiments show that our TransT achieves very promising results on six challenging datasets.
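The core idea, replacing cross-correlation with attention over template and search-region features, can be sketched in a few lines; the layer counts, normalization placement, and dimensions here are illustrative, not TransT's exact design:

```python
import torch
import torch.nn as nn

class AttentionFusionSketch(nn.Module):
    """Correlation replaced by attention: search-region features attend to
    template features (cross-attention) instead of being cross-correlated."""
    def __init__(self, dim: int = 256, heads: int = 8):
        super().__init__()
        self.self_attn = nn.MultiheadAttention(dim, heads)
        self.cross_attn = nn.MultiheadAttention(dim, heads)
        self.norm1 = nn.LayerNorm(dim)
        self.norm2 = nn.LayerNorm(dim)

    def forward(self, search: torch.Tensor, template: torch.Tensor) -> torch.Tensor:
        # search: (S, B, dim) flattened search-region features
        # template: (T, B, dim) flattened template features
        s, _ = self.self_attn(search, search, search)
        search = self.norm1(search + s)
        f, _ = self.cross_attn(search, template, template)  # query=search, key/value=template
        return self.norm2(search + f)
```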
arXiv Detail & Related papers (2021-03-29T09:06:55Z)
- Trajectory saliency detection using consistency-oriented latent codes from a recurrent auto-encoder [0.0]
Trajectories represent the best way to support progressive dynamic saliency detection.
A trajectory is qualified as salient if it deviates from normal trajectories that share a common motion pattern related to a given context.
We show that our method outperforms existing methods on several scenarios drawn from the publicly available dataset of pedestrian trajectories acquired in a railway station.
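A minimal sketch of that deviation criterion, assuming trajectories have already been encoded to latent codes by the recurrent auto-encoder (the centroid prototype and L2 distance are illustrative choices, not necessarily the paper's exact measure):

```python
import numpy as np

def saliency_scores(latents: np.ndarray, normal_latents: np.ndarray) -> np.ndarray:
    """Deviation-from-normal in latent space: each trajectory's code is scored
    by its distance to the centroid of codes from the normal, context-specific
    motion pattern; larger scores mean more salient trajectories."""
    prototype = normal_latents.mean(axis=0)  # consistency-oriented "normal" code
    return np.linalg.norm(latents - prototype, axis=1)

# Usage: flag trajectories whose score exceeds a threshold fitted on normal data.
```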
arXiv Detail & Related papers (2020-12-17T13:29:11Z)