Know Your Surroundings: Panoramic Multi-Object Tracking by Multimodality
Collaboration
- URL: http://arxiv.org/abs/2105.14683v1
- Date: Mon, 31 May 2021 03:16:38 GMT
- Title: Know Your Surroundings: Panoramic Multi-Object Tracking by Multimodality
Collaboration
- Authors: Yuhang He, Wentao Yu, Jie Han, Xing Wei, Xiaopeng Hong, Yihong Gong
- Abstract summary: We propose a MultiModality PAnoramic multi-object Tracking framework (MMPAT).
It takes both 2D panorama images and 3D point clouds as input and then infers target trajectories using the multimodality data.
We evaluate the proposed method on the JRDB dataset, where the MMPAT achieves the top performance in both the detection and tracking tasks.
- Score: 56.01625477187448
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: In this paper, we focus on the multi-object tracking (MOT) problem of
autonomous driving and robot navigation. Most existing MOT methods track
multiple objects using a single RGB camera, which is limited by the camera's
field of view and is prone to tracking failures in complex scenarios due to
background clutter and poor lighting conditions. To meet these challenges, we
propose a MultiModality PAnoramic multi-object Tracking framework (MMPAT),
which takes both 2D panorama images and 3D point clouds as input and then
infers target trajectories using the multimodality data. The proposed method
contains four major modules: a panorama image detection module, a multimodality
data fusion module, a data association module, and a trajectory inference module.
We evaluate the proposed method on the JRDB dataset, where MMPAT achieves
the top performance in both the detection and tracking tasks, outperforming
state-of-the-art methods by a large margin (improvements of 15.7 AP and
8.5 MOTA, respectively).
Related papers
- 3D Multi-Object Tracking Employing MS-GLMB Filter for Autonomous Driving [9.145911310294426]
We introduce an improved approach that integrates an additional sensor, such as LiDAR, into the MS-GLMB framework for 3D multi-object tracking.
Our experimental results demonstrate a significant improvement in tracking performance compared to existing MS-GLMB-based methods.
arXiv Detail & Related papers (2024-10-19T04:59:47Z)
- MCTR: Multi Camera Tracking Transformer [45.66952089591361]
Multi-Camera Tracking tRansformer (MCTR) is a novel end-to-end approach tailored for multi-object detection and tracking across multiple cameras.
MCTR leverages end-to-end detectors like the DEtection TRansformer (DETR) to produce detections and detection embeddings independently for each camera view.
The framework maintains a set of track embeddings that encapsulate global information about the tracked objects, and updates them at every frame by integrating local information from the view-specific detection embeddings.
arXiv Detail & Related papers (2024-08-23T17:37:03Z)
- GMT: A Robust Global Association Model for Multi-Target Multi-Camera Tracking [13.305411087116635]
We propose a global online MTMC tracking model that addresses the dependency on the first tracking stage in two-step methods and enhances cross-camera matching.
Specifically, we propose a transformer-based global MTMC association module to explore target associations across different cameras and frames.
To accommodate high scene diversity and complex lighting condition variations, we have established the VisionTrack dataset.
arXiv Detail & Related papers (2024-07-01T06:39:14Z)
- MTMMC: A Large-Scale Real-World Multi-Modal Camera Tracking Benchmark [63.878793340338035]
Multi-target multi-camera tracking is a crucial task that involves identifying and tracking individuals over time using video streams from multiple cameras.
Existing datasets for this task are either synthetically generated or artificially constructed within a controlled camera network setting.
We present MTMMC, a real-world, large-scale dataset that includes long video sequences captured by 16 multi-modal cameras in two different environments.
arXiv Detail & Related papers (2024-03-29T15:08:37Z)
- PoIFusion: Multi-Modal 3D Object Detection via Fusion at Points of Interest [65.48057241587398]
PoIFusion is a framework that fuses information from RGB images and LiDAR point clouds at points of interest (PoIs).
Our approach maintains the view of each modality and obtains multi-modal features through computation-friendly projection operations.
We conducted extensive experiments on nuScenes and Argoverse2 datasets to evaluate our approach.
arXiv Detail & Related papers (2024-03-14T09:28:12Z)
- MMRDN: Consistent Representation for Multi-View Manipulation Relationship Detection in Object-Stacked Scenes [62.20046129613934]
We propose a novel multi-view fusion framework, namely the multi-view MRD network (MMRDN).
We project the 2D data from different views into a common hidden space and fit the embeddings with a set of von Mises-Fisher distributions.
We select a set of $K$ Maximum Vertical Neighbors (KMVN) points from the point cloud of each object pair, which encodes the relative position of these two objects.
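The von Mises-Fisher fitting step above can be sketched with the standard maximum-likelihood estimate of the mean direction and the well-known Banerjee et al. approximation for the concentration. This is an illustrative sketch, not MMRDN's actual code; the function name is hypothetical.

```python
# Illustrative von Mises-Fisher fit for unit-norm embeddings (not
# MMRDN's code): MLE mean direction plus the standard approximation
# kappa ~= r_bar * (d - r_bar^2) / (1 - r_bar^2).
import math

def fit_vmf(embeddings):
    # Assumes embeddings are unit vectors whose sum is nonzero.
    d = len(embeddings[0])
    n = len(embeddings)
    s = [sum(e[j] for e in embeddings) for j in range(d)]  # resultant vector
    r_norm = math.sqrt(sum(c * c for c in s))
    mu = [c / r_norm for c in s]          # MLE mean direction (unit vector)
    r_bar = r_norm / n                    # mean resultant length in [0, 1)
    kappa = r_bar * (d - r_bar ** 2) / (1 - r_bar ** 2)  # concentration
    return mu, kappa

# Two nearly aligned unit vectors -> tightly concentrated distribution
mu, kappa = fit_vmf([[1.0, 0.0], [0.99, math.sqrt(1 - 0.99 ** 2)]])
```

A large kappa indicates embeddings tightly clustered around the mean direction mu, which is what a consistent multi-view representation should produce.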
arXiv Detail & Related papers (2023-04-25T05:55:29Z)
- PTA-Det: Point Transformer Associating Point cloud and Image for 3D Object Detection [3.691671505269693]
Most multi-modal detection methods perform even worse than LiDAR-only methods.
A Pseudo Point Cloud Generation Network is proposed to convert image information by pseudo points.
The features of LiDAR points and pseudo points from image can be deeply fused under a unified point-based representation.
arXiv Detail & Related papers (2023-01-18T04:35:49Z)
- CAMO-MOT: Combined Appearance-Motion Optimization for 3D Multi-Object Tracking with Camera-LiDAR Fusion [34.42289908350286]
3D Multi-object tracking (MOT) ensures consistency during continuous dynamic detection.
It can be challenging for LiDAR-based methods to accurately track the irregular motion of objects.
We propose a novel camera-LiDAR fusion 3D MOT framework based on Combined Appearance-Motion Optimization (CAMO-MOT).
arXiv Detail & Related papers (2022-09-06T14:41:38Z)
- A Simple Baseline for Multi-Camera 3D Object Detection [94.63944826540491]
3D object detection with surrounding cameras has been a promising direction for autonomous driving.
We present SimMOD, a Simple baseline for Multi-camera Object Detection.
We conduct extensive experiments on the 3D object detection benchmark of nuScenes to demonstrate the effectiveness of SimMOD.
arXiv Detail & Related papers (2022-08-22T03:38:01Z)
- Segment as Points for Efficient Online Multi-Object Tracking and Segmentation [66.03023110058464]
We propose a highly effective method for learning instance embeddings based on segments by converting the compact image representation to an unordered 2D point cloud representation.
Our method generates a new tracking-by-points paradigm where discriminative instance embeddings are learned from randomly selected points rather than images.
The resulting online MOTS framework, named PointTrack, surpasses all the state-of-the-art methods by large margins.
arXiv Detail & Related papers (2020-07-03T08:29:35Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
The site does not guarantee the quality of this information and is not responsible for any consequences arising from its use.