One Homography is All You Need: IMM-based Joint Homography and Multiple Object State Estimation
- URL: http://arxiv.org/abs/2409.02562v2
- Date: Thu, 14 Nov 2024 10:45:32 GMT
- Title: One Homography is All You Need: IMM-based Joint Homography and Multiple Object State Estimation
- Authors: Paul Johannes Claasen, Johan Pieter de Villiers
- Abstract summary: IMM Joint Homography State Estimation (IMM-JHSE) is proposed.
IMM-JHSE uses an initial homography estimate as the only additional 3D information.
IMM-JHSE offers competitive performance on the MOT17, MOT20 and KITTI-pedestrian datasets.
- License: http://creativecommons.org/licenses/by-nc-nd/4.0/
- Abstract: A novel online MOT algorithm, IMM Joint Homography State Estimation (IMM-JHSE), is proposed. IMM-JHSE uses an initial homography estimate as the only additional 3D information, whereas other 3D MOT methods use regular 3D measurements. By jointly modelling the homography matrix and its dynamics as part of track state vectors, IMM-JHSE removes the explicit influence of camera motion compensation techniques on predicted track position states, which was prevalent in previous approaches. Expanding upon this, static and dynamic camera motion models are combined using an IMM filter. A simple bounding box motion model is used to predict bounding box positions and incorporate image-plane information. In addition to applying an IMM to camera motion, a non-standard IMM approach is applied where bounding-box-based BIoU scores are mixed with ground-plane-based Mahalanobis distances in an IMM-like fashion to perform association only, making IMM-JHSE robust to motion away from the ground plane. Finally, IMM-JHSE makes use of dynamic process and measurement noise estimation techniques. IMM-JHSE improves upon related techniques, including UCMCTrack, OC-SORT, C-BIoU and ByteTrack, on the DanceTrack and KITTI-car datasets, increasing HOTA by 2.64 and 2.11, respectively, while offering competitive performance on the MOT17, MOT20 and KITTI-pedestrian datasets. Using publicly available detections, IMM-JHSE outperforms almost all other 2D MOT methods and is outperformed only by 3D MOT methods -- some of which are offline -- on the KITTI-car dataset. Compared to tracking-by-attention methods, IMM-JHSE shows remarkably similar performance on the DanceTrack dataset and outperforms them on the MOT17 dataset. The code is publicly available: https://github.com/Paulkie99/imm-jhse.
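To make the association step concrete, below is a minimal Python/NumPy sketch of mixing an image-plane BIoU cost with a ground-plane Mahalanobis distance in an IMM-like fashion, as the abstract describes. The buffer scale, the dictionary keys, and the simple convex combination are illustrative assumptions, not the paper's exact formulation; the actual method derives its mixing weights from IMM model probabilities and uses dynamic noise estimates (see the linked repository for the authoritative implementation).

```python
import numpy as np

def buffered_iou(box_a, box_b, buffer_scale=0.3):
    # C-BIoU-style IoU: expand both (x1, y1, x2, y2) boxes by a relative
    # margin before intersecting. buffer_scale is a hypothetical default.
    def expand(b):
        w, h = b[2] - b[0], b[3] - b[1]
        return (b[0] - buffer_scale * w, b[1] - buffer_scale * h,
                b[2] + buffer_scale * w, b[3] + buffer_scale * h)
    a, b = expand(box_a), expand(box_b)
    iw = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    ih = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = iw * ih
    area = lambda r: (r[2] - r[0]) * (r[3] - r[1])
    return inter / (area(a) + area(b) - inter + 1e-9)

def mahalanobis_sq(z, z_pred, S):
    # Squared Mahalanobis distance of ground-plane measurement z from the
    # predicted position z_pred, with innovation covariance S.
    d = np.asarray(z, float) - np.asarray(z_pred, float)
    return float(d @ np.linalg.solve(S, d))

def mixed_association_cost(track, det, mu_img, mu_plane):
    # IMM-like mixture of the two association costs. In practice the two
    # scales differ (1 - BIoU is bounded, the Mahalanobis distance is not),
    # so the real method requires the normalisation the paper describes.
    c_img = 1.0 - buffered_iou(track["bbox_pred"], det["bbox"])
    c_plane = mahalanobis_sq(det["xy_plane"], track["xy_pred"], track["S"])
    return mu_img * c_img + mu_plane * c_plane
```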
Related papers
- Tracking Meets Large Multimodal Models for Driving Scenario Understanding [76.71815464110153]
Large Multimodal Models (LMMs) have recently gained prominence in autonomous driving research.
We propose to integrate tracking information as an additional input to recover 3D spatial and temporal details.
We introduce a novel approach for embedding this tracking information into LMMs to enhance their understanding of driving scenarios.
arXiv Detail & Related papers (2025-03-18T17:59:12Z)
- ET-Former: Efficient Triplane Deformable Attention for 3D Semantic Scene Completion From Monocular Camera [53.20087549782785]
We introduce ET-Former, a novel end-to-end algorithm for semantic scene completion using a single monocular camera.
Our approach generates a semantic occupancy map from a single RGB observation while simultaneously providing uncertainty estimates for semantic predictions.
arXiv Detail & Related papers (2024-10-14T19:14:49Z)
- Data-Driven Approaches for Modelling Target Behaviour [1.5495593104596401]
The performance of tracking algorithms depends on the chosen model assumptions regarding the target dynamics.
This paper provides a comparative study of three different methods that use machine learning to describe the underlying object motion.
arXiv Detail & Related papers (2024-10-14T14:18:27Z)
- MV-JAR: Masked Voxel Jigsaw and Reconstruction for LiDAR-Based Self-Supervised Pre-Training [58.07391711548269]
The Masked Voxel Jigsaw and Reconstruction (MV-JAR) method is proposed for LiDAR-based self-supervised pre-training.
arXiv Detail & Related papers (2023-03-23T17:59:02Z)
- 3DMODT: Attention-Guided Affinities for Joint Detection & Tracking in 3D Point Clouds [95.54285993019843]
We propose a method for joint detection and tracking of multiple objects in 3D point clouds.
Our model exploits temporal information across multiple frames to detect and track objects in a single network.
arXiv Detail & Related papers (2022-11-01T20:59:38Z)
- CAMO-MOT: Combined Appearance-Motion Optimization for 3D Multi-Object Tracking with Camera-LiDAR Fusion [34.42289908350286]
3D multi-object tracking (MOT) ensures identity consistency across continuous, dynamic detections.
It can be challenging for LiDAR-based methods to accurately track the irregular motion of objects.
We propose CAMO-MOT, a novel camera-LiDAR fusion 3D MOT framework based on Combined Appearance-Motion Optimization.
arXiv Detail & Related papers (2022-09-06T14:41:38Z)
- Joint Spatial-Temporal and Appearance Modeling with Transformer for Multiple Object Tracking [59.79252390626194]
We propose a novel solution named TransSTAM, which leverages a Transformer to model both the appearance features of each object and the spatial-temporal relationships among objects.
The proposed method is evaluated on multiple public benchmarks including MOT16, MOT17, and MOT20, and it achieves a clear performance improvement in both IDF1 and HOTA.
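As a rough, generic illustration of attention-based affinity modelling (not TransSTAM's actual architecture, which this summary only names), a scaled dot-product affinity between track and detection embeddings could look like:

```python
import numpy as np

def attention_affinity(track_emb, det_emb):
    # track_emb: (num_tracks, d); det_emb: (num_dets, d) appearance embeddings.
    # Returns row-normalised association scores: softmax over detections.
    scale = 1.0 / np.sqrt(track_emb.shape[-1])
    logits = (track_emb @ det_emb.T) * scale
    logits -= logits.max(axis=1, keepdims=True)  # numerical stability
    probs = np.exp(logits)
    return probs / probs.sum(axis=1, keepdims=True)
```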
arXiv Detail & Related papers (2022-05-31T01:19:18Z)
- A two-stage data association approach for 3D Multi-object Tracking [0.0]
We adapt a two-stage data association method, which was successful in image-based tracking, to the 3D setting.
Our method outperforms the baseline, which uses one-stage bipartite matching for data association, achieving 0.587 AMOTA on the NuScenes validation set.
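For illustration only, here is a minimal sketch of two-stage bipartite matching of the kind this entry adapts, assuming the stages are split by detection confidence as in image-based trackers such as ByteTrack; the gate threshold and masks are hypothetical:

```python
import numpy as np
from scipy.optimize import linear_sum_assignment

def two_stage_association(cost, high_conf_mask, gate=10.0):
    # cost: (num_tracks, num_dets) association costs; high_conf_mask marks
    # high-confidence detections. Stage 1 matches those first; stage 2 matches
    # leftover tracks to the remaining detections. Pairs above `gate` are rejected.
    matches, used_t, used_d = [], set(), set()
    for mask in (high_conf_mask, ~high_conf_mask):
        t_idx = [t for t in range(cost.shape[0]) if t not in used_t]
        d_idx = [d for d in np.flatnonzero(mask) if d not in used_d]
        if not t_idx or not d_idx:
            continue
        sub = cost[np.ix_(t_idx, d_idx)]
        rows, cols = linear_sum_assignment(sub)  # optimal bipartite matching
        for r, c in zip(rows, cols):
            if sub[r, c] <= gate:
                matches.append((t_idx[r], d_idx[c]))
                used_t.add(t_idx[r]); used_d.add(d_idx[c])
    return matches
```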
arXiv Detail & Related papers (2021-01-21T15:50:17Z)
- FlowMOT: 3D Multi-Object Tracking by Scene Flow Association [9.480272707157747]
We propose a LiDAR-based 3D MOT framework named FlowMOT, which integrates point-wise motion information with the traditional matching algorithm.
Our approach outperforms recent end-to-end methods and achieves performance competitive with the state-of-the-art filter-based method.
arXiv Detail & Related papers (2020-12-14T14:03:48Z)
- Online Multi-Object Tracking and Segmentation with GMPHD Filter and Mask-based Affinity Fusion [79.87371506464454]
We propose a fully online multi-object tracking and segmentation (MOTS) method that uses instance segmentation results as an input.
The proposed method is based on the Gaussian mixture probability hypothesis density (GMPHD) filter, a hierarchical data association (HDA), and a mask-based affinity fusion (MAF) model.
In experiments on the two popular MOTS datasets, the key modules each show improvements.
arXiv Detail & Related papers (2020-08-31T21:06:22Z)
- Simultaneous Detection and Tracking with Motion Modelling for Multiple Object Tracking [94.24393546459424]
We introduce Deep Motion Modeling Network (DMM-Net) that can estimate multiple objects' motion parameters to perform joint detection and association.
DMM-Net achieves a PR-MOTA score of 12.80 at 120+ fps on the popular UA-DETRAC challenge, delivering better performance at orders of magnitude higher speed.
We also contribute Omni-MOT, a synthetic large-scale public dataset for vehicle tracking that provides precise ground-truth annotations.
arXiv Detail & Related papers (2020-08-20T08:05:33Z)
- Dense Scene Multiple Object Tracking with Box-Plane Matching [73.54369833671772]
Multiple Object Tracking (MOT) is an important task in computer vision.
We propose the Box-Plane Matching (BPM) method to improve MOT performance in dense scenes.
With the effectiveness of its three modules, our team achieved 1st place on the Track-1 leaderboard of the ACM MM Grand Challenge HiEve 2020.
arXiv Detail & Related papers (2020-07-30T16:39:22Z)
- Probabilistic 3D Multi-Object Tracking for Autonomous Driving [23.036619327925088]
We present our online tracking method, which took first place in the NuScenes Tracking Challenge.
Our method estimates the object states using a Kalman filter.
Our experimental results on the NuScenes validation and test sets show that our method outperforms the AB3DMOT baseline.
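For reference, the textbook Kalman filter predict/update cycle on which such trackers are built; this is a generic constant-velocity sketch, not the challenge entry's exact state model:

```python
import numpy as np

def kf_predict(x, P, F, Q):
    # Propagate state mean x and covariance P through dynamics F with noise Q.
    return F @ x, F @ P @ F.T + Q

def kf_update(x, P, z, H, R):
    # Correct the prediction with measurement z (measurement model H, noise R).
    S = H @ P @ H.T + R                  # innovation covariance
    K = P @ H.T @ np.linalg.inv(S)       # Kalman gain
    x = x + K @ (z - H @ x)
    P = (np.eye(len(x)) - K @ H) @ P
    return x, P

# Example: 2D constant-velocity model with state [px, py, vx, vy].
dt = 0.1
F = np.array([[1, 0, dt, 0], [0, 1, 0, dt], [0, 0, 1, 0], [0, 0, 0, 1]], float)
H = np.array([[1, 0, 0, 0], [0, 1, 0, 0]], float)
x, P = np.zeros(4), np.eye(4)
x, P = kf_predict(x, P, F, Q=0.01 * np.eye(4))
x, P = kf_update(x, P, z=np.array([1.0, 2.0]), H=H, R=0.1 * np.eye(2))
```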
arXiv Detail & Related papers (2020-01-16T06:38:02Z)