Depth-Aware Scoring and Hierarchical Alignment for Multiple Object Tracking
- URL: http://arxiv.org/abs/2506.00774v1
- Date: Sun, 01 Jun 2025 01:44:56 GMT
- Title: Depth-Aware Scoring and Hierarchical Alignment for Multiple Object Tracking
- Authors: Milad Khanchi, Maria Amer, Charalambos Poullis
- Abstract summary: Current motion-based multiple object tracking approaches rely heavily on Intersection-over-Union (IoU) for object association. We estimate depth using a zero-shot approach and incorporate it as an independent feature in the association process. We introduce a Hierarchical Alignment Score that refines IoU by integrating both coarse bounding box overlap and fine-grained (pixel-level) alignment.
- Score: 2.4578723416255754
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Current motion-based multiple object tracking (MOT) approaches rely heavily on Intersection-over-Union (IoU) for object association. Without using 3D features, they are ineffective in scenarios with occlusions or visually similar objects. To address this, our paper presents a novel depth-aware framework for MOT. We estimate depth using a zero-shot approach and incorporate it as an independent feature in the association process. Additionally, we introduce a Hierarchical Alignment Score that refines IoU by integrating both coarse bounding box overlap and fine-grained (pixel-level) alignment to improve association accuracy without requiring additional learnable parameters. To our knowledge, this is the first MOT framework to incorporate 3D features (monocular depth) as an independent decision matrix in the association step. Our framework achieves state-of-the-art results on challenging benchmarks without any training or fine-tuning. The code is available at https://github.com/Milad-Khanchi/DepthMOT
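To illustrate why a depth cue can help when IoU alone is ambiguous, here is a minimal sketch in Python. It is not the authors' implementation: the weighted sum of IoU and depth similarity, the `depth_similarity` function, the `depth_weight` parameter, and the brute-force matcher are all assumptions made for illustration, and the Hierarchical Alignment Score is not modeled. Depths are assumed to be scalar relative values in [0, 1], one per box.

```python
from itertools import permutations


def iou(a, b):
    """Intersection-over-Union of two boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def depth_similarity(d_a, d_b):
    """Hypothetical similarity in [0, 1] for relative depths in [0, 1]."""
    return 1.0 - abs(d_a - d_b)


def associate(tracks, detections, depth_weight=0.5):
    """Match tracks to detections by maximizing the total combined score.

    tracks, detections: lists of (box, depth) pairs.
    Brute force over all assignments, so only suitable for tiny examples;
    assumes len(tracks) <= len(detections).
    Returns a list of (track_index, detection_index) pairs.
    """
    def score(t, d):
        (t_box, t_depth), (d_box, d_depth) = t, d
        return ((1.0 - depth_weight) * iou(t_box, d_box)
                + depth_weight * depth_similarity(t_depth, d_depth))

    best_total, best_perm = float("-inf"), ()
    for perm in permutations(range(len(detections)), len(tracks)):
        total = sum(score(tracks[i], detections[j]) for i, j in enumerate(perm))
        if total > best_total:
            best_total, best_perm = total, perm
    return list(enumerate(best_perm))


# Two overlapping targets: every track/detection pair has the same IoU,
# so overlap alone cannot decide; the depth values break the tie.
tracks = [((0, 0, 10, 10), 0.2), ((2, 0, 12, 10), 0.8)]
detections = [((1, 0, 11, 10), 0.8), ((1, 0, 11, 10), 0.2)]
matches = associate(tracks, detections)
print(matches)
```

In this toy case the near track is paired with the near detection and the far track with the far one. A practical tracker would replace the brute-force search with an assignment solver such as the Hungarian algorithm.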
Related papers
- PD-SORT: Occlusion-Robust Multi-Object Tracking Using Pseudo-Depth Cues [8.642829333393442]
Multi-object tracking (MOT) is a rising topic in video processing technologies and has important application value in consumer electronics. Currently, tracking-by-detection (TBD) is the dominant paradigm for MOT, which performs target detection and association frame by frame. We propose Pseudo-Depth SORT (PD-SORT) to enhance association performance; it achieves leading performance on DanceTrack, MOT17, and MOT20.
arXiv Detail & Related papers (2025-01-20T05:50:39Z) - Hierarchical Temporal Context Learning for Camera-based Semantic Scene Completion [57.232688209606515]
We present HTCL, a novel Hierarchical Temporal Context Learning paradigm for improving camera-based semantic scene completion.
Our method ranks 1st on the SemanticKITTI benchmark and even surpasses LiDAR-based methods in terms of mIoU.
arXiv Detail & Related papers (2024-07-02T09:11:17Z) - Object-Centric Multiple Object Tracking [124.30650395969126]
This paper proposes a video object-centric model for multiple-object tracking pipelines.
It consists of an index-merge module that adapts the object-centric slots into detection outputs and an object memory module.
Benefiting from object-centric learning, we only require sparse detection labels for object localization and feature binding.
arXiv Detail & Related papers (2023-09-01T03:34:12Z) - 3DRP-Net: 3D Relative Position-aware Network for 3D Visual Grounding [58.924180772480504]
3D visual grounding aims to localize the target object in a 3D point cloud by a free-form language description.
We propose a relation-aware one-stage framework, named 3D Relative Position-aware Network (3DRP-Net).
arXiv Detail & Related papers (2023-07-25T09:33:25Z) - SparseTrack: Multi-Object Tracking by Performing Scene Decomposition based on Pseudo-Depth [84.64121608109087]
We propose a pseudo-depth estimation method for obtaining the relative depth of targets from 2D images.
Secondly, we design a depth cascading matching (DCM) algorithm, which can use the obtained depth information to convert a dense target set into multiple sparse target subsets.
By integrating the pseudo-depth method and the DCM strategy into the data association process, we propose a new tracker, called SparseTrack.
arXiv Detail & Related papers (2023-06-08T14:36:10Z) - OBMO: One Bounding Box Multiple Objects for Monocular 3D Object Detection [24.9579490539696]
Monocular 3D object detection has attracted much attention due to its simple configuration.
In this paper, we find that the ill-posed nature of monocular imagery can lead to depth ambiguity.
We propose a plug-and-play module, One Bounding Box Multiple Objects (OBMO).
arXiv Detail & Related papers (2022-12-20T07:46:49Z) - Monocular 3D Object Detection with Depth from Motion [74.29588921594853]
We take advantage of camera ego-motion for accurate object depth estimation and detection.
Our framework, named Depth from Motion (DfM), then uses the established geometry to lift 2D image features to the 3D space and detects 3D objects thereon.
Our framework outperforms state-of-the-art methods by a large margin on the KITTI benchmark.
arXiv Detail & Related papers (2022-07-26T15:48:46Z) - Depth Perspective-aware Multiple Object Tracking [24.06104433665443]
DP-MOT is a real-time Depth Perspective-aware Multiple Object Tracking approach.
The proposed approach consistently achieves state-of-the-art performance compared to recent MOT methods.
arXiv Detail & Related papers (2022-07-10T22:12:00Z) - M3DSSD: Monocular 3D Single Stage Object Detector [82.25793227026443]
We propose a Monocular 3D Single Stage object Detector (M3DSSD) with feature alignment and asymmetric non-local attention.
The proposed M3DSSD achieves significantly better performance than the monocular 3D object detection methods on the KITTI dataset.
arXiv Detail & Related papers (2021-03-24T13:09:11Z) - Relation3DMOT: Exploiting Deep Affinity for 3D Multi-Object Tracking from View Aggregation [8.854112907350624]
3D multi-object tracking plays a vital role in autonomous navigation.
Many approaches detect objects in 2D RGB sequences for tracking, which lacks reliability when localizing objects in 3D space.
We propose a novel convolutional operation, named RelationConv, to better exploit the correlation between each pair of objects in the adjacent frames.
arXiv Detail & Related papers (2020-11-25T16:14:40Z) - GNN3DMOT: Graph Neural Network for 3D Multi-Object Tracking with Multi-Feature Learning [30.72094639797806]
3D Multi-object tracking (MOT) is crucial to autonomous systems.
We propose two techniques to improve the discriminative feature learning for MOT.
Our proposed method achieves state-of-the-art performance on KITTI and nuScenes 3D MOT benchmarks.
arXiv Detail & Related papers (2020-06-12T17:08:14Z)