Head Anchor Enhanced Detection and Association for Crowded Pedestrian Tracking
- URL: http://arxiv.org/abs/2508.05514v1
- Date: Thu, 07 Aug 2025 15:47:34 GMT
- Title: Head Anchor Enhanced Detection and Association for Crowded Pedestrian Tracking
- Authors: Zewei Wu, César Teixeira, Wei Ke, Zhang Xiong,
- Abstract summary: The proposed method incorporates detection features from both the regression and classification branches of an object detector.<n>In terms of motion modeling, we propose an iterative Kalman filtering approach designed to align with modern detector assumptions.
- Score: 8.653608112604472
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Visual pedestrian tracking represents a promising research field, with extensive applications in intelligent surveillance, behavior analysis, and human-computer interaction. However, real-world applications face significant occlusion challenges. When multiple pedestrians interact or overlap, the loss of target features severely compromises the tracker's ability to maintain stable trajectories. Traditional tracking methods, which typically rely on full-body bounding box features extracted from {Re-ID} models and linear constant-velocity motion assumptions, often struggle in severe occlusion scenarios. To address these limitations, this work proposes an enhanced tracking framework that leverages richer feature representations and a more robust motion model. Specifically, the proposed method incorporates detection features from both the regression and classification branches of an object detector, embedding spatial and positional information directly into the feature representations. To further mitigate occlusion challenges, a head keypoint detection model is introduced, as the head is less prone to occlusion compared to the full body. In terms of motion modeling, we propose an iterative Kalman filtering approach designed to align with modern detector assumptions, integrating 3D priors to better complete motion trajectories in complex scenes. By combining these advancements in appearance and motion modeling, the proposed method offers a more robust solution for multi-object tracking in crowded environments where occlusions are prevalent.
Related papers
- DINO-CoDT: Multi-class Collaborative Detection and Tracking with Vision Foundation Models [11.34839442803445]
We propose a multi-class collaborative detection and tracking framework tailored for diverse road users.<n>We first present a detector with a global spatial attention fusion (GSAF) module, enhancing multi-scale feature learning for objects of varying sizes.<n>Next, we introduce a tracklet RE-IDentification (REID) module that leverages visual semantics with a vision foundation model to effectively reduce ID SWitch (IDSW) errors.
arXiv Detail & Related papers (2025-06-09T02:49:10Z) - Street Gaussians without 3D Object Tracker [86.62329193275916]
Existing methods rely on labor-intensive manual labeling of object poses to reconstruct dynamic objects in canonical space.<n>We propose a stable object tracking module by leveraging associations from 2D deep trackers within a 3D object fusion strategy.<n>We address inevitable tracking errors by further introducing a motion learning strategy in an implicit feature space that autonomously corrects trajectory errors and recovers missed detections.
arXiv Detail & Related papers (2024-12-07T05:49:42Z) - 3D Multi-Object Tracking with Semi-Supervised GRU-Kalman Filter [6.13623925528906]
3D Multi-Object Tracking (MOT) is essential for intelligent systems like autonomous driving and robotic sensing.
We propose a GRU-based MOT method, which introduces a learnable Kalman filter into the motion module.
This approach is able to learn object motion characteristics through data-driven learning, thereby avoiding the need for manual model design and model error.
arXiv Detail & Related papers (2024-11-13T08:34:07Z) - LEAP-VO: Long-term Effective Any Point Tracking for Visual Odometry [52.131996528655094]
We present the Long-term Effective Any Point Tracking (LEAP) module.
LEAP innovatively combines visual, inter-track, and temporal cues with mindfully selected anchors for dynamic track estimation.
Based on these traits, we develop LEAP-VO, a robust visual odometry system adept at handling occlusions and dynamic scenes.
arXiv Detail & Related papers (2024-01-03T18:57:27Z) - Layout Sequence Prediction From Noisy Mobile Modality [53.49649231056857]
Trajectory prediction plays a vital role in understanding pedestrian movement for applications such as autonomous driving and robotics.
Current trajectory prediction models depend on long, complete, and accurately observed sequences from visual modalities.
We propose LTrajDiff, a novel approach that treats objects obstructed or out of sight as equally important as those with fully visible trajectories.
arXiv Detail & Related papers (2023-10-09T20:32:49Z) - Iterative Scale-Up ExpansionIoU and Deep Features Association for
Multi-Object Tracking in Sports [26.33239898091364]
We propose a novel online and robust multi-object tracking approach named deep ExpansionIoU (Deep-EIoU) for sports scenarios.
Unlike conventional methods, we abandon the use of the Kalman filter and leverage the iterative scale-up ExpansionIoU and deep features for robust tracking in sports scenarios.
Our proposed method demonstrates remarkable effectiveness in tracking irregular motion objects, achieving a score of 77.2% on the SportsMOT dataset and 85.4% on the SoccerNet-Tracking dataset.
arXiv Detail & Related papers (2023-06-22T17:47:08Z) - MotionTrack: Learning Motion Predictor for Multiple Object Tracking [68.68339102749358]
We introduce a novel motion-based tracker, MotionTrack, centered around a learnable motion predictor.
Our experimental results demonstrate that MotionTrack yields state-of-the-art performance on datasets such as Dancetrack and SportsMOT.
arXiv Detail & Related papers (2023-06-05T04:24:11Z) - You Only Need Two Detectors to Achieve Multi-Modal 3D Multi-Object Tracking [9.20064374262956]
The proposed framework can achieve robust tracking by using only a 2D detector and a 3D detector.
It is proven more accurate than many of the state-of-the-art TBD-based multi-modal tracking methods.
arXiv Detail & Related papers (2023-04-18T02:45:18Z) - Learning to Track with Object Permanence [61.36492084090744]
We introduce an end-to-end trainable approach for joint object detection and tracking.
Our model, trained jointly on synthetic and real data, outperforms the state of the art on KITTI, and MOT17 datasets.
arXiv Detail & Related papers (2021-03-26T04:43:04Z) - DEFT: Detection Embeddings for Tracking [3.326320568999945]
We propose an efficient joint detection and tracking model named DEFT.
Our approach relies on an appearance-based object matching network jointly-learned with an underlying object detection network.
DEFT has comparable accuracy and speed to the top methods on 2D online tracking leaderboards.
arXiv Detail & Related papers (2021-02-03T20:00:44Z) - Self-supervised Human Detection and Segmentation via Multi-view
Consensus [116.92405645348185]
We propose a multi-camera framework in which geometric constraints are embedded in the form of multi-view consistency during training.
We show that our approach outperforms state-of-the-art self-supervised person detection and segmentation techniques on images that visually depart from those of standard benchmarks.
arXiv Detail & Related papers (2020-12-09T15:47:21Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.