Related papers: From Detection to Association: Learning Discriminative Object Embeddings for Multi-Object Tracking

From Detection to Association: Learning Discriminative Object Embeddings for Multi-Object Tracking

URL: http://arxiv.org/abs/2512.02392v1
Date: Tue, 02 Dec 2025 04:04:39 GMT
Title: From Detection to Association: Learning Discriminative Object Embeddings for Multi-Object Tracking
Authors: Yuqing Shao, Yuchen Yang, Rui Yu, Weilong Li, Xu Guo, Huaicheng Yan, Wei Wang, Xiao Sun,
Abstract summary: This paper introduces an explicit feature framework that enhances object discriminativeness across three perspectives.<n>Experiments demonstrate that SpongeBobA achieves state-of-the-art performance on multiple challenging MOT benchmarks, including DanceTrack, SportsMOT, and BFT.
Score: 24.903851740154433
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: End-to-end multi-object tracking (MOT) methods have recently achieved remarkable progress by unifying detection and association within a single framework. Despite their strong detection performance, these methods suffer from relatively low association accuracy. Through detailed analysis, we observe that object embeddings produced by the shared DETR architecture display excessively high inter-object similarity, as it emphasizes only category-level discrimination within single frames. In contrast, tracking requires instance-level distinction across frames with spatial and temporal continuity, for which current end-to-end approaches insufficiently optimize object embeddings. To address this, we introduce FDTA (From Detection to Association), an explicit feature refinement framework that enhances object discriminativeness across three complementary perspectives. Specifically, we introduce a Spatial Adapter (SA) to integrate depth-aware cues for spatial continuity, a Temporal Adapter (TA) to aggregate historical information for temporal dependencies, and an Identity Adapter (IA) to leverage quality-aware contrastive learning for instance-level separability. Extensive experiments demonstrate that FDTA achieves state-of-the-art performance on multiple challenging MOT benchmarks, including DanceTrack, SportsMOT, and BFT, highlighting the effectiveness of our proposed discriminative embedding enhancement strategy. The code is available at https://github.com/Spongebobbbbbbbb/FDTA.

Related papers

IoUCert: Robustness Verification for Anchor-based Object Detectors [58.35703549470485]
We introduce IoUCert, a novel formal verification framework designed specifically to overcome these bottlenecks in anchor-based object detection architectures.<n>We show that our method enables the robustness verification of realistic, anchor-based models including SSD, YOLOv2, and YOLOv3 variants against various input perturbations.
arXiv Detail & Related papers (2026-03-03T14:36:46Z)
From Objects to Events: Unlocking Complex Visual Understanding in Object Detectors via LLM-guided Symbolic Reasoning [71.41062111470414]
Current object detectors excel at entity localization and classification, yet exhibit inherent limitations in event recognition capabilities.<n>We present a novel framework that expands the capability of standard object detectors beyond mere object recognition to complex event understanding.<n>Our key innovation lies in bridging the semantic gap between object detection and event understanding without requiring expensive task-specific training.
arXiv Detail & Related papers (2025-02-09T10:30:54Z)
Spatio-temporal Graph Learning on Adaptive Mined Key Frames for High-performance Multi-Object Tracking [5.746443489229576]
Key Frame Extraction (KFE) module leverages reinforcement learning to adaptively segment videos.<n> Intra-Frame Feature Fusion (IFF) module uses a Graph Convolutional Network (GCN) to facilitate information exchange between the target and surrounding objects.<n>Our proposed tracker achieves impressive results on the MOT17 dataset.
arXiv Detail & Related papers (2025-01-17T11:36:38Z)
Collaborative Feature-Logits Contrastive Learning for Open-Set Semi-Supervised Object Detection [75.02249869573994]
In open-set scenarios, the unlabeled dataset contains both in-distribution (ID) classes and out-of-distribution (OOD) classes.<n>Applying semi-supervised detectors in such settings can lead to misclassifying OOD class as ID classes.<n>We propose a simple yet effective method, termed Collaborative Feature-Logits Detector (CFL-Detector)
arXiv Detail & Related papers (2024-11-20T02:57:35Z)
Spatial-Temporal Multi-level Association for Video Object Segmentation [89.32226483171047]
This paper proposes spatial-temporal multi-level association, which jointly associates reference frame, test frame, and object features. Specifically, we construct a spatial-temporal multi-level feature association module to learn better target-aware features.
arXiv Detail & Related papers (2024-04-09T12:44:34Z)
Occlusion-Aware Detection and Re-ID Calibrated Network for Multi-Object Tracking [38.36872739816151]
Occlusion-Aware Attention (OAA) module in the detector highlights the object features while suppressing the occluded background regions. OAA can serve as a modulator that enhances the detector for some potentially occluded objects. We design a Re-ID embedding matching block based on the optimal transport problem.
arXiv Detail & Related papers (2023-08-30T06:56:53Z)
Small Object Detection via Coarse-to-fine Proposal Generation and Imitation Learning [52.06176253457522]
We propose a two-stage framework tailored for small object detection based on the Coarse-to-fine pipeline and Feature Imitation learning. CFINet achieves state-of-the-art performance on the large-scale small object detection benchmarks, SODA-D and SODA-A.
arXiv Detail & Related papers (2023-08-18T13:13:09Z)
Learning Prompt-Enhanced Context Features for Weakly-Supervised Video Anomaly Detection [37.99031842449251]
Video anomaly detection under weak supervision presents significant challenges. We present a weakly supervised anomaly detection framework that focuses on efficient context modeling and enhanced semantic discriminability. Our approach significantly improves the detection accuracy of certain anomaly sub-classes, underscoring its practical value and efficacy.
arXiv Detail & Related papers (2023-06-26T06:45:16Z)
Exploiting Domain Transferability for Collaborative Inter-level Domain Adaptive Object Detection [17.61278045720336]
Domain adaptation for object detection (DAOD) has recently drawn much attention owing to its capability of detecting target objects without any annotations. Previous works focus on aligning features extracted from partial levels in a two-stage detector via adversarial training. We introduce a novel framework for ProposalD with three proposed components: Multi-scale-aware Uncertainty Attention (MUA), Transferable Region Network (TRPN), and Dynamic Instance Sampling (DIS)
arXiv Detail & Related papers (2022-07-20T01:50:26Z)
Scalable Video Object Segmentation with Identification Mechanism [125.4229430216776]
This paper explores the challenges of achieving scalable and effective multi-object modeling for semi-supervised Video Object (VOS) We present two innovative approaches, Associating Objects with Transformers (AOT) and Associating Objects with Scalable Transformers (AOST) Our approaches surpass the state-of-the-art competitors and display exceptional efficiency and scalability consistently across all six benchmarks.
arXiv Detail & Related papers (2022-03-22T03:33:27Z)
Online Multiple Object Tracking with Cross-Task Synergy [120.70085565030628]
We propose a novel unified model with synergy between position prediction and embedding association. The two tasks are linked by temporal-aware target attention and distractor attention, as well as identity-aware memory aggregation model.
arXiv Detail & Related papers (2021-04-01T10:19:40Z)
Multi-object Tracking with a Hierarchical Single-branch Network [31.680667324595557]
We propose an online multi-object tracking framework based on a hierarchical single-branch network. Our novel iHOIM loss function unifies the objectives of the two sub-tasks and encourages better detection performance. Experimental results on MOT16 and MOT20 datasets show that we can achieve state-of-the-art tracking performance.
arXiv Detail & Related papers (2021-01-06T12:14:58Z)

This list is automatically generated from the titles and abstracts of the papers in this site.