iKUN: Speak to Trackers without Retraining
- URL: http://arxiv.org/abs/2312.16245v2
- Date: Mon, 11 Mar 2024 07:52:22 GMT
- Title: iKUN: Speak to Trackers without Retraining
- Authors: Yunhao Du, Cheng Lei, Zhicheng Zhao, Fei Su
- Abstract summary: We propose an insertable Knowledge Unification Network, termed iKUN, to enable communication with off-the-shelf trackers.
To improve localization accuracy, we present a neural version of the Kalman filter (NKF) to dynamically adjust process and observation noise.
We also contribute a more challenging dataset, Refer-Dance, by extending the public DanceTrack dataset with motion and dressing descriptions.
- Score: 21.555469501789577
- License: http://creativecommons.org/licenses/by-nc-sa/4.0/
- Abstract: Referring multi-object tracking (RMOT) aims to track multiple objects based
on input textual descriptions. Previous works realize this by simply integrating
an extra textual module into the multi-object tracker. However, they typically
need to retrain the entire framework and are difficult to optimize. In
this work, we propose an insertable Knowledge Unification Network, termed iKUN,
to enable communication with off-the-shelf trackers in a plug-and-play manner.
Concretely, a knowledge unification module (KUM) is designed to adaptively
extract visual features based on textual guidance. Meanwhile, to improve the
localization accuracy, we present a neural version of the Kalman filter (NKF) to
dynamically adjust process noise and observation noise based on the current
motion status. Moreover, to address the problem of open-set long-tail
distribution of textual descriptions, a test-time similarity calibration method
is proposed to refine the confidence score with pseudo frequency. Extensive
experiments on the Refer-KITTI dataset verify the effectiveness of our framework.
Finally, to speed up the development of RMOT, we also contribute a more
challenging dataset, Refer-Dance, by extending the public DanceTrack dataset with
motion and dressing descriptions. The code and dataset are available at
https://github.com/dyhBUPT/iKUN.
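The knowledge unification module (KUM) is described above only at a high level: it adaptively extracts visual features under textual guidance. Below is a minimal, hedged sketch of one way such text-guided feature extraction could look, using standard cross-attention; the class name, dimensions, and fusion choices are illustrative assumptions, not the authors' implementation.

```python
# Illustrative sketch (not the paper's KUM): visual tokens attend to text
# tokens so the pooled track descriptor reflects the referring expression.
import torch
import torch.nn as nn

class TextGuidedVisualPooling(nn.Module):
    def __init__(self, dim=256, heads=8):
        super().__init__()
        # cross-attention: visual tokens are queries, text tokens are keys/values
        self.cross_attn = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.norm = nn.LayerNorm(dim)

    def forward(self, visual_tokens, text_tokens):
        # visual_tokens: (B, Nv, dim), text_tokens: (B, Nt, dim)
        guided, _ = self.cross_attn(visual_tokens, text_tokens, text_tokens)
        guided = self.norm(visual_tokens + guided)   # residual fusion
        return guided.mean(dim=1)                    # (B, dim) pooled descriptor
```

Matching a tracklet against a description could then reduce to a cosine similarity between this pooled descriptor and a sentence embedding, which is what would let such a module sit behind an off-the-shelf tracker.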
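The neural Kalman filter (NKF) is characterized only as dynamically adjusting process and observation noise from the current motion status. A hedged sketch under that reading follows: a constant-velocity Kalman filter whose noise covariances are rescaled each step by small networks. The state layout, the motion-status features, and the network shapes are assumptions, not the paper's design.

```python
# Assumed sketch of a "neural" Kalman filter: standard predict/update steps,
# but Q and R are produced per step by small networks from a motion-status vector.
import numpy as np
import torch
import torch.nn as nn

class NoiseNet(nn.Module):
    """Maps a motion-status vector (e.g. recent velocity / residual statistics)
    to strictly positive per-dimension noise scales."""
    def __init__(self, in_dim=4, out_dim=4):
        super().__init__()
        self.mlp = nn.Sequential(nn.Linear(in_dim, 32), nn.ReLU(),
                                 nn.Linear(32, out_dim), nn.Softplus())

    def forward(self, status):
        return self.mlp(status) + 1e-6

class NeuralKalmanFilter:
    """Constant-velocity filter over (x, y, vx, vy); Q and R come from
    NoiseNet instead of being fixed constants."""
    def __init__(self, dt=1.0):
        d = 2
        self.F = np.eye(2 * d)
        self.F[:d, d:] = dt * np.eye(d)                     # x' = x + dt * v
        self.H = np.hstack([np.eye(d), np.zeros((d, d))])   # observe position only
        self.x = np.zeros(2 * d)
        self.P = np.eye(2 * d)
        self.q_net = NoiseNet(out_dim=2 * d)   # process-noise scales
        self.r_net = NoiseNet(out_dim=d)       # observation-noise scales

    def step(self, z, status):
        # z: observed (x, y); status: motion-status features (assumed input)
        with torch.no_grad():
            s = torch.as_tensor(status, dtype=torch.float32)
            Q = np.diag(self.q_net(s).numpy())
            R = np.diag(self.r_net(s).numpy())
        # predict
        self.x = self.F @ self.x
        self.P = self.F @ self.P @ self.F.T + Q
        # update
        innovation = np.asarray(z) - self.H @ self.x
        S = self.H @ self.P @ self.H.T + R
        K = self.P @ self.H.T @ np.linalg.inv(S)
        self.x = self.x + K @ innovation
        self.P = (np.eye(len(self.x)) - K @ self.H) @ self.P
        return self.x[:2]   # filtered position
```

In practice the noise networks would be trained with a localization objective; how iKUN supervises them is not specified in this summary.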
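The test-time similarity calibration is described as refining the confidence score with a pseudo frequency to counter the open-set long-tail distribution of textual descriptions. The sketch below shows one plausible, logit-adjustment-style form of such a correction; the soft-count frequency estimate, the `alpha` weight, and the exact functional form are assumptions rather than the paper's formula.

```python
# Illustrative only: down-weight scores for frequent (head) descriptions and
# up-weight rare (tail) ones, using a pseudo frequency estimated from a text pool.
import numpy as np

def pseudo_frequency(query_emb, pool_embs, temperature=0.1):
    """Soft-count how 'common' a query description is among a pool of
    (pseudo) description embeddings."""
    sims = pool_embs @ query_emb / (
        np.linalg.norm(pool_embs, axis=1) * np.linalg.norm(query_emb) + 1e-8)
    return np.exp(sims / temperature).sum() / len(pool_embs)

def calibrate(score, freq, alpha=0.5):
    # logit-adjustment-style correction of the raw text-visual similarity
    return score - alpha * np.log(freq + 1e-8)
```

The key property is that the correction is applied purely at test time, so it requires retraining neither the tracker nor the matching module.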
Related papers
- SLAck: Semantic, Location, and Appearance Aware Open-Vocabulary Tracking [89.43370214059955]
Open-vocabulary Multiple Object Tracking (MOT) aims to generalize trackers to novel categories not in the training set.
We present a unified framework that jointly considers semantics, location, and appearance priors in the early steps of association.
Our method eliminates complex post-processing for fusing different cues and significantly boosts association performance for large-scale open-vocabulary tracking.
arXiv Detail & Related papers (2024-09-17T14:36:58Z) - Temporal Correlation Meets Embedding: Towards a 2nd Generation of JDE-based Real-Time Multi-Object Tracking [52.04679257903805]
Joint Detection and Embedding (JDE) trackers have demonstrated excellent performance in Multi-Object Tracking (MOT) tasks.
Our tracker, named TCBTrack, achieves state-of-the-art performance on multiple public benchmarks.
arXiv Detail & Related papers (2024-07-19T07:48:45Z) - Joint-Dataset Learning and Cross-Consistent Regularization for Text-to-Motion Retrieval [4.454835029368504]
We focus on the recently introduced text-motion retrieval task, which aims to search for sequences that are most relevant to a natural motion description.
Despite recent efforts to explore these promising avenues, a primary challenge remains the insufficient data available to train robust text-motion models.
We propose to investigate joint-dataset learning - where we train on multiple text-motion datasets simultaneously.
We also introduce a transformer-based motion encoder, called MoT++, which employs spatio-temporal attention to process sequences of skeleton data.
arXiv Detail & Related papers (2024-07-02T09:43:47Z) - Engineering an Efficient Object Tracker for Non-Linear Motion [0.0]
The goal of multi-object tracking is to detect and track all objects in a scene while maintaining unique identifiers for each.
This task is especially hard in scenarios involving dynamic and non-linear motion patterns.
In this paper, we introduce DeepMoveSORT, a novel, carefully engineered multi-object tracker designed specifically for such scenarios.
arXiv Detail & Related papers (2024-06-30T15:50:54Z) - Autoregressive Queries for Adaptive Tracking with Spatio-Temporal Transformers [55.46413719810273]
Rich spatio-temporal information is crucial for modeling the complicated target appearance in visual tracking.
Our method improves the tracker's performance on six popular tracking benchmarks.
arXiv Detail & Related papers (2024-03-15T02:39:26Z) - LEAP-VO: Long-term Effective Any Point Tracking for Visual Odometry [52.131996528655094]
We present the Long-term Effective Any Point Tracking (LEAP) module.
LEAP innovatively combines visual, inter-track, and temporal cues with mindfully selected anchors for dynamic track estimation.
Based on these traits, we develop LEAP-VO, a robust visual odometry system adept at handling occlusions and dynamic scenes.
arXiv Detail & Related papers (2024-01-03T18:57:27Z) - A Bayesian Detect to Track System for Robust Visual Object Tracking and Semi-Supervised Model Learning [1.7268829007643391]
We address problems in a Bayesian tracking and detection framework parameterized by neural network outputs.
We propose a particle filter-based approximate sampling algorithm for tracking object state estimation.
Based on our particle filter inference algorithm, a semi-supervised learning algorithm is utilized for learning the tracking network on intermittently labeled frames.
arXiv Detail & Related papers (2022-05-05T00:18:57Z) - Context-aware Visual Tracking with Joint Meta-updating [11.226947525556813]
We propose a context-aware tracking model to optimize the tracker over the representation space, which jointly meta-updates both branches by exploiting information along the whole sequence.
The proposed tracking method achieves an EAO score of 0.514 on VOT2018 at a speed of 40 FPS, demonstrating its capability of improving the accuracy and robustness of the underlying tracker with little speed drop.
arXiv Detail & Related papers (2022-04-04T14:16:00Z) - Learning Dynamic Compact Memory Embedding for Deformable Visual Object Tracking [82.34356879078955]
We propose a compact memory embedding to enhance the discrimination of the segmentation-based deformable visual tracking method.
Our method outperforms excellent segmentation-based trackers, i.e., D3S and SiamMask, on the DAVIS 2017 benchmark.
arXiv Detail & Related papers (2021-11-23T03:07:12Z) - MFGNet: Dynamic Modality-Aware Filter Generation for RGB-T Tracking [72.65494220685525]
We propose a new dynamic modality-aware filter generation module (named MFGNet) to boost the message communication between visible and thermal data.
We generate dynamic modality-aware filters with two independent networks. The visible and thermal filters are then used to conduct dynamic convolution on their corresponding input feature maps, respectively.
To address issues caused by heavy occlusion, fast motion, and out-of-view, we propose to conduct a joint local and global search by exploiting a new direction-aware target-driven attention mechanism.
arXiv Detail & Related papers (2021-07-22T03:10:51Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information listed here and is not responsible for any consequences arising from its use.