iKUN: Speak to Trackers without Retraining
- URL: http://arxiv.org/abs/2312.16245v2
- Date: Mon, 11 Mar 2024 07:52:22 GMT
- Title: iKUN: Speak to Trackers without Retraining
- Authors: Yunhao Du, Cheng Lei, Zhicheng Zhao, Fei Su
- Abstract summary: We propose an insertable Knowledge Unification Network, termed iKUN, to enable communication with off-the-shelf trackers.
To improve localization accuracy, we present a neural version of the Kalman filter (NKF) to dynamically adjust process and observation noise.
We also contribute a more challenging dataset, Refer-Dance, by extending the public DanceTrack dataset with motion and dressing descriptions.
- Score: 21.555469501789577
- License: http://creativecommons.org/licenses/by-nc-sa/4.0/
- Abstract: Referring multi-object tracking (RMOT) aims to track multiple objects based
on input textual descriptions. Previous works realize this by simply integrating
an extra textual module into the multi-object tracker. However, they typically
need to retrain the entire framework and are difficult to optimize. In
this work, we propose an insertable Knowledge Unification Network, termed iKUN,
to enable communication with off-the-shelf trackers in a plug-and-play manner.
Concretely, a knowledge unification module (KUM) is designed to adaptively
extract visual features based on textual guidance. Meanwhile, to improve the
localization accuracy, we present a neural version of the Kalman filter (NKF) to
dynamically adjust process noise and observation noise based on the current
motion status. Moreover, to address the problem of open-set long-tail
distribution of textual descriptions, a test-time similarity calibration method
is proposed to refine the confidence score with pseudo frequency. Extensive
experiments on the Refer-KITTI dataset verify the effectiveness of our framework.
Finally, to speed up the development of RMOT, we also contribute a more
challenging dataset, Refer-Dance, by extending the public DanceTrack dataset with
motion and dressing descriptions. The code and dataset are available at
https://github.com/dyhBUPT/iKUN.
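The knowledge unification module (KUM) is described above only at a high level: it adaptively extracts visual features under textual guidance. Below is a minimal, hedged sketch of one way such text-guided feature extraction could look, using standard cross-attention; the class name, dimensions, and fusion choices are illustrative assumptions, not the authors' implementation.

```python
# Illustrative sketch (not the paper's KUM): visual tokens attend to text
# tokens so the pooled track descriptor reflects the referring expression.
import torch
import torch.nn as nn

class TextGuidedVisualPooling(nn.Module):
    def __init__(self, dim=256, heads=8):
        super().__init__()
        # cross-attention: visual tokens are queries, text tokens are keys/values
        self.cross_attn = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.norm = nn.LayerNorm(dim)

    def forward(self, visual_tokens, text_tokens):
        # visual_tokens: (B, Nv, dim), text_tokens: (B, Nt, dim)
        guided, _ = self.cross_attn(visual_tokens, text_tokens, text_tokens)
        guided = self.norm(visual_tokens + guided)   # residual fusion
        return guided.mean(dim=1)                    # (B, dim) pooled descriptor
```

Matching a tracklet against a description could then reduce to a cosine similarity between this pooled descriptor and a sentence embedding, which is what would let such a module sit behind an off-the-shelf tracker.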
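The neural Kalman filter (NKF) is characterized only as dynamically adjusting process and observation noise from the current motion status. A hedged sketch under that reading follows: a constant-velocity Kalman filter whose noise covariances are rescaled each step by small networks. The state layout, the motion-status features, and the network shapes are assumptions, not the paper's design.

```python
# Assumed sketch of a "neural" Kalman filter: standard predict/update steps,
# but Q and R are produced per step by small networks from a motion-status vector.
import numpy as np
import torch
import torch.nn as nn

class NoiseNet(nn.Module):
    """Maps a motion-status vector (e.g. recent velocity / residual statistics)
    to strictly positive per-dimension noise scales."""
    def __init__(self, in_dim=4, out_dim=4):
        super().__init__()
        self.mlp = nn.Sequential(nn.Linear(in_dim, 32), nn.ReLU(),
                                 nn.Linear(32, out_dim), nn.Softplus())

    def forward(self, status):
        return self.mlp(status) + 1e-6

class NeuralKalmanFilter:
    """Constant-velocity filter over (x, y, vx, vy); Q and R come from
    NoiseNet instead of being fixed constants."""
    def __init__(self, dt=1.0):
        d = 2
        self.F = np.eye(2 * d)
        self.F[:d, d:] = dt * np.eye(d)                     # x' = x + dt * v
        self.H = np.hstack([np.eye(d), np.zeros((d, d))])   # observe position only
        self.x = np.zeros(2 * d)
        self.P = np.eye(2 * d)
        self.q_net = NoiseNet(out_dim=2 * d)   # process-noise scales
        self.r_net = NoiseNet(out_dim=d)       # observation-noise scales

    def step(self, z, status):
        # z: observed (x, y); status: motion-status features (assumed input)
        with torch.no_grad():
            s = torch.as_tensor(status, dtype=torch.float32)
            Q = np.diag(self.q_net(s).numpy())
            R = np.diag(self.r_net(s).numpy())
        # predict
        self.x = self.F @ self.x
        self.P = self.F @ self.P @ self.F.T + Q
        # update
        innovation = np.asarray(z) - self.H @ self.x
        S = self.H @ self.P @ self.H.T + R
        K = self.P @ self.H.T @ np.linalg.inv(S)
        self.x = self.x + K @ innovation
        self.P = (np.eye(len(self.x)) - K @ self.H) @ self.P
        return self.x[:2]   # filtered position
```

In practice the noise networks would be trained with a localization objective; how iKUN supervises them is not specified in this summary.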
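The test-time similarity calibration is described as refining the confidence score with a pseudo frequency to counter the open-set long-tail distribution of textual descriptions. The sketch below shows one plausible, logit-adjustment-style form of such a correction; the soft-count frequency estimate, the `alpha` weight, and the exact functional form are assumptions rather than the paper's formula.

```python
# Illustrative only: down-weight scores for frequent (head) descriptions and
# up-weight rare (tail) ones, using a pseudo frequency estimated from a text pool.
import numpy as np

def pseudo_frequency(query_emb, pool_embs, temperature=0.1):
    """Soft-count how 'common' a query description is among a pool of
    (pseudo) description embeddings."""
    sims = pool_embs @ query_emb / (
        np.linalg.norm(pool_embs, axis=1) * np.linalg.norm(query_emb) + 1e-8)
    return np.exp(sims / temperature).sum() / len(pool_embs)

def calibrate(score, freq, alpha=0.5):
    # logit-adjustment-style correction of the raw text-visual similarity
    return score - alpha * np.log(freq + 1e-8)
```

The key property is that the correction is applied purely at test time, so it requires retraining neither the tracker nor the matching module.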
Related papers
- SLAck: Semantic, Location, and Appearance Aware Open-Vocabulary Tracking [89.43370214059955]
Open-vocabulary Multiple Object Tracking (MOT) aims to generalize trackers to novel categories not in the training set.
We present a unified framework that jointly considers semantics, location, and appearance priors in the early steps of association.
Our method eliminates complex post-processing for fusing different cues and significantly boosts association performance for large-scale open-vocabulary tracking.
arXiv Detail & Related papers (2024-09-17T14:36:58Z) - Temporal Correlation Meets Embedding: Towards a 2nd Generation of JDE-based Real-Time Multi-Object Tracking [52.04679257903805]
Joint Detection and Embedding (JDE) trackers have demonstrated excellent performance in Multi-Object Tracking (MOT) tasks.
Our tracker, named TCBTrack, achieves state-of-the-art performance on multiple public benchmarks.
arXiv Detail & Related papers (2024-07-19T07:48:45Z) - Joint-Dataset Learning and Cross-Consistent Regularization for Text-to-Motion Retrieval [4.454835029368504]
We focus on the recently introduced text-motion retrieval task, which aims to search for sequences that are most relevant to a natural motion description.
Despite recent efforts to explore these promising avenues, a primary challenge remains the insufficient data available to train robust text-motion models.
We propose to investigate joint-dataset learning - where we train on multiple text-motion datasets simultaneously.
We also introduce a transformer-based motion encoder, called MoT++, which employs spatio-temporal attention to process sequences of skeleton data.
arXiv Detail & Related papers (2024-07-02T09:43:47Z) - Engineering an Efficient Object Tracker for Non-Linear Motion [0.0]
The goal of multi-object tracking is to detect and track all objects in a scene while maintaining unique identifiers for each.
This task is especially hard in scenarios involving dynamic and non-linear motion patterns.
In this paper, we introduce DeepMoveSORT, a novel, carefully engineered multi-object tracker designed specifically for such scenarios.
arXiv Detail & Related papers (2024-06-30T15:50:54Z) - Autoregressive Queries for Adaptive Tracking with Spatio-Temporal Transformers [55.46413719810273]
Rich spatio-temporal information is crucial for modeling the complicated target appearance in visual tracking.
Our method improves the tracker's performance on six popular tracking benchmarks.
arXiv Detail & Related papers (2024-03-15T02:39:26Z) - LEAP-VO: Long-term Effective Any Point Tracking for Visual Odometry [52.131996528655094]
We present the Long-term Effective Any Point Tracking (LEAP) module.
LEAP innovatively combines visual, inter-track, and temporal cues with mindfully selected anchors for dynamic track estimation.
Based on these traits, we develop LEAP-VO, a robust visual odometry system adept at handling occlusions and dynamic scenes.
arXiv Detail & Related papers (2024-01-03T18:57:27Z) - A Bayesian Detect to Track System for Robust Visual Object Tracking and Semi-Supervised Model Learning [1.7268829007643391]
We address problems in a Bayesian tracking and detection framework parameterized by neural network outputs.
We propose a particle filter-based approximate sampling algorithm for tracking object state estimation.
Based on our particle filter inference algorithm, a semi-supervised learning algorithm is utilized for learning the tracking network on intermittently labeled frames.
arXiv Detail & Related papers (2022-05-05T00:18:57Z) - Context-aware Visual Tracking with Joint Meta-updating [11.226947525556813]
We propose a context-aware tracking model to optimize the tracker over the representation space, which jointly meta-updates both branches by exploiting information along the whole sequence.
The proposed tracking method achieves an EAO score of 0.514 on VOT2018 at a speed of 40 FPS, demonstrating its capability of improving the accuracy and robustness of the underlying tracker with little speed drop.
arXiv Detail & Related papers (2022-04-04T14:16:00Z) - Learning Dynamic Compact Memory Embedding for Deformable Visual Object Tracking [82.34356879078955]
We propose a compact memory embedding to enhance the discrimination of the segmentation-based deformable visual tracking method.
Our method outperforms excellent segmentation-based trackers, i.e., D3S and SiamMask, on the DAVIS 2017 benchmark.
arXiv Detail & Related papers (2021-11-23T03:07:12Z) - MFGNet: Dynamic Modality-Aware Filter Generation for RGB-T Tracking [72.65494220685525]
We propose a new dynamic modality-aware filter generation module (named MFGNet) to boost the message communication between visible and thermal data.
We generate dynamic modality-aware filters with two independent networks. The visible and thermal filters are then used to conduct dynamic convolution on their corresponding input feature maps, respectively.
To address issues caused by heavy occlusion, fast motion, and out-of-view, we propose to conduct a joint local and global search by exploiting a new direction-aware target-driven attention mechanism.
arXiv Detail & Related papers (2021-07-22T03:10:51Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information listed here and is not responsible for any consequences arising from its use.