Prototype-based Cross-Modal Object Tracking
- URL: http://arxiv.org/abs/2312.14471v1
- Date: Fri, 22 Dec 2023 06:49:44 GMT
- Title: Prototype-based Cross-Modal Object Tracking
- Authors: Lei Liu, Chenglong Li, Futian Wang, Longfeng Shen, and Jin Tang
- Abstract summary: Cross-modal object tracking is an important research topic in the field of information fusion.
We propose a prototype-based cross-modal object tracker called ProtoTrack, which introduces a novel prototype learning scheme to adapt to significant target appearance variations.
In particular, we design a multi-modal prototype to represent target information by multi-kind samples, including a fixed sample from the first frame and two representative samples from different modalities.
- Score: 17.367890389752596
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Cross-modal object tracking is an important research topic in the field of
information fusion, and it aims to address imaging limitations in challenging
scenarios by integrating switchable visible and near-infrared modalities.
However, existing tracking methods have difficulty adapting to significant
target appearance variations in the presence of modality switching.
For instance, model update based tracking methods struggle to maintain stable
tracking results during modality switching, leading to error accumulation and
model drift. Template-based tracking methods rely solely on template
information from the first and/or last frame, which lacks sufficient
representation ability and poses challenges in handling significant target
appearance changes. To address this problem, we propose a prototype-based
cross-modal object tracker called ProtoTrack, which introduces a novel
prototype learning scheme to adapt to significant target appearance variations
in cross-modal object tracking. In particular, we design a multi-modal
prototype to represent target information by multi-kind samples, including a
fixed sample from the first frame and two representative samples from different
modalities. Moreover, we develop a prototype generation algorithm based on two
new modules to ensure the prototype remains representative under different
challenges...
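The abstract describes the multi-modal prototype only at a high level. The following is a minimal, hypothetical Python sketch of the structure it outlines (a fixed first-frame sample plus one representative sample per modality); the confidence-gated update rule, feature dimensions, and all names are assumptions, not details from the paper, whose prototype generation algorithm relies on two dedicated modules not reproduced here.

```python
# Hypothetical sketch of the multi-modal prototype described in the
# abstract: a fixed sample from the first frame plus one representative
# sample per modality. The confidence-gated update rule is an assumption;
# the paper's actual prototype generation algorithm uses two dedicated
# modules not reproduced here.
from dataclasses import dataclass, field

import numpy as np


@dataclass
class MultiModalPrototype:
    fixed: np.ndarray                          # template feature from frame 1 (never updated)
    modal: dict = field(default_factory=dict)  # modality name -> representative feature

    def update(self, modality: str, feature: np.ndarray, confidence: float,
               threshold: float = 0.6) -> None:
        """Replace the representative sample for `modality` only when the
        tracker is confident, to limit drift around modality switches."""
        if confidence >= threshold:
            self.modal[modality] = feature

    def samples(self) -> list:
        """All samples that jointly represent the target."""
        return [self.fixed] + list(self.modal.values())


# Usage: initialize from the first (visible) frame, then update per frame.
proto = MultiModalPrototype(fixed=np.random.rand(256))
proto.update("visible", np.random.rand(256), confidence=0.9)
proto.update("near_infrared", np.random.rand(256), confidence=0.8)
print(len(proto.samples()))  # 3: fixed sample + one per modality
```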
Related papers
- DeTra: A Unified Model for Object Detection and Trajectory Forecasting [68.85128937305697]
Our approach formulates the union of the two tasks as a trajectory refinement problem.
To tackle this unified task, we design a refinement transformer that infers the presence, pose, and multi-modal future behaviors of objects.
In our experiments, we observe that our model outperforms the state of the art on the Argoverse 2 Sensor and Open datasets.
arXiv Detail & Related papers (2024-06-06T18:12:04Z)
- A Novel Bounding Box Regression Method for Single Object Tracking [0.0]
We introduce two novel bounding box regression networks: inception and deformable.
Experiments and ablation studies show that our inception module, installed on the recent ODTrack, outperforms the original tracker on three benchmarks.
arXiv Detail & Related papers (2024-05-16T21:09:45Z)
- MTP: Advancing Remote Sensing Foundation Model via Multi-Task Pretraining [73.81862342673894]
Foundation models have reshaped the landscape of Remote Sensing (RS) by enhancing various image interpretation tasks.
However, transferring the pretrained models to downstream tasks may encounter task discrepancies, because pretraining is formulated as image classification or object discrimination.
We conduct multi-task supervised pretraining on the SAMRS dataset, encompassing semantic segmentation, instance segmentation, and rotated object detection.
Our models are finetuned on various RS downstream tasks, such as scene classification, horizontal and rotated object detection, semantic segmentation, and change detection.
arXiv Detail & Related papers (2024-03-20T09:17:22Z)
- One for All: Toward Unified Foundation Models for Earth Vision [24.358013737755822]
Current remote sensing foundation models specialize in a single modality or a specific spatial resolution range.
We introduce OFA-Net, which employs a single, shared Transformer backbone for multiple data modalities with different spatial resolutions.
The proposed method is evaluated on 12 distinct downstream tasks and demonstrates promising performance.
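The summary above only names the idea of a shared backbone. As a rough illustration, the hypothetical sketch below shows one common way to realize it, with a per-modality patch embedding in front of a single shared Transformer encoder; the module names, channel counts, and depths are assumptions, not details from the OFA-Net paper.

```python
# Hypothetical sketch of a shared-backbone design in the spirit of OFA-Net:
# each modality gets its own patch embedding, but all tokens pass through
# one shared Transformer encoder. Channel counts and depths are assumptions.
import torch
import torch.nn as nn


class SharedBackbone(nn.Module):
    def __init__(self, modal_channels: dict, dim: int = 256, depth: int = 4):
        super().__init__()
        # One lightweight patch embedding per modality (e.g. optical vs. SAR).
        self.embed = nn.ModuleDict({
            name: nn.Conv2d(ch, dim, kernel_size=16, stride=16)
            for name, ch in modal_channels.items()
        })
        layer = nn.TransformerEncoderLayer(d_model=dim, nhead=8, batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, num_layers=depth)  # shared weights

    def forward(self, x: torch.Tensor, modality: str) -> torch.Tensor:
        tokens = self.embed[modality](x).flatten(2).transpose(1, 2)  # (B, N, dim)
        return self.encoder(tokens)


backbone = SharedBackbone({"optical": 3, "sar": 2})
feats = backbone(torch.randn(1, 3, 224, 224), modality="optical")
print(feats.shape)  # torch.Size([1, 196, 256])
```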
arXiv Detail & Related papers (2024-01-15T08:12:51Z)
- Appearance-Based Refinement for Object-Centric Motion Segmentation [85.2426540999329]
We introduce an appearance-based refinement method that leverages temporal consistency in video streams to correct inaccurate flow-based proposals.
Our approach involves a sequence-level selection mechanism that identifies accurate flow-predicted masks as exemplars.
Its performance is evaluated on multiple video segmentation benchmarks, including DAVIS, YouTube, SegTrackv2, and FBMS-59.
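The summary does not spell out how exemplars are chosen. The hypothetical sketch below illustrates one plausible reading, scoring each flow-predicted mask by its IoU consistency with temporal neighbours and keeping the top-scoring masks; the scoring rule and function names are assumptions, not the paper's actual mechanism.

```python
# Hypothetical sketch of sequence-level exemplar selection: flow-predicted
# masks that stay consistent with their temporal neighbours are kept as
# exemplars. The IoU-based consistency score is an assumption.
import numpy as np


def iou(a: np.ndarray, b: np.ndarray) -> float:
    inter = np.logical_and(a, b).sum()
    union = np.logical_or(a, b).sum()
    return inter / union if union else 0.0


def select_exemplars(masks: list, k: int = 3) -> list:
    """Return indices of the k masks most consistent with their neighbours."""
    scores = []
    for t, m in enumerate(masks):
        neighbours = [masks[t - 1]] if t > 0 else []
        neighbours += [masks[t + 1]] if t + 1 < len(masks) else []
        scores.append(float(np.mean([iou(m, n) for n in neighbours])) if neighbours else 0.0)
    return sorted(range(len(masks)), key=lambda t: scores[t], reverse=True)[:k]


masks = [np.random.rand(64, 64) > 0.5 for _ in range(10)]
print(select_exemplars(masks))  # indices of the 3 most consistent masks
```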
arXiv Detail & Related papers (2023-12-18T18:59:51Z)
- Bi-directional Adapter for Multi-modal Tracking [67.01179868400229]
We propose a novel multi-modal visual prompt tracking model based on a universal bi-directional adapter.
We develop a simple but effective light feature adapter to transfer modality-specific information from one modality to another.
Our model achieves superior tracking performance in comparison with both the full fine-tuning methods and the prompt learning-based methods.
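As a rough illustration of such a light adapter, the hypothetical sketch below uses a standard bottleneck design in which each modality branch receives a residual prompt computed from the other; the bottleneck width and module names are assumptions, not the paper's exact architecture.

```python
# Hypothetical bottleneck adapter in the spirit of a "light feature adapter":
# features from one modality are projected down, non-linearly mixed, projected
# back up, and added to the other modality's branch as a residual prompt.
import torch
import torch.nn as nn


class BiDirectionalAdapter(nn.Module):
    def __init__(self, dim: int = 256, bottleneck: int = 32):
        super().__init__()
        self.rgb_to_aux = nn.Sequential(
            nn.Linear(dim, bottleneck), nn.GELU(), nn.Linear(bottleneck, dim))
        self.aux_to_rgb = nn.Sequential(
            nn.Linear(dim, bottleneck), nn.GELU(), nn.Linear(bottleneck, dim))

    def forward(self, rgb: torch.Tensor, aux: torch.Tensor):
        # Each branch receives modality-specific information from the other.
        return rgb + self.aux_to_rgb(aux), aux + self.rgb_to_aux(rgb)


adapter = BiDirectionalAdapter()
rgb, aux = torch.randn(1, 196, 256), torch.randn(1, 196, 256)
rgb_out, aux_out = adapter(rgb, aux)
print(rgb_out.shape, aux_out.shape)  # torch.Size([1, 196, 256]) each
```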
arXiv Detail & Related papers (2023-12-17T05:27:31Z)
- You Only Need Two Detectors to Achieve Multi-Modal 3D Multi-Object Tracking [9.20064374262956]
The proposed framework can achieve robust tracking by using only a 2D detector and a 3D detector.
It is shown to be more accurate than many state-of-the-art tracking-by-detection (TBD) based multi-modal tracking methods.
arXiv Detail & Related papers (2023-04-18T02:45:18Z)
- DIVOTrack: A Novel Dataset and Baseline Method for Cross-View Multi-Object Tracking in DIVerse Open Scenes [74.64897845999677]
We introduce a new cross-view multi-object tracking dataset for DIVerse Open scenes with densely tracked pedestrians.
Our DIVOTrack has fifteen distinct scenarios and 953 cross-view tracks, surpassing all cross-view multi-object tracking datasets currently available.
Furthermore, we provide a novel baseline cross-view tracking method with a unified joint detection and cross-view tracking framework named CrossMOT.
arXiv Detail & Related papers (2023-02-15T14:10:42Z)
- End-to-end Tracking with a Multi-query Transformer [96.13468602635082]
Multiple-object tracking (MOT) is a challenging task that requires simultaneous reasoning about location, appearance, and identity of the objects in the scene over time.
Our aim in this paper is to move beyond tracking-by-detection approaches, to class-agnostic tracking that performs well also for unknown object classes.
arXiv Detail & Related papers (2022-10-26T10:19:37Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the content (including all information) and is not responsible for any consequences.