Modality-missing RGBT Tracking: Invertible Prompt Learning and High-quality Benchmarks
- URL: http://arxiv.org/abs/2312.16244v3
- Date: Wed, 20 Mar 2024 06:50:19 GMT
- Title: Modality-missing RGBT Tracking: Invertible Prompt Learning and High-quality Benchmarks
- Authors: Andong Lu, Jiacong Zhao, Chenglong Li, Jin Tang, Bin Luo
- Abstract summary: Modal information may be missing due to factors such as thermal sensor self-calibration and data transmission errors.
We propose a novel invertible prompt learning approach, which integrates the content-preserving prompts into a well-trained tracking model.
Our method achieves significant performance improvements compared with state-of-the-art methods.
- Score: 21.139161163767884
- License: http://creativecommons.org/licenses/by-nc-nd/4.0/
- Abstract: Current RGBT tracking research relies on complete multi-modal input, but modal information may be missing due to factors such as thermal sensor self-calibration and data transmission errors, which we call the modality-missing challenge in this work. To address this challenge, we propose a novel invertible prompt learning approach that integrates content-preserving prompts into a well-trained tracking model to adapt to various modality-missing scenarios for robust RGBT tracking. Given a modality-missing scenario, we propose to use the available modality to generate a prompt for the missing modality, adapting the RGBT tracking model accordingly. However, the cross-modality gap between the available and missing modalities usually causes semantic distortion and information loss in prompt generation. To handle this issue, we design an invertible prompter that incorporates full reconstruction of the available input modality from the generated prompt. To provide a comprehensive evaluation platform, we construct several high-quality benchmark datasets in which various modality-missing scenarios are considered to simulate real-world challenges. Extensive experiments on three modality-missing benchmark datasets show that our method achieves significant performance improvements over state-of-the-art methods. We have released the code and simulation datasets at https://github.com/Alexadlu/Modality-missing-RGBT-Tracking.git.
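The invertibility idea in the abstract can be illustrated with an additive coupling layer (as in NICE-style invertible networks): the generated prompt is invertible by construction, so the available-modality input can be reconstructed from it with no information loss. This is only a toy NumPy sketch under assumed shapes and a stand-in random MLP, not the paper's actual prompter architecture.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical stand-in for a learned prompter sub-network: a fixed
# random two-layer MLP (illustrative only, not trained weights).
W1 = rng.standard_normal((4, 8))
W2 = rng.standard_normal((8, 4))

def mlp(h):
    return np.tanh(h @ W1) @ W2

def prompt_forward(x):
    """Map available-modality features x (batch, 8) to a prompt via an
    additive coupling layer, which is invertible by construction."""
    x1, x2 = x[:, :4], x[:, 4:]
    y1 = x1
    y2 = x2 + mlp(x1)          # content-dependent shift on one half
    return np.concatenate([y1, y2], axis=1)

def prompt_inverse(y):
    """Reconstruct the input features exactly from the generated prompt."""
    y1, y2 = y[:, :4], y[:, 4:]
    x1 = y1
    x2 = y2 - mlp(y1)          # undo the shift; no information is lost
    return np.concatenate([x1, x2], axis=1)

x = rng.standard_normal((2, 8))   # e.g. RGB features when thermal is missing
prompt = prompt_forward(x)
x_rec = prompt_inverse(prompt)
print(np.max(np.abs(x - x_rec)))  # near machine precision
```

Because the coupling structure guarantees an exact inverse, a reconstruction constraint like the paper's can be enforced without the prompt generator discarding input content.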
Related papers
- Middle Fusion and Multi-Stage, Multi-Form Prompts for Robust RGB-T Tracking [1.8843687952462744]
M3PT is a novel RGB-T prompt tracking method that leverages middle fusion and multi-modal and multi-stage visual prompts to overcome challenges.
Based on the meta-framework, we utilize multiple flexible prompt strategies to adapt the pre-trained model to comprehensive exploration of uni-modal patterns.
arXiv Detail & Related papers (2024-03-27T02:06:25Z)
- Gradient-Guided Modality Decoupling for Missing-Modality Robustness [24.95911972867697]
We introduce a novel indicator, gradients, to monitor and reduce modality dominance.
We present a novel Gradient-guided Modality Decoupling (GMD) method to decouple the dependency on dominating modalities.
In addition, to flexibly handle modal-incomplete data, we design a parameter-efficient Dynamic Sharing framework.
arXiv Detail & Related papers (2024-02-26T05:50:43Z)
- Bi-directional Adapter for Multi-modal Tracking [67.01179868400229]
We propose a novel multi-modal visual prompt tracking model based on a universal bi-directional adapter.
We develop a simple but effective light feature adapter to transfer modality-specific information from one modality to another.
Our model achieves superior tracking performance in comparison with both the full fine-tuning methods and the prompt learning-based methods.
arXiv Detail & Related papers (2023-12-17T05:27:31Z)
- Learning Noise-Robust Joint Representation for Multimodal Emotion Recognition under Incomplete Data Scenarios [23.43319138048058]
Multimodal emotion recognition (MER) in practical scenarios is significantly challenged by the presence of missing or incomplete data.
Traditional methods often discard incomplete samples or substitute missing data segments with zero vectors to approximate the missing information.
We introduce a novel noise-robust MER model that effectively learns robust multimodal joint representations from noisy data.
arXiv Detail & Related papers (2023-09-21T10:49:02Z)
- DeNoising-MOT: Towards Multiple Object Tracking with Severe Occlusions [52.63323657077447]
We propose DNMOT, an end-to-end trainable DeNoising Transformer for multiple object tracking.
Specifically, we augment the trajectory with noises during training and make our model learn the denoising process in an encoder-decoder architecture.
We conduct extensive experiments on the MOT17, MOT20, and DanceTrack datasets, and the experimental results show that our method outperforms previous state-of-the-art methods by a clear margin.
arXiv Detail & Related papers (2023-09-09T04:40:01Z)
- VERITE: A Robust Benchmark for Multimodal Misinformation Detection Accounting for Unimodal Bias [17.107961913114778]
Multimodal misinformation is a growing problem on social media platforms.
In this study, we investigate and identify the presence of unimodal bias in widely-used MMD benchmarks.
We introduce a new method -- termed Crossmodal HArd Synthetic MisAlignment (CHASMA) -- for generating realistic synthetic training data.
arXiv Detail & Related papers (2023-04-27T12:28:29Z)
- Enhancing Multiple Reliability Measures via Nuisance-extended Information Bottleneck [77.37409441129995]
In practical scenarios where training data is limited, many predictive signals in the data may come from biases in data acquisition rather than the task of interest.
We consider an adversarial threat model under a mutual information constraint to cover a wider class of perturbations in training.
We propose an autoencoder-based training to implement the objective, as well as practical encoder designs to facilitate the proposed hybrid discriminative-generative training.
arXiv Detail & Related papers (2023-03-24T16:03:21Z)
- Exploiting modality-invariant feature for robust multimodal emotion recognition with missing modalities [76.08541852988536]
We propose to use invariant features for a missing modality imagination network (IF-MMIN).
We show that the proposed model outperforms all baselines and invariantly improves the overall emotion recognition performance under uncertain missing-modality conditions.
arXiv Detail & Related papers (2022-10-27T12:16:25Z)
- Prompting for Multi-Modal Tracking [70.0522146292258]
We propose a novel multi-modal prompt tracker (ProTrack) for multi-modal tracking.
ProTrack can transfer the multi-modal inputs to a single modality by the prompt paradigm.
Our ProTrack can achieve high-performance multi-modal tracking by only altering the inputs, even without any extra training on multi-modal data.
arXiv Detail & Related papers (2022-07-29T09:35:02Z)
- Toward Certified Robustness Against Real-World Distribution Shifts [65.66374339500025]
We train a generative model to learn perturbations from data and define specifications with respect to the output of the learned model.
A unique challenge arising from this setting is that existing verifiers cannot tightly approximate sigmoid activations.
We propose a general meta-algorithm for handling sigmoid activations which leverages classical notions of counter-example-guided abstraction refinement.
arXiv Detail & Related papers (2022-06-08T04:09:13Z)
- Challenge-Aware RGBT Tracking [32.88141817679821]
We propose a novel challenge-aware neural network to handle the modality-shared challenges and the modality-specific ones.
We show that our method operates at a real-time speed while performing well against the state-of-the-art methods on three benchmark datasets.
arXiv Detail & Related papers (2020-07-26T15:11:44Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences of its use.