Dynamic Disentangled Fusion Network for RGBT Tracking
- URL: http://arxiv.org/abs/2412.08441v1
- Date: Wed, 11 Dec 2024 15:03:27 GMT
- Title: Dynamic Disentangled Fusion Network for RGBT Tracking
- Authors: Chenglong Li, Tao Wang, Zhaodong Ding, Yun Xiao, Jin Tang,
- Abstract summary: We propose a novel Dynamic Disentangled Fusion Network called DDFNet, which disentangles the fusion process into several dynamic fusion models. In particular, we design six attribute-based fusion models to integrate RGB and thermal features under six challenging scenarios. Experimental results on benchmark datasets demonstrate the effectiveness of our DDFNet against other state-of-the-art methods.
- Score: 26.99387895981277
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: RGBT tracking usually suffers from various challenging factors, such as low resolution, similar appearance, extreme illumination, thermal crossover and occlusion. Existing works often study complex fusion models to handle challenging scenarios, but cannot adapt well to various challenges, which might limit tracking performance. To address this problem, we propose a novel Dynamic Disentangled Fusion Network, called DDFNet, which disentangles the fusion process into several dynamic fusion models according to the challenge attributes, so that it can adapt to various challenging scenarios for robust RGBT tracking. In particular, we design six attribute-based fusion models to integrate RGB and thermal features under the six challenging scenarios, respectively. Since each fusion model deals with its corresponding challenges, such a disentangled fusion scheme could increase the fusion capacity without depending on large-scale training data. Considering that each challenging scenario also has different levels of difficulty, we propose to optimize the combination of multiple fusion units that forms each attribute-based fusion model in a dynamic manner, which adapts well to the difficulty of the corresponding challenging scenario. To determine which fusion models should be activated during tracking, we design an adaptive aggregation fusion module that adaptively integrates the features from all attribute-based fusion models, together with a three-stage training algorithm. In addition, we design an enhancement fusion module to further strengthen the aggregated feature and the modality-specific features. Experimental results on benchmark datasets demonstrate the effectiveness of our DDFNet against other state-of-the-art methods.
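The abstract describes a two-level structure: each attribute-based fusion model is a dynamic combination of fusion units, and an adaptive aggregation module weights the outputs of all six models. The NumPy sketch below illustrates only that data flow, under loudly stated assumptions: the unit design (a linear map plus tanh over concatenated RGB and thermal vectors), the softmax mixing of units, the input-conditioned softmax gate, and all names and sizes (`AttributeFusion`, `AdaptiveAggregation`, `NUM_UNITS`, `C`) are hypothetical and not taken from DDFNet. Training (including the three-stage algorithm) and the enhancement fusion module are omitted.

```python
import numpy as np

NUM_ATTRIBUTES = 6  # six attribute-based fusion models, as stated in the abstract
NUM_UNITS = 3       # hypothetical number of fusion units per attribute branch
C = 8               # hypothetical channel count of the (pooled) features


def softmax(x):
    """Numerically stable softmax over a 1-D array."""
    e = np.exp(x - x.max())
    return e / e.sum()


class AttributeFusion:
    """Hypothetical attribute-based fusion model: a soft mix of fusion units."""

    def __init__(self, rng):
        # Each unit maps the concatenated RGB+thermal vector (2C) to C channels.
        self.units = [rng.standard_normal((2 * C, C)) * 0.1 for _ in range(NUM_UNITS)]
        # Learnable logits deciding how strongly each unit contributes (assumption).
        self.unit_logits = rng.standard_normal(NUM_UNITS)

    def __call__(self, rgb, tir):
        x = np.concatenate([rgb, tir])  # shape (2C,)
        w = softmax(self.unit_logits)   # shape (NUM_UNITS,), sums to 1
        # Weighted sum of unit outputs, shape (C,)
        return sum(wi * np.tanh(x @ U) for wi, U in zip(w, self.units))


class AdaptiveAggregation:
    """Hypothetical aggregation: input-conditioned weights over all branches."""

    def __init__(self, rng):
        self.branches = [AttributeFusion(rng) for _ in range(NUM_ATTRIBUTES)]
        self.gate = rng.standard_normal((2 * C, NUM_ATTRIBUTES)) * 0.1

    def __call__(self, rgb, tir):
        outs = np.stack([b(rgb, tir) for b in self.branches])  # (NUM_ATTRIBUTES, C)
        g = softmax(np.concatenate([rgb, tir]) @ self.gate)    # (NUM_ATTRIBUTES,)
        return g @ outs, g                                     # fused (C,), gate weights


rng = np.random.default_rng(0)
model = AdaptiveAggregation(rng)
rgb, tir = rng.standard_normal(C), rng.standard_normal(C)
fused, weights = model(rgb, tir)
print(fused.shape, weights.round(3))
```

Because the gate here is a softmax, every attribute branch contributes a little to the fused feature; a hard top-k selection would be another way to "activate" only some models, and the abstract does not say which choice DDFNet makes.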
Related papers
- Conditional Controllable Image Fusion [56.4120974322286]
We propose a conditional controllable fusion (CCF) framework for general image fusion tasks without specific training.
CCF applies specific fusion constraints to each individual sample in practice.
Experiments validate its effectiveness in general fusion tasks across diverse scenarios.
Date: 2024-11-03T13:56:15Z
- FusionBench: A Comprehensive Benchmark of Deep Model Fusion [78.80920533793595]
Deep model fusion is a technique that unifies the predictions or parameters of several deep neural networks into a single model.
FusionBench is the first comprehensive benchmark dedicated to deep model fusion.
Date: 2024-06-05T13:54:28Z
- Learning Adaptive Fusion Bank for Multi-modal Salient Object Detection [19.89237876061433]
Multi-modal salient object detection (MSOD) aims to boost saliency detection performance by integrating visible sources with depth or thermal infrared ones.
Existing methods generally design different fusion schemes to handle certain issues or challenges.
We propose a novel adaptive fusion bank that makes full use of the complementary benefits from a set of basic fusion schemes to handle different challenges simultaneously.
Date: 2024-06-03T09:11:34Z
- Reliable Object Tracking by Multimodal Hybrid Feature Extraction and Transformer-Based Fusion [18.138433117711177]
We propose a novel multimodal hybrid tracker (MMHT) that utilizes frame-event-based data for reliable single object tracking.
The MMHT model employs a hybrid backbone consisting of an artificial neural network (ANN) and a spiking neural network (SNN) to extract dominant features from different visual modalities.
Extensive experiments demonstrate that the MMHT model exhibits competitive performance in comparison with other state-of-the-art methods.
Date: 2024-05-28T07:24:56Z
- Modality Prompts for Arbitrary Modality Salient Object Detection [57.610000247519196]
This paper delves into the task of arbitrary modality salient object detection (AM SOD).
It aims to detect salient objects from arbitrary modalities, e.g., RGB images, RGB-D images, and RGB-D-T images.
A novel modality-adaptive Transformer (MAT) is proposed to address two fundamental challenges of AM SOD.
Date: 2024-05-06T11:02:02Z
- AFter: Attention-based Fusion Router for RGBT Tracking [22.449878625622844]
Existing RGBT tracking methods widely adopt fixed fusion structures to integrate multi-modal features.
We develop a novel Attention-based Fusion router called AFter, which optimizes the fusion structure to adapt to challenging scenarios.
In particular, we design a fusion structure space based on a hierarchical attention network, where each attention-based fusion unit corresponds to a fusion operation and a combination of these attention units corresponds to a fusion structure.
Date: 2024-05-04T17:24:09Z
- FuseMoE: Mixture-of-Experts Transformers for Fleximodal Fusion [29.130355774088205]
FuseMoE is a mixture-of-experts framework equipped with an innovative gating function.
Designed to integrate a diverse number of modalities, FuseMoE is effective in managing scenarios with missing modalities and irregularly sampled data trajectories.
Date: 2024-02-05T17:37:46Z
- Merging Multi-Task Models via Weight-Ensembling Mixture of Experts [64.94129594112557]
Merging Transformer-based models trained on different tasks yields a single unified model that can execute all the tasks concurrently.
Previous methods, exemplified by task arithmetic, have been shown to be both effective and scalable.
We propose to merge most of the parameters while upscaling the Transformer layers to a weight-ensembling mixture of experts (MoE) module.
Date: 2024-02-01T08:58:57Z
- Deep Model Fusion: A Survey [37.39100741978586]
Deep model fusion/merging is an emerging technique that merges the parameters or predictions of multiple deep learning models into a single one.
It faces several challenges, including high computational cost, high-dimensional parameter space, interference between different heterogeneous models, etc.
Date: 2023-09-27T14:40:12Z
- A Task-guided, Implicitly-searched and Meta-initialized Deep Model for Image Fusion [69.10255211811007]
We present a Task-guided, Implicitly-searched and Meta-initialized (TIM) deep model to address the image fusion problem in a challenging real-world scenario.
Specifically, we propose a constrained strategy to incorporate information from downstream tasks to guide the unsupervised learning process of image fusion.
Within this framework, we then design an implicit search scheme to automatically discover compact architectures for our fusion model with high efficiency.
Date: 2023-05-25T08:54:08Z
- Equivariant Multi-Modality Image Fusion [124.11300001864579]
We propose EMMA, the Equivariant Multi-Modality imAge fusion paradigm, for end-to-end self-supervised learning.
Our approach is rooted in the prior knowledge that natural imaging responses are equivariant to certain transformations.
Experiments confirm that EMMA yields high-quality fusion results for infrared-visible and medical images.
Date: 2023-05-19T05:50:24Z
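The weight-ensembling mixture-of-experts entry cites task arithmetic as a prior model-merging method. A minimal sketch of that baseline follows: a task vector is the difference between fine-tuned and pretrained weights, and the merged model is theta_pre + lam * sum_i (theta_i - theta_pre). Parameters are represented as plain dicts of NumPy arrays, and the function names and the coefficient `lam` are hypothetical choices for illustration; this is not the merging method proposed in that paper.

```python
import numpy as np


def task_vector(finetuned, pretrained):
    """Per-parameter difference between a fine-tuned and the pretrained model."""
    return {k: finetuned[k] - pretrained[k] for k in pretrained}


def task_arithmetic_merge(pretrained, finetuned_models, lam=0.3):
    """Add the scaled sum of task vectors to a copy of the pretrained weights."""
    merged = {k: v.copy() for k, v in pretrained.items()}  # leave inputs untouched
    for ft in finetuned_models:
        tv = task_vector(ft, pretrained)
        for k in merged:
            merged[k] += lam * tv[k]
    return merged


# Toy two-parameter "models" (hypothetical values).
pretrained = {"w": np.array([1.0, 1.0]), "b": np.array([0.0])}
task_a = {"w": np.array([2.0, 1.0]), "b": np.array([0.5])}
task_b = {"w": np.array([1.0, 3.0]), "b": np.array([-0.5])}

merged = task_arithmetic_merge(pretrained, [task_a, task_b], lam=0.5)
print(merged)  # w -> [1.5, 2.0], b -> [0.0]
```

A single shared `lam` keeps the sketch simple; per-task or per-layer coefficients are a common variation.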