Exploring Fusion Strategies for Accurate RGBT Visual Object Tracking
- URL: http://arxiv.org/abs/2201.08673v1
- Date: Fri, 21 Jan 2022 12:37:43 GMT
- Title: Exploring Fusion Strategies for Accurate RGBT Visual Object Tracking
- Authors: Zhangyong Tang (1), Tianyang Xu (1), Hui Li (1), Xiao-Jun Wu (1),
Xuefeng Zhu (1) and Josef Kittler (2) ((1) Jiangnan University, Wuxi, China,
(2) University of Surrey, UK)
- Abstract summary: We address the problem of multi-modal object tracking in video.
We explore various options for fusing the complementary information conveyed by the visible (RGB) and thermal infrared (TIR) modalities.
- Score: 1.015785232738621
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: We address the problem of multi-modal object tracking in video and
explore various options for fusing the complementary information conveyed by
the visible (RGB) and thermal infrared (TIR) modalities, including pixel-level,
feature-level and decision-level fusion. Specifically, unlike existing methods,
fusion at the pixel level follows the paradigm of the image fusion task.
Feature-level fusion is carried out by an attention mechanism with optional
channel excitation. At the decision level, a novel fusion strategy is proposed,
motivated by the observation that even a simple averaging configuration already
performs remarkably well. The effectiveness of the proposed decision-level
fusion strategy owes to a number of innovative contributions, including a
dynamic weighting of the RGB and TIR contributions and a linear template update
operation. A variant of this strategy produced the winning tracker at the
Visual Object Tracking Challenge 2020 (VOT-RGBT2020). The concurrent
exploration of innovative pixel- and feature-level fusion strategies highlights
the advantages of the proposed decision-level fusion method. Extensive
experimental results on three challenging datasets, i.e., GTOT, VOT-RGBT2019,
and VOT-RGBT2020, demonstrate the effectiveness and robustness of the proposed
method compared to state-of-the-art approaches. Code will be shared at
https://github.com/Zhangyong-Tang/DFAT.
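The decision-level strategy summarised above combines the per-modality tracker responses with dynamic weights and refreshes the target template linearly over time. The sketch below is a minimal illustration of that idea only; the function names, the use of response-map peaks as reliability cues, and the learning rate are assumptions made here for readability, not the authors' DFAT implementation.

```python
import numpy as np

def fuse_responses(resp_rgb: np.ndarray, resp_tir: np.ndarray) -> np.ndarray:
    """Decision-level fusion of two tracker response maps.

    A plain average already works well; the weights are made dynamic here by
    using each map's peak value as a crude reliability cue (an assumption for
    illustration -- the paper's actual weighting rule may differ).
    """
    w_rgb, w_tir = resp_rgb.max(), resp_tir.max()
    total = w_rgb + w_tir + 1e-12
    return (w_rgb * resp_rgb + w_tir * resp_tir) / total

def update_template(template: np.ndarray, new_obs: np.ndarray, lr: float = 0.1) -> np.ndarray:
    """Linear template update: an exponential moving average over frames."""
    return (1.0 - lr) * template + lr * new_obs

# Toy usage: two 25x25 response maps for the current frame.
rng = np.random.default_rng(0)
resp_rgb = rng.random((25, 25))
resp_tir = rng.random((25, 25))
fused = fuse_responses(resp_rgb, resp_tir)
row, col = np.unravel_index(fused.argmax(), fused.shape)  # predicted target position
```

The linear update is simply T_t = (1 - lr) * T_{t-1} + lr * T_new, which keeps the template stable while letting it adapt slowly to appearance changes.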
Related papers
- FusionNet: Multi-model Linear Fusion Framework for Low-light Image Enhancement [42.13579140792305]
FusionNet is a novel multi-model linear fusion framework that operates in parallel to capture global and local features across diverse color spaces.
By incorporating a linear fusion strategy underpinned by Hilbert space theoretical guarantees, FusionNet mitigates network collapse and reduces excessive training costs.
Our method achieved 1st place in the CVPR2025 NTIRE Low Light Enhancement Challenge.
arXiv Detail & Related papers (2025-04-27T16:22:03Z)
- Breaking Shallow Limits: Task-Driven Pixel Fusion for Gap-free RGBT Tracking [21.18680957184296]
Current RGBT tracking methods often overlook the impact of fusion location on mitigating modality gap.
We propose a novel Task-driven Pixel-level Fusion network, named TPF.
In particular, we design a lightweight Pixel-level Fusion Adapter (PFA) that exploits Mamba's linear complexity to ensure real-time, low-latency RGBT tracking.
arXiv Detail & Related papers (2025-03-14T09:56:13Z)
- Rethinking Early-Fusion Strategies for Improved Multimodal Image Segmentation [7.757018983487103]
We propose a novel multimodal fusion network (EFNet) based on an early fusion strategy and a simple but effective feature clustering for training efficient RGB-T semantic segmentation.
We validate the effectiveness of our method on different datasets, outperforming previous state-of-the-art methods with fewer parameters and lower computation.
arXiv Detail & Related papers (2025-01-19T06:16:45Z)
- From Text to Pixels: A Context-Aware Semantic Synergy Solution for Infrared and Visible Image Fusion [66.33467192279514]
We introduce a text-guided multi-modality image fusion method that leverages the high-level semantics from textual descriptions to integrate semantics from infrared and visible images.
Our method not only produces visually superior fusion results but also achieves a higher detection mAP over existing methods, achieving state-of-the-art results.
arXiv Detail & Related papers (2023-12-31T08:13:47Z)
- ICAFusion: Iterative Cross-Attention Guided Feature Fusion for Multispectral Object Detection [25.66305300362193]
A novel feature fusion framework of dual cross-attention transformers is proposed to model global feature interaction.
This framework enhances the discriminability of object features through the query-guided cross-attention mechanism (a generic sketch of this mechanism is given after this list).
The proposed method achieves superior performance and faster inference, making it suitable for various practical scenarios.
arXiv Detail & Related papers (2023-08-15T00:02:10Z)
- MLF-DET: Multi-Level Fusion for Cross-Modal 3D Object Detection [54.52102265418295]
We propose a novel and effective Multi-Level Fusion network, named MLF-DET, for high-performance cross-modal 3D object DETection.
For the feature-level fusion, we present the Multi-scale Voxel Image fusion (MVI) module, which densely aligns multi-scale voxel features with image features.
For the decision-level fusion, we propose the lightweight Feature-cued Confidence Rectification (FCR) module, which exploits image semantics to rectify the confidence of detection candidates.
arXiv Detail & Related papers (2023-07-18T11:26:02Z)
- Searching a Compact Architecture for Robust Multi-Exposure Image Fusion [55.37210629454589]
Two major stumbling blocks hinder the development of multi-exposure image fusion: pixel misalignment and inefficient inference.
This study introduces an architecture search-based paradigm incorporating self-alignment and detail repletion modules for robust multi-exposure image fusion.
The proposed method outperforms various competitive schemes, achieving a noteworthy 3.19% improvement in PSNR for general scenarios and an impressive 23.5% enhancement in misaligned scenarios.
arXiv Detail & Related papers (2023-05-20T17:01:52Z)
- Breaking Free from Fusion Rule: A Fully Semantic-driven Infrared and Visible Image Fusion [51.22863068854784]
Infrared and visible image fusion plays a vital role in the field of computer vision.
Previous approaches make efforts to design various fusion rules in the loss functions.
We develop a semantic-level fusion network to sufficiently utilize the semantic guidance.
arXiv Detail & Related papers (2022-11-22T13:59:59Z)
- Image Fusion Transformer [75.71025138448287]
In image fusion, images obtained from different sensors are fused to generate a single image with enhanced information.
In recent years, state-of-the-art methods have adopted Convolutional Neural Networks (CNNs) to encode meaningful features for image fusion.
We propose a novel Image Fusion Transformer (IFT) where we develop a transformer-based multi-scale fusion strategy.
arXiv Detail & Related papers (2021-07-19T16:42:49Z)
- Learning Selective Mutual Attention and Contrast for RGB-D Saliency Detection [145.4919781325014]
How to effectively fuse cross-modal information is the key problem for RGB-D salient object detection.
Many models use the feature fusion strategy but are limited by the low-order point-to-point fusion methods.
We propose a novel mutual attention model by fusing attention and contexts from different modalities.
arXiv Detail & Related papers (2020-10-12T08:50:10Z)
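Several of the entries above rely on cross-modal attention for feature-level fusion, most explicitly the query-guided cross-attention in ICAFusion. The sketch below is a generic, single-head illustration of that mechanism between two modality token sets; the token shapes, the omission of learned projection matrices, and the name `cross_attention` are simplifying assumptions and do not reproduce any particular paper's architecture.

```python
import numpy as np

def softmax(x: np.ndarray, axis: int = -1) -> np.ndarray:
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def cross_attention(query_feats: np.ndarray, key_value_feats: np.ndarray) -> np.ndarray:
    """Single-head, projection-free cross-attention.

    query_feats:     (N, d) tokens from one modality (e.g. RGB).
    key_value_feats: (M, d) tokens from the other modality (e.g. thermal).
    Returns (N, d): query tokens re-expressed as attention-weighted mixtures
    of the other modality's tokens.
    """
    d = query_feats.shape[-1]
    scores = query_feats @ key_value_feats.T / np.sqrt(d)  # (N, M) similarity
    attn = softmax(scores, axis=-1)                         # each row sums to 1
    return attn @ key_value_feats                           # (N, d)

# Toy usage with flattened 8x8 feature maps (64 tokens, 32 channels each).
rng = np.random.default_rng(0)
rgb_tokens = rng.standard_normal((64, 32))
tir_tokens = rng.standard_normal((64, 32))
rgb_enriched = cross_attention(rgb_tokens, tir_tokens)
tir_enriched = cross_attention(tir_tokens, rgb_tokens)  # the "dual" direction
```

Running the attention in both directions, as in the last two lines, mirrors the dual cross-attention design in which each modality queries the other.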