Learning Dual-Fused Modality-Aware Representations for RGBD Tracking
- URL: http://arxiv.org/abs/2211.03055v1
- Date: Sun, 6 Nov 2022 07:59:07 GMT
- Title: Learning Dual-Fused Modality-Aware Representations for RGBD Tracking
- Authors: Shang Gao and Jinyu Yang and Zhe Li and Feng Zheng and Aleš Leonardis and Jingkuan Song
- Abstract summary: Compared with traditional RGB object tracking, the addition of the depth modality can effectively mitigate interference between the target and the background.
Some existing RGBD trackers use the two modalities separately and thus some particularly useful shared information between them is ignored.
We propose a novel Dual-fused Modality-aware Tracker (termed DMTracker) which aims to learn informative and discriminative representations of the target objects for robust RGBD tracking.
- Score: 67.14537242378988
- License: http://creativecommons.org/licenses/by-nc-nd/4.0/
- Abstract: With the development of depth sensors in recent years, RGBD object tracking
has received significant attention. Compared with traditional RGB object
tracking, the addition of the depth modality can effectively mitigate
interference between the target and the background. However, some existing
RGBD trackers use the two modalities separately, and thus some particularly
useful shared information between them is ignored. On the other hand, some
methods attempt to fuse the two modalities by treating them equally, resulting
in the loss of modality-specific features. To tackle these limitations, we
propose a novel
Dual-fused Modality-aware Tracker (termed DMTracker) which aims to learn
informative and discriminative representations of the target objects for robust
RGBD tracking. The first fusion module focuses on extracting the shared
information between modalities based on cross-modal attention. The second aims
at integrating the RGB-specific and depth-specific information to enhance the
fused features. By fusing both the modality-shared and modality-specific
information in a modality-aware scheme, our DMTracker can learn discriminative
representations in complex tracking scenes. Experiments show that our proposed
tracker achieves very promising results on challenging RGBD benchmarks. Code is
available at https://github.com/ShangGaoG/DMTracker.
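To make the fusion scheme described in the abstract more concrete, below is a minimal PyTorch-style sketch of the two-stage idea: a cross-modal attention step that pools modality-shared features, followed by a step that re-injects the RGB-specific and depth-specific features. This is not the authors' implementation; all module names, channel sizes, head counts, and the 1x1-convolution fusion are illustrative assumptions.

import torch
import torch.nn as nn


class CrossModalSharedFusion(nn.Module):
    """Sketch of shared-feature extraction: RGB queries attend to depth
    features and vice versa, so only cues present in both modalities are
    emphasized. All hyperparameters are assumptions for illustration."""

    def __init__(self, channels: int = 256, heads: int = 4):
        super().__init__()
        self.rgb_to_depth = nn.MultiheadAttention(channels, heads, batch_first=True)
        self.depth_to_rgb = nn.MultiheadAttention(channels, heads, batch_first=True)

    def forward(self, rgb: torch.Tensor, depth: torch.Tensor) -> torch.Tensor:
        # rgb, depth: (B, C, H, W) feature maps from the two backbones.
        b, c, h, w = rgb.shape
        rgb_seq = rgb.flatten(2).transpose(1, 2)      # (B, HW, C)
        depth_seq = depth.flatten(2).transpose(1, 2)  # (B, HW, C)
        shared_rd, _ = self.rgb_to_depth(rgb_seq, depth_seq, depth_seq)
        shared_dr, _ = self.depth_to_rgb(depth_seq, rgb_seq, rgb_seq)
        shared = shared_rd + shared_dr                # modality-shared features
        return shared.transpose(1, 2).reshape(b, c, h, w)


class ModalitySpecificEnhance(nn.Module):
    """Sketch of the second stage: concatenate the shared features with the
    RGB-specific and depth-specific features, then project back to C channels."""

    def __init__(self, channels: int = 256):
        super().__init__()
        self.project = nn.Conv2d(3 * channels, channels, kernel_size=1)

    def forward(self, shared, rgb, depth):
        return self.project(torch.cat([shared, rgb, depth], dim=1))


if __name__ == "__main__":
    rgb_feat = torch.randn(1, 256, 16, 16)
    depth_feat = torch.randn(1, 256, 16, 16)
    shared = CrossModalSharedFusion()(rgb_feat, depth_feat)
    fused = ModalitySpecificEnhance()(shared, rgb_feat, depth_feat)
    print(fused.shape)  # torch.Size([1, 256, 16, 16])

In a full tracker the fused map would feed the tracking head; the small demo at the end only confirms that the shapes line up.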
Related papers
- TENet: Targetness Entanglement Incorporating with Multi-Scale Pooling and Mutually-Guided Fusion for RGB-E Object Tracking [30.89375068036783]
Existing approaches perform event feature extraction for RGB-E tracking using traditional appearance models.
We propose an Event backbone (Pooler) to obtain a high-quality feature representation that is cognisant of the intrinsic characteristics of the event data.
Our method significantly outperforms state-of-the-art trackers on two widely used RGB-E tracking datasets.
arXiv Detail & Related papers (2024-05-08T12:19:08Z)
- RGB-T Tracking Based on Mixed Attention [5.151994214135177]
RGB-T tracking involves the use of images from both visible and thermal modalities.
This paper proposes an RGB-T tracker based on a mixed attention mechanism to achieve complementary fusion of the modalities.
arXiv Detail & Related papers (2023-04-09T15:59:41Z)
- Dual Swin-Transformer based Mutual Interactive Network for RGB-D Salient Object Detection [67.33924278729903]
In this work, we propose a Dual Swin-Transformer based Mutual Interactive Network.
We adopt Swin-Transformer as the feature extractor for both RGB and depth modality to model the long-range dependencies in visual inputs.
Comprehensive experiments on five standard RGB-D SOD benchmark datasets demonstrate the superiority of the proposed DTMINet method.
arXiv Detail & Related papers (2022-06-07T08:35:41Z)
- RGBD Object Tracking: An In-depth Review [89.96221353160831]
We first review RGBD object trackers from different perspectives, including RGBD fusion, depth usage, and tracking framework.
We benchmark a representative set of RGBD trackers and give detailed analyses based on their performance.
arXiv Detail & Related papers (2022-03-26T18:53:51Z)
- Visual Object Tracking on Multi-modal RGB-D Videos: A Review [16.098468526632473]
The goal of this review is to summarize the relevant knowledge of the research field of RGB-D tracking.
Specifically, we summarize the related RGB-D tracking benchmark datasets as well as the corresponding performance measurements.
arXiv Detail & Related papers (2022-01-23T08:02:49Z)
- Cross-modality Discrepant Interaction Network for RGB-D Salient Object Detection [78.47767202232298]
We propose a novel Cross-modality Discrepant Interaction Network (CDINet) for RGB-D SOD.
Two components are designed to implement effective cross-modality interaction.
Our network outperforms 15 state-of-the-art methods both quantitatively and qualitatively.
arXiv Detail & Related papers (2021-08-04T11:24:42Z)
- Learning Selective Mutual Attention and Contrast for RGB-D Saliency Detection [145.4919781325014]
How to effectively fuse cross-modal information is the key problem for RGB-D salient object detection.
Many models use a feature fusion strategy but are limited by low-order point-to-point fusion methods.
We propose a novel mutual attention model by fusing attention and contexts from different modalities.
arXiv Detail & Related papers (2020-10-12T08:50:10Z)
- Jointly Modeling Motion and Appearance Cues for Robust RGB-T Tracking [85.333260415532]
We develop a novel late fusion method to infer the fusion weight maps of both RGB and thermal (T) modalities.
When the appearance cue is unreliable, we take motion cues into account to make the tracker robust.
Results on three recent RGB-T tracking datasets show that the proposed tracker performs significantly better than other state-of-the-art algorithms.
arXiv Detail & Related papers (2020-07-04T08:11:33Z)