TENet: Targetness Entanglement Incorporating with Multi-Scale Pooling and Mutually-Guided Fusion for RGB-E Object Tracking
- URL: http://arxiv.org/abs/2405.05004v1
- Date: Wed, 8 May 2024 12:19:08 GMT
- Title: TENet: Targetness Entanglement Incorporating with Multi-Scale Pooling and Mutually-Guided Fusion for RGB-E Object Tracking
- Authors: Pengcheng Shao, Tianyang Xu, Zhangyong Tang, Linze Li, Xiao-Jun Wu, Josef Kittler
- Abstract summary: Existing approaches perform event feature extraction for RGB-E tracking using traditional appearance models.
We propose an Event backbone (Pooler) to obtain a high-quality feature representation that is cognisant of the intrinsic characteristics of the event data.
Our method significantly outperforms state-of-the-art trackers on two widely used RGB-E tracking datasets.
- Score: 30.89375068036783
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: There is currently strong interest in improving visual object tracking by augmenting the RGB modality with the output of a visual event camera that is particularly informative about the scene motion. However, existing approaches perform event feature extraction for RGB-E tracking using traditional appearance models, which have been optimised for RGB-only tracking, without adapting them to the intrinsic characteristics of the event data. To address this problem, we propose an Event backbone (Pooler), designed to obtain a high-quality feature representation that is cognisant of the innate characteristics of the event data, namely its sparsity. In particular, Multi-Scale Pooling is introduced to capture all the motion feature trends within event data through the utilisation of diverse pooling kernel sizes. The association between the derived RGB and event representations is established by an innovative module performing adaptive Mutually Guided Fusion (MGF). Extensive experimental results show that our method significantly outperforms state-of-the-art trackers on two widely used RGB-E tracking datasets, VisEvent and COESOT, where the precision and success rates on COESOT are improved by 4.9% and 5.2%, respectively. Our code will be available at https://github.com/SSSpc333/TENet.
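The abstract names two mechanisms: multi-scale pooling over the sparse event representation using diverse kernel sizes, and an adaptive Mutually Guided Fusion (MGF) of the RGB and event features. The PyTorch sketch below illustrates one plausible reading of those ideas; the module names, channel sizes and sigmoid gating are illustrative assumptions, not the authors' released implementation.

```python
import torch
import torch.nn as nn


class MultiScalePooling(nn.Module):
    """Aggregate sparse event features with several pooling kernel sizes."""

    def __init__(self, channels, kernel_sizes=(3, 5, 7)):
        super().__init__()
        self.pools = nn.ModuleList(
            nn.AvgPool2d(k, stride=1, padding=k // 2) for k in kernel_sizes
        )
        # 1x1 convolution merges the multi-scale responses back to `channels`.
        self.merge = nn.Conv2d(channels * (len(kernel_sizes) + 1), channels, 1)

    def forward(self, x):
        # Keep the original (un-pooled) response alongside the pooled ones.
        multi_scale = [x] + [pool(x) for pool in self.pools]
        return self.merge(torch.cat(multi_scale, dim=1))


class MutuallyGuidedFusion(nn.Module):
    """Each modality produces a spatial gate that re-weights the other one."""

    def __init__(self, channels):
        super().__init__()
        self.rgb_gate = nn.Conv2d(channels, 1, kernel_size=1)
        self.event_gate = nn.Conv2d(channels, 1, kernel_size=1)
        self.merge = nn.Conv2d(channels * 2, channels, kernel_size=1)

    def forward(self, f_rgb, f_event):
        # RGB features guide the event branch and vice versa.
        event_guided = f_event * torch.sigmoid(self.rgb_gate(f_rgb))
        rgb_guided = f_rgb * torch.sigmoid(self.event_gate(f_event))
        return self.merge(torch.cat([rgb_guided, event_guided], dim=1))


if __name__ == "__main__":
    f_rgb = torch.randn(1, 256, 16, 16)    # RGB search-region features
    f_event = torch.randn(1, 256, 16, 16)  # event-voxel features
    f_event = MultiScalePooling(256)(f_event)
    fused = MutuallyGuidedFusion(256)(f_rgb, f_event)
    print(fused.shape)  # torch.Size([1, 256, 16, 16])
```

Stride-1 average pooling at several receptive fields smooths the sparse event activations before the two modalities gate each other, which is the general pattern the abstract suggests.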
Related papers
- RGB-Sonar Tracking Benchmark and Spatial Cross-Attention Transformer Tracker [4.235252053339947]
This paper introduces a new challenging RGB-Sonar (RGB-S) tracking task.
It investigates how to achieve efficient tracking of an underwater target through the interaction of RGB and sonar modalities.
arXiv Detail & Related papers (2024-06-11T12:01:11Z)
- Long-term Frame-Event Visual Tracking: Benchmark Dataset and Baseline [37.06330707742272]
We first propose a new long-term and large-scale frame-event single object tracking dataset, termed FELT.
It contains 742 videos with 1,594,474 RGB frame and event stream pairs, making it the largest frame-event tracking dataset to date.
We propose a novel associative memory Transformer network as a unified backbone by introducing modern Hopfield layers into multi-head self-attention blocks to fuse both RGB and event data.
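Modern Hopfield retrieval has the same softmax(QK^T)V form as standard attention, so the fusion idea in this entry can be approximated by joint multi-head self-attention over the concatenated RGB and event tokens. The sketch below is a simplified stand-in under that assumption; the token counts and embedding width are invented for illustration.

```python
import torch
import torch.nn as nn


class JointModalityAttention(nn.Module):
    """Self-attention over concatenated RGB and event tokens (Hopfield-style retrieval)."""

    def __init__(self, dim=256, heads=8):
        super().__init__()
        self.attn = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.norm = nn.LayerNorm(dim)

    def forward(self, rgb_tokens, event_tokens):
        tokens = torch.cat([rgb_tokens, event_tokens], dim=1)  # (B, N_rgb + N_evt, C)
        fused, _ = self.attn(tokens, tokens, tokens)           # every token attends to both modalities
        return self.norm(tokens + fused)


if __name__ == "__main__":
    rgb = torch.randn(1, 196, 256)     # RGB patch tokens
    events = torch.randn(1, 196, 256)  # event patch tokens
    print(JointModalityAttention()(rgb, events).shape)  # torch.Size([1, 392, 256])
```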
arXiv Detail & Related papers (2024-03-09T08:49:50Z)
- Implicit Event-RGBD Neural SLAM [54.74363487009845]
Implicit neural SLAM has achieved remarkable progress recently.
Existing methods face significant challenges in non-ideal scenarios.
We propose EN-SLAM, the first event-RGBD implicit neural SLAM framework.
arXiv Detail & Related papers (2023-11-18T08:48:58Z)
- RGB-T Tracking Based on Mixed Attention [5.151994214135177]
RGB-T tracking involves the use of images from both visible and thermal modalities.
This paper proposes an RGB-T tracker based on a mixed attention mechanism to achieve complementary fusion of the two modalities.
arXiv Detail & Related papers (2023-04-09T15:59:41Z)
- Revisiting Color-Event based Tracking: A Unified Network, Dataset, and Metric [53.88188265943762]
We propose a single-stage backbone network for Color-Event Unified Tracking (CEUTrack), which achieves the above functions simultaneously.
Our proposed CEUTrack is simple, effective, and efficient, achieving over 75 FPS and new SOTA performance.
arXiv Detail & Related papers (2022-11-20T16:01:31Z)
- Learning Dual-Fused Modality-Aware Representations for RGBD Tracking [67.14537242378988]
Compared with traditional RGB object tracking, adding the depth modality can effectively resolve interference between the target and the background.
Some existing RGBD trackers use the two modalities separately, so particularly useful information shared between them is ignored.
We propose a novel Dual-fused Modality-aware Tracker (termed DMTracker) which aims to learn informative and discriminative representations of the target objects for robust RGBD tracking.
arXiv Detail & Related papers (2022-11-06T07:59:07Z)
- Dual Swin-Transformer based Mutual Interactive Network for RGB-D Salient Object Detection [67.33924278729903]
In this work, we propose a Dual Swin-Transformer based Mutual Interactive Network.
We adopt the Swin-Transformer as the feature extractor for both the RGB and depth modalities to model long-range dependencies in the visual inputs.
Comprehensive experiments on five standard RGB-D SOD benchmark datasets demonstrate the superiority of the proposed DTMINet method.
arXiv Detail & Related papers (2022-06-07T08:35:41Z)
- Multi-domain Collaborative Feature Representation for Robust Visual Object Tracking [32.760681454334765]
This paper focuses on effectively representing and utilizing complementary features from the frame domain and event domain.
For learning the unique features of the two domains, we utilize a Unique Extractor for Event (UEE) based on Spiking Neural Networks.
Experiments on a standard RGB benchmark and a real event tracking dataset demonstrate the effectiveness of the proposed approach.
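The UEE in this entry is built on Spiking Neural Networks. A minimal leaky integrate-and-fire (LIF) layer over a sequence of event frames is sketched below as a rough illustration; it is forward-pass only (real SNN training needs surrogate gradients), and the time constant and threshold are arbitrary assumptions.

```python
import torch
import torch.nn as nn


class LIFLayer(nn.Module):
    """Leaky integrate-and-fire neurons applied to a temporal sequence of event frames."""

    def __init__(self, tau=2.0, threshold=1.0):
        super().__init__()
        self.tau = tau              # membrane time constant
        self.threshold = threshold  # firing threshold

    def forward(self, x_seq):
        # x_seq: (T, B, C, H, W) stack of event frames / voxel slices.
        mem = torch.zeros_like(x_seq[0])
        spikes = []
        for x in x_seq:
            mem = mem + (x - mem) / self.tau         # leaky integration of the input current
            spike = (mem >= self.threshold).float()  # fire where the membrane crosses threshold
            mem = mem * (1.0 - spike)                # hard reset of fired neurons
            spikes.append(spike)
        return torch.stack(spikes)                   # binary spike trains, same shape as input


if __name__ == "__main__":
    events = torch.rand(8, 1, 32, 64, 64)  # 8 time bins of event data
    print(LIFLayer()(events).mean())       # average firing rate
```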
arXiv Detail & Related papers (2021-08-10T09:01:42Z)
- MFGNet: Dynamic Modality-Aware Filter Generation for RGB-T Tracking [72.65494220685525]
We propose a new dynamic modality-aware filter generation module (named MFGNet) to boost the message communication between visible and thermal data.
We generate dynamic modality-aware filters with two independent networks. The visible and thermal filters are then used to perform dynamic convolution on their corresponding input feature maps (see the sketch below).
To address issues caused by heavy occlusion, fast motion, and out-of-view, we propose to conduct a joint local and global search by exploiting a new direction-aware target-driven attention mechanism.
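The filter-generation idea summarised in this entry, predicting modality-aware kernels and applying them as a dynamic convolution, can be sketched as follows; the pooling-plus-linear generator and depth-wise kernels are assumptions made for a compact illustration, not the MFGNet architecture itself.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class DynamicFilterConv(nn.Module):
    """Predict a depth-wise kernel from a guidance feature map and apply it."""

    def __init__(self, channels, kernel_size=3):
        super().__init__()
        self.channels = channels
        self.kernel_size = kernel_size
        # Global pooling + linear layer that emits one k*k kernel per channel.
        self.generator = nn.Sequential(
            nn.AdaptiveAvgPool2d(1),
            nn.Flatten(),
            nn.Linear(channels, channels * kernel_size * kernel_size),
        )

    def forward(self, feat, guidance):
        b, c, h, w = feat.shape
        k = self.kernel_size
        # One depth-wise kernel per sample and channel, predicted from `guidance`.
        kernels = self.generator(guidance).view(b * c, 1, k, k)
        # Grouped convolution applies each sample's kernels to its own feature map.
        out = F.conv2d(feat.reshape(1, b * c, h, w), kernels,
                       padding=k // 2, groups=b * c)
        return out.view(b, c, h, w)


if __name__ == "__main__":
    visible = torch.randn(2, 64, 32, 32)
    thermal = torch.randn(2, 64, 32, 32)
    # Thermal features guide the filters applied to the visible branch (and vice versa).
    dyn = DynamicFilterConv(64)
    print(dyn(visible, thermal).shape)  # torch.Size([2, 64, 32, 32])
```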
arXiv Detail & Related papers (2021-07-22T03:10:51Z)
- Jointly Modeling Motion and Appearance Cues for Robust RGB-T Tracking [85.333260415532]
We develop a novel late fusion method to infer the fusion weight maps of both RGB and thermal (T) modalities.
When the appearance cue is unreliable, we take motion cues into account to make the tracker robust.
Numerous results on three recent RGB-T tracking datasets show that the proposed tracker performs significantly better than other state-of-the-art algorithms.
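A compact sketch of the general pattern this entry describes, late fusion of per-modality response maps through inferred weight maps with a motion-based fallback when appearance is unreliable, is given below; the softmax weighting, reliability threshold and blending ratio are illustrative choices only, not the published method.

```python
import torch
import torch.nn.functional as F


def late_fuse(resp_rgb, resp_t, motion_prior, reliability_thresh=0.4):
    """Fuse two appearance response maps; fall back on a motion prior if both are weak.

    resp_rgb, resp_t, motion_prior: (H, W) response / prior maps with values in [0, 1].
    """
    # Per-pixel fusion weights derived from the two modalities' local evidence.
    weights = F.softmax(torch.stack([resp_rgb, resp_t]), dim=0)
    fused = weights[0] * resp_rgb + weights[1] * resp_t

    # If the appearance peak is weak, mix in the motion prior.
    if fused.max() < reliability_thresh:
        fused = 0.5 * fused + 0.5 * motion_prior
    return fused


if __name__ == "__main__":
    h = w = 31
    resp_rgb = torch.rand(h, w) * 0.3   # weak RGB response (e.g. under occlusion)
    resp_t = torch.rand(h, w) * 0.3     # weak thermal response
    motion_prior = torch.zeros(h, w)
    motion_prior[15, 15] = 1.0          # prior centred on the previous target position
    fused = late_fuse(resp_rgb, resp_t, motion_prior)
    print(fused.argmax())               # peak now follows the motion prior
```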
arXiv Detail & Related papers (2020-07-04T08:11:33Z)
This list is automatically generated from the titles and abstracts of the papers on this site.