Depth Attention for Robust RGB Tracking
- URL: http://arxiv.org/abs/2410.20395v1
- Date: Sun, 27 Oct 2024 09:47:47 GMT
- Title: Depth Attention for Robust RGB Tracking
- Authors: Yu Liu, Arif Mahmood, Muhammad Haris Khan,
- Abstract summary: We propose a new framework that leverages monocular depth estimation to counter the challenges of tracking targets that are out of view or affected by motion blur in RGB video sequences.
To the best of our knowledge, we are the first to propose a depth attention mechanism and to formulate a simple framework that allows seamless integration of depth information with state-of-the-art tracking algorithms.
- Score: 21.897255266278275
- License:
- Abstract: RGB video object tracking is a fundamental task in computer vision. Its effectiveness can be improved using depth information, particularly for handling motion-blurred targets. However, depth information is often missing in commonly used tracking benchmarks. In this work, we propose a new framework that leverages monocular depth estimation to counter the challenges of tracking targets that are out of view or affected by motion blur in RGB video sequences. Specifically, our work introduces the following contributions. To the best of our knowledge, we are the first to propose a depth attention mechanism and to formulate a simple framework that allows seamless integration of depth information with state-of-the-art tracking algorithms, without RGB-D cameras, elevating accuracy and robustness. We provide extensive experiments on six challenging tracking benchmarks. Our results demonstrate that our approach provides consistent gains over several strong baselines and achieves new SOTA performance. We believe that our method will open up new possibilities for more sophisticated visual object tracking (VOT) solutions in real-world scenarios. Our code and models are publicly released: https://github.com/LiuYuML/Depth-Attention.
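The abstract does not spell out how the depth attention mechanism works. As a rough illustration only, the Python sketch below shows one generic way an estimated depth map could modulate a tracker's response map: locations whose depth differs from the target's depth are suppressed with a Gaussian weight. This is not the authors' mechanism. The helper names (`target_depth`, `depth_attention`, `peak`), the relative-depth range in [0, 1], and the `sigma` tolerance are all hypothetical, and the depth map is assumed to come from some monocular depth estimator run on the RGB frame.

```python
import numpy as np


def target_depth(depth, box):
    """Median estimated depth inside the previous box (x, y, w, h), in pixels."""
    x, y, w, h = box
    return float(np.median(depth[y:y + h, x:x + w]))


def depth_attention(response, depth, target_d, sigma=0.1):
    """Hypothetical depth weighting of a tracker response map.

    Locations whose estimated depth is far from target_d are suppressed with a
    Gaussian weight; sigma is an assumed tolerance in relative-depth units.
    """
    weights = np.exp(-((depth - target_d) ** 2) / (2.0 * sigma ** 2))
    weighted = response * weights
    total = weighted.sum()
    # Fall back to the raw response if every location was suppressed.
    return weighted / total if total > 0 else response


def peak(score_map):
    """Return the (row, col) of the maximum score."""
    r, c = np.unravel_index(np.argmax(score_map), score_map.shape)
    return int(r), int(c)


# Toy frame: background at relative depth 0.8, target region at 0.3.
depth = np.full((5, 5), 0.8)
depth[2:5, 2:5] = 0.3
response = np.zeros((5, 5))
response[1, 1] = 1.0  # distractor at background depth (strongest raw peak)
response[3, 3] = 0.9  # true target, slightly weaker in RGB alone

d_t = target_depth(depth, box=(2, 2, 3, 3))
print(peak(response))                               # (1, 1)
print(peak(depth_attention(response, depth, d_t)))  # (3, 3)
```

In the toy frame, the RGB-only response peaks on a distractor at background depth, while the depth-weighted map peaks on the target. A fixed Gaussian weight is used here purely for simplicity; a learned attention module could take its place.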
Related papers
- Tracking Reflected Objects: A Benchmark [12.770787846444406]
We introduce TRO, a benchmark specifically for Tracking Reflected Objects.
TRO includes 200 sequences with around 70,000 frames, each carefully annotated with bounding boxes.
To provide a stronger baseline, we propose a new tracker, HiP-HaTrack, which uses hierarchical features to improve performance.
(2024-07-07T02:22:45Z)
- Enhanced Object Tracking by Self-Supervised Auxiliary Depth Estimation Learning [15.364238035194854]
We propose a new method, named MDETrack, which trains a tracking network with an additional capability to understand the depth of scenes.
The results show an improved tracking accuracy even without real depth.
(2024-05-23T05:43:38Z)
- Improving Underwater Visual Tracking With a Large Scale Dataset and Image Enhancement [70.2429155741593]
This paper presents a new dataset and general tracker enhancement method for Underwater Visual Object Tracking (UVOT).
UVOT poses distinct challenges: the underwater environment exhibits non-uniform lighting conditions, low visibility, lack of sharpness, low contrast, camouflage, and reflections from suspended particles.
We propose a novel underwater image enhancement algorithm designed specifically to boost tracking quality.
The method significantly improves the performance of state-of-the-art (SOTA) visual trackers, by up to 5.0% AUC.
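In single-object tracking papers, AUC usually denotes the area under the success plot. As an assumption-laden sketch of that metric (an OTB-style protocol: per-frame IoU between predicted and ground-truth boxes in `(x, y, w, h)` format, success rate measured at 21 thresholds from 0 to 1, AUC taken as the mean success rate), it can be computed as follows. The function names are hypothetical, and individual benchmarks, including the one cited here, may use a different protocol.

```python
import numpy as np


def iou_xywh(a, b):
    """Intersection-over-union of two boxes given as (x, y, w, h)."""
    ax, ay, aw, ah = a
    bx, by, bw, bh = b
    ix = max(0.0, min(ax + aw, bx + bw) - max(ax, bx))
    iy = max(0.0, min(ay + ah, by + bh) - max(ay, by))
    inter = ix * iy
    union = aw * ah + bw * bh - inter
    return inter / union if union > 0 else 0.0


def success_auc(pred_boxes, gt_boxes, num_thresholds=21):
    """Mean success rate over IoU thresholds in [0, 1] (OTB-style convention)."""
    ious = np.array([iou_xywh(p, g) for p, g in zip(pred_boxes, gt_boxes)])
    thresholds = np.linspace(0.0, 1.0, num_thresholds)
    # Success rate at t = fraction of frames whose IoU is strictly above t.
    success = np.array([(ious > t).mean() for t in thresholds])
    return float(success.mean())


gt = [(0, 0, 10, 10)] * 3
pred = [
    (0, 0, 10, 10),    # exact match, IoU = 1
    (5, 0, 10, 10),    # shifted by half its width, IoU = 1/3
    (20, 20, 10, 10),  # lost target, IoU = 0
]
auc = success_auc(pred, gt)
print(round(auc, 3))  # 0.429
```

Because success is counted with a strict `>` comparison, even perfect predictions score 20/21 under this convention; the mean IoU over all frames closely approximates this AUC.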
(2023-08-30T07:41:26Z)
- Learning Dual-Fused Modality-Aware Representations for RGBD Tracking [67.14537242378988]
Compared with traditional RGB object tracking, adding the depth modality can effectively reduce interference between the target and the background.
Some existing RGBD trackers use the two modalities separately and thus ignore particularly useful information shared between them.
We propose a novel Dual-fused Modality-aware Tracker (termed DMTracker) which aims to learn informative and discriminative representations of the target objects for robust RGBD tracking.
(2022-11-06T07:59:07Z)
- RGBD Object Tracking: An In-depth Review [89.96221353160831]
We first review RGBD object trackers from different perspectives, including RGBD fusion, depth usage, and tracking framework.
We benchmark a representative set of RGBD trackers, and give detailed analyses based on their performances.
(2022-03-26T18:53:51Z)
- Cross-Modal Object Tracking: Modality-Aware Representations and A Unified Benchmark [8.932487291107812]
In many visual systems, visual tracking is often based on RGB image sequences, in which some targets become difficult to perceive in low-light conditions.
We propose a new algorithm, which learns the modality-aware target representation to mitigate the appearance gap between RGB and NIR modalities in the tracking process.
We will release the dataset for free academic use; the download link and code will be made available soon.
(2021-11-08T03:58:55Z)
- RGB-D Salient Object Detection with Ubiquitous Target Awareness [37.6726410843724]
We make the first attempt to solve the RGB-D salient object detection problem with a novel depth-awareness framework.
We propose a Ubiquitous Target Awareness (UTA) network to address three important challenges in the RGB-D SOD task.
Our proposed UTA network requires no depth input at inference and runs in real time at 43 FPS.
(2021-09-08T04:27:29Z)
- Accurate RGB-D Salient Object Detection via Collaborative Learning [101.82654054191443]
RGB-D saliency detection shows impressive ability in some challenging scenarios.
We propose a novel collaborative learning framework where edge, depth and saliency are leveraged in a more efficient way.
(2020-07-23T04:33:36Z)
- Is Depth Really Necessary for Salient Object Detection? [50.10888549190576]
We make the first attempt at realizing a unified depth-aware framework that takes only RGB information as input for inference.
The method not only surpasses state-of-the-art performance on five public RGB SOD benchmarks, but also surpasses RGBD-based methods on five benchmarks by a large margin.
(2020-05-30T13:40:03Z)
- Robust Visual Object Tracking with Two-Stream Residual Convolutional Networks [62.836429958476735]
We propose a Two-Stream Residual Convolutional Network (TS-RCN) for visual tracking.
Our TS-RCN can be integrated with existing deep learning based visual trackers.
To further improve the tracking performance, we adopt a "wider" residual network ResNeXt as its feature extraction backbone.
(2020-05-13T19:05:42Z)
This list is automatically generated from the titles and abstracts of the papers on this site.