DepthTrack : Unveiling the Power of RGBD Tracking
- URL: http://arxiv.org/abs/2108.13962v1
- Date: Tue, 31 Aug 2021 16:42:38 GMT
- Title: DepthTrack : Unveiling the Power of RGBD Tracking
- Authors: Song Yan, Jinyu Yang, Jani Käpylä, Feng Zheng, Aleš Leonardis, Joni-Kristian Kämäräinen
- Abstract summary: This work introduces a new RGBD tracking dataset - Depth-Track.
It has twice as many sequences (200) and scene types (40) as the largest existing dataset.
The average sequence length (1473), the number of deformable objects (16), and the number of annotated tracking attributes (15) have also been increased.
- Score: 29.457114656913944
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: RGBD (RGB plus depth) object tracking is gaining momentum as RGBD sensors
have become popular in many application fields such as robotics. However, the
best RGBD trackers are extensions of the state-of-the-art deep RGB trackers.
They are trained with RGB data and the depth channel is used as a sidekick for
subtleties such as occlusion detection. This can be explained by the fact that
there are no sufficiently large RGBD datasets to 1) train deep depth trackers
and to 2) challenge RGB trackers with sequences for which the depth cue is
essential. This work introduces a new RGBD tracking dataset - Depth-Track -
that has twice as many sequences (200) and scene types (40) as the largest
existing dataset, and three times more objects (90). In addition, the average
length of the sequences (1473), the number of deformable objects (16) and the
number of annotated tracking attributes (15) have been increased. Furthermore,
by running the SotA RGB and RGBD trackers on DepthTrack, we propose a new RGBD
tracking baseline, namely DeT, which reveals that deep RGBD tracking indeed
benefits from genuine training data. The code and dataset are available at
https://github.com/xiaozai/DeT
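As a minimal, hypothetical sketch (not the official DeT data loader), the snippet below shows how an aligned RGB-depth frame pair from a dataset like DepthTrack might be read and stacked into a 4-channel input for a tracker that consumes depth as an extra channel. The directory layout ("color/", "depth/"), file naming, and depth normalization range are assumptions made only for illustration.

```python
# Minimal illustrative sketch: load one aligned RGB-depth frame pair and stack
# it into a 4-channel array, as a tracker that treats depth as an extra input
# channel might consume it. Paths and depth scaling are assumptions.
import numpy as np
import cv2

def load_rgbd_frame(seq_dir: str, frame_id: int) -> np.ndarray:
    """Return an H x W x 4 array: RGB channels plus a normalized depth channel."""
    rgb = cv2.imread(f"{seq_dir}/color/{frame_id:08d}.jpg")              # H x W x 3, BGR
    rgb = cv2.cvtColor(rgb, cv2.COLOR_BGR2RGB).astype(np.float32) / 255.0
    depth = cv2.imread(f"{seq_dir}/depth/{frame_id:08d}.png",
                       cv2.IMREAD_UNCHANGED).astype(np.float32)          # H x W, raw depth (e.g. mm)
    depth = np.clip(depth / 10000.0, 0.0, 1.0)                           # assumed max range of ~10 m
    return np.dstack([rgb, depth])                                       # H x W x 4

# Example: frame = load_rgbd_frame("DepthTrack/sequences/some_sequence", 1)
```

A real loader would additionally handle missing or zero depth readings and sensor-specific depth scaling, which vary across RGBD datasets.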
Related papers
- DFormer: Rethinking RGBD Representation Learning for Semantic Segmentation [76.81628995237058]
DFormer is a novel framework to learn transferable representations for RGB-D segmentation tasks.
It pretrains the backbone using image-depth pairs from ImageNet-1K.
DFormer achieves new state-of-the-art performance on two popular RGB-D tasks.
arXiv Detail & Related papers (2023-09-18T11:09:11Z)
- ARKitTrack: A New Diverse Dataset for Tracking Using Mobile RGB-D Data [75.73063721067608]
We propose a new RGB-D tracking dataset for both static and dynamic scenes, captured by the consumer-grade LiDAR scanners built into Apple's iPhone and iPad.
ARKitTrack contains 300 RGB-D sequences, 455 targets, and 229.7K video frames in total.
In-depth empirical analysis has verified that the ARKitTrack dataset can significantly facilitate RGB-D tracking and that the proposed baseline method compares favorably against the state of the arts.
arXiv Detail & Related papers (2023-03-24T09:51:13Z)
- Learning Dual-Fused Modality-Aware Representations for RGBD Tracking [67.14537242378988]
Compared with traditional RGB object tracking, the addition of the depth modality can effectively mitigate interference between the target and the background.
Some existing RGBD trackers use the two modalities separately and thus some particularly useful shared information between them is ignored.
We propose a novel Dual-fused Modality-aware Tracker (termed DMTracker) that aims to learn informative and discriminative representations of the target objects for robust RGBD tracking (a generic RGB-depth fusion sketch in this spirit follows the related-papers list).
arXiv Detail & Related papers (2022-11-06T07:59:07Z)
- RGBD1K: A Large-scale Dataset and Benchmark for RGB-D Object Tracking [30.448658049744775]
Given a limited amount of annotated RGB-D tracking data, most state-of-the-art RGB-D trackers are simple extensions of high-performance RGB-only trackers.
To address the dataset deficiency issue, a new RGB-D dataset named RGBD1K is released in this paper.
arXiv Detail & Related papers (2022-08-21T03:07:36Z)
- RGBD Object Tracking: An In-depth Review [89.96221353160831]
We first review RGBD object trackers from different perspectives, including RGBD fusion, depth usage, and tracking framework.
We benchmark a representative set of RGBD trackers and give detailed analyses based on their performance.
arXiv Detail & Related papers (2022-03-26T18:53:51Z)
- Visual Object Tracking on Multi-modal RGB-D Videos: A Review [16.098468526632473]
The goal of this review is to summarize the relevant knowledge of the RGB-D tracking research field.
Specifically, we cover the related RGB-D tracking benchmark datasets as well as the corresponding performance measurements.
arXiv Detail & Related papers (2022-01-23T08:02:49Z)
- Boosting RGB-D Saliency Detection by Leveraging Unlabeled RGB Images [89.81919625224103]
Training deep models for RGB-D salient object detection (SOD) often requires a large number of labeled RGB-D images.
We present a Dual-Semi RGB-D Salient Object Detection Network (DS-Net) to leverage unlabeled RGB images for boosting RGB-D saliency detection.
arXiv Detail & Related papers (2022-01-01T03:02:27Z)
- Depth-only Object Tracking [23.27677106839962]
We study how far D-only tracking can go if trained with large amounts of depth data.
We train a "Depth-DiMP" from scratch with the generated data and fine-tune it with the available small RGBD tracking datasets.
The depth-only DiMP achieves good accuracy in depth-only tracking, and, combined with the original RGB DiMP, the end-to-end trained RGBD-DiMP outperforms the recent VOT 2020 RGBD winners.
arXiv Detail & Related papers (2021-10-22T09:59:31Z)
- Synergistic saliency and depth prediction for RGB-D saliency detection [76.27406945671379]
Existing RGB-D saliency datasets are small, which may lead to overfitting and limited generalization for diverse scenarios.
We propose a semi-supervised system for RGB-D saliency detection that can be trained on smaller RGB-D saliency datasets without saliency ground truth.
arXiv Detail & Related papers (2020-07-03T14:24:41Z)
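As a purely illustrative companion to the Dual-Fused Modality-Aware Representations entry above, here is a minimal PyTorch sketch of a generic gated RGB-depth feature fusion block. This is not the DMTracker (or DeT) architecture; the per-modality layers, channel width, and gating scheme are arbitrary stand-ins chosen only to make the two-stream fuse-and-gate idea concrete.

```python
# Illustrative sketch only: a generic two-stream RGB-depth feature fusion block.
# Layer choices are arbitrary and do not reproduce any specific tracker.
import torch
import torch.nn as nn

class SimpleRGBDFusion(nn.Module):
    def __init__(self, channels: int = 256):
        super().__init__()
        # Per-modality feature extractors (stand-ins for real backbones).
        self.rgb_branch = nn.Conv2d(3, channels, kernel_size=3, padding=1)
        self.depth_branch = nn.Conv2d(1, channels, kernel_size=3, padding=1)
        # Channel-attention style gate over the concatenated features.
        self.gate = nn.Sequential(
            nn.AdaptiveAvgPool2d(1),
            nn.Conv2d(2 * channels, 2 * channels, kernel_size=1),
            nn.Sigmoid(),
        )
        self.fuse = nn.Conv2d(2 * channels, channels, kernel_size=1)

    def forward(self, rgb: torch.Tensor, depth: torch.Tensor) -> torch.Tensor:
        f_rgb = torch.relu(self.rgb_branch(rgb))      # B x C x H x W
        f_dep = torch.relu(self.depth_branch(depth))  # B x C x H x W
        f_cat = torch.cat([f_rgb, f_dep], dim=1)      # B x 2C x H x W
        return self.fuse(f_cat * self.gate(f_cat))    # gated fusion -> B x C x H x W

# Example: SimpleRGBDFusion()(torch.rand(1, 3, 128, 128), torch.rand(1, 1, 128, 128))
```

In practice the two branches would be full backbones and the channel gate might be replaced by cross-modal attention; the sketch only shows where shared and modality-specific information could be combined.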
This list is automatically generated from the titles and abstracts of the papers on this site.