Enhanced Object Tracking by Self-Supervised Auxiliary Depth Estimation Learning
- URL: http://arxiv.org/abs/2405.14195v1
- Date: Thu, 23 May 2024 05:43:38 GMT
- Title: Enhanced Object Tracking by Self-Supervised Auxiliary Depth Estimation Learning
- Authors: Zhenyu Wei, Yujie He, Zhanchuan Cai,
- Abstract summary: We propose a new method, named MDETrack, which trains a tracking network with an additional capability to understand the depth of scenes.
The results show an improved tracking accuracy even without real depth.
- Score: 15.364238035194854
- License: http://creativecommons.org/licenses/by-nc-sa/4.0/
- Abstract: RGB-D tracking significantly improves the accuracy of object tracking. However, its dependency on real depth inputs and the complexity involved in multi-modal fusion limit its applicability across various scenarios. The utilization of depth information in RGB-D tracking inspired us to propose a new method, named MDETrack, which trains a tracking network with an additional capability to understand the depth of scenes, through supervised or self-supervised auxiliary Monocular Depth Estimation learning. The outputs of MDETrack's unified feature extractor are fed to the side-by-side tracking head and auxiliary depth estimation head, respectively. The auxiliary module will be discarded in inference, thus keeping the same inference speed. We evaluated our models with various training strategies on multiple datasets, and the results show an improved tracking accuracy even without real depth. Through these findings we highlight the potential of depth estimation in enhancing object tracking performance.
Related papers
- Depth Attention for Robust RGB Tracking [21.897255266278275]
We propose a new framework that leverages monocular depth estimation to counter the challenges of tracking targets that are out of view or affected by motion blur in RGB video sequences.
To the best of our knowledge, we are the first to propose a depth attention mechanism and to formulate a simple framework that allows seamlessly integration of depth information with state of the art tracking algorithms.
arXiv Detail & Related papers (2024-10-27T09:47:47Z) - FeatureSORT: Essential Features for Effective Tracking [0.0]
We introduce a novel tracker designed for online multiple object tracking with a focus on being simple, while being effective.
By integrating distinct appearance features, including clothing color, style, and target direction, our tracker significantly enhances online tracking accuracy.
arXiv Detail & Related papers (2024-07-05T04:37:39Z) - Depth-discriminative Metric Learning for Monocular 3D Object Detection [14.554132525651868]
We introduce a novel metric learning scheme that encourages the model to extract depth-discriminative features regardless of the visual attributes.
Our method consistently improves the performance of various baselines by 23.51% and 5.78% on average.
arXiv Detail & Related papers (2024-01-02T07:34:09Z) - SparseTrack: Multi-Object Tracking by Performing Scene Decomposition
based on Pseudo-Depth [84.64121608109087]
We propose a pseudo-depth estimation method for obtaining the relative depth of targets from 2D images.
Secondly, we design a depth cascading matching (DCM) algorithm, which can use the obtained depth information to convert a dense target set into multiple sparse target subsets.
By integrating the pseudo-depth method and the DCM strategy into the data association process, we propose a new tracker, called SparseTrack.
arXiv Detail & Related papers (2023-06-08T14:36:10Z) - Self-Supervised Learning based Depth Estimation from Monocular Images [0.0]
The goal of Monocular Depth Estimation is to predict the depth map, given a 2D monocular RGB image as input.
We plan to do intrinsic camera parameters during training and apply weather augmentations to further generalize our model.
arXiv Detail & Related papers (2023-04-14T07:14:08Z) - Learning Dual-Fused Modality-Aware Representations for RGBD Tracking [67.14537242378988]
Compared with the traditional RGB object tracking, the addition of the depth modality can effectively solve the target and background interference.
Some existing RGBD trackers use the two modalities separately and thus some particularly useful shared information between them is ignored.
We propose a novel Dual-fused Modality-aware Tracker (termed DMTracker) which aims to learn informative and discriminative representations of the target objects for robust RGBD tracking.
arXiv Detail & Related papers (2022-11-06T07:59:07Z) - Depth Estimation Matters Most: Improving Per-Object Depth Estimation for
Monocular 3D Detection and Tracking [47.59619420444781]
Approaches to monocular 3D perception including detection and tracking often yield inferior performance when compared to LiDAR-based techniques.
We propose a multi-level fusion method that combines different representations (RGB and pseudo-LiDAR) and temporal information across multiple frames for objects (tracklets) to enhance per-object depth estimation.
arXiv Detail & Related papers (2022-06-08T03:37:59Z) - RGBD Object Tracking: An In-depth Review [89.96221353160831]
We firstly review RGBD object trackers from different perspectives, including RGBD fusion, depth usage, and tracking framework.
We benchmark a representative set of RGBD trackers, and give detailed analyses based on their performances.
arXiv Detail & Related papers (2022-03-26T18:53:51Z) - Joint Learning of Salient Object Detection, Depth Estimation and Contour
Extraction [91.43066633305662]
We propose a novel multi-task and multi-modal filtered transformer (MMFT) network for RGB-D salient object detection (SOD)
Specifically, we unify three complementary tasks: depth estimation, salient object detection and contour estimation. The multi-task mechanism promotes the model to learn the task-aware features from the auxiliary tasks.
Experiments show that it not only significantly surpasses the depth-based RGB-D SOD methods on multiple datasets, but also precisely predicts a high-quality depth map and salient contour at the same time.
arXiv Detail & Related papers (2022-03-09T17:20:18Z) - Robust Visual Object Tracking with Two-Stream Residual Convolutional
Networks [62.836429958476735]
We propose a Two-Stream Residual Convolutional Network (TS-RCN) for visual tracking.
Our TS-RCN can be integrated with existing deep learning based visual trackers.
To further improve the tracking performance, we adopt a "wider" residual network ResNeXt as its feature extraction backbone.
arXiv Detail & Related papers (2020-05-13T19:05:42Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.