Self-distilled Feature Aggregation for Self-supervised Monocular Depth
Estimation
- URL: http://arxiv.org/abs/2209.07088v1
- Date: Thu, 15 Sep 2022 07:00:52 GMT
- Title: Self-distilled Feature Aggregation for Self-supervised Monocular Depth
Estimation
- Authors: Zhengming Zhou and Qiulei Dong
- Abstract summary: We propose the Self-Distilled Feature Aggregation (SDFA) module for simultaneously aggregating a pair of low-scale and high-scale features.
We propose an SDFA-based network for self-supervised monocular depth estimation, and design a self-distilled training strategy to train the proposed network.
Experimental results on the KITTI dataset demonstrate that the proposed method outperforms the comparative state-of-the-art methods in most cases.
- Score: 11.929584800629673
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Self-supervised monocular depth estimation has received considerable
attention recently in computer vision. Most of the existing works in the
literature aggregate multi-scale features for depth prediction via either
straightforward concatenation or element-wise addition; however, such feature
aggregation operations generally neglect the contextual consistency between
multi-scale features. To address this problem, we propose the Self-Distilled
Feature Aggregation (SDFA) module for simultaneously aggregating a pair of
low-scale and high-scale features while maintaining their contextual
consistency. The SDFA employs three branches to learn three feature offset
maps: one offset map for refining the input low-scale feature, and the other
two for refining the input high-scale feature in a designed self-distillation
manner. Then, we propose an SDFA-based network for self-supervised monocular
depth estimation, and design a self-distilled training strategy to train the
proposed network with the SDFA module. Experimental results on the KITTI
dataset demonstrate that the proposed method outperforms the comparative
state-of-the-art methods in most cases. The code is available at
https://github.com/ZM-Zhou/SDFA-Net_pytorch.
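The abstract gives enough detail to sketch the general shape of such a module: three convolutional branches predict per-pixel offset maps that resample the (upsampled) low-scale feature and the high-scale feature before aggregation. The sketch below is illustrative only, assuming a `grid_sample`-based warping operator and additive fusion; the channel widths, branch heads, and the pairing of the two high-scale outputs for self-distillation are hypothetical choices, not the paper's exact design (see the linked repository for the reference implementation).

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

def warp_by_offset(feat, offset):
    """Resample a feature map along a learned per-pixel offset field.

    feat:   (B, C, H, W) feature map
    offset: (B, 2, H, W) offsets in normalized [-1, 1] grid units
    """
    b, _, h, w = feat.shape
    # Base sampling grid in normalized coordinates, (x, y) order for grid_sample.
    ys = torch.linspace(-1, 1, h, device=feat.device)
    xs = torch.linspace(-1, 1, w, device=feat.device)
    gy, gx = torch.meshgrid(ys, xs, indexing="ij")
    base = torch.stack((gx, gy), dim=-1).unsqueeze(0).expand(b, -1, -1, -1)
    grid = base + offset.permute(0, 2, 3, 1)
    return F.grid_sample(feat, grid, align_corners=True)

class SDFASketch(nn.Module):
    """Illustrative aggregation of a low-scale and a high-scale feature.

    Three branches predict three offset maps: one refines the (upsampled)
    low-scale feature; the other two refine the high-scale feature, yielding
    two aggregated outputs that a self-distilled training scheme could pair
    as teacher/student. All layer sizes here are assumptions.
    """
    def __init__(self, ch):
        super().__init__()
        self.offset_heads = nn.ModuleList(
            nn.Conv2d(2 * ch, 2, kernel_size=3, padding=1) for _ in range(3)
        )

    def forward(self, low, high):
        # Bring the low-scale feature to the high-scale resolution.
        low_up = F.interpolate(low, size=high.shape[-2:], mode="bilinear",
                               align_corners=True)
        joint = torch.cat((low_up, high), dim=1)
        o_low, o_high_a, o_high_b = (head(joint) for head in self.offset_heads)
        low_ref = warp_by_offset(low_up, o_low)
        high_a = warp_by_offset(high, o_high_a)  # one refined high-scale path
        high_b = warp_by_offset(high, o_high_b)  # second (distilled) path
        return low_ref + high_a, low_ref + high_b
```

The two outputs make the self-distillation hook explicit: during training, the prediction driven by one branch can supervise the other, while at inference only one aggregated feature is needed.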
Related papers
- Self-supervised Monocular Depth Estimation with Large Kernel Attention [30.44895226042849]
We propose a self-supervised monocular depth estimation network to get finer details.
Specifically, we propose a decoder based on large kernel attention, which can model long-distance dependencies.
Our method achieves competitive results on the KITTI dataset.
arXiv Detail & Related papers (2024-09-26T14:44:41Z)
- A Refreshed Similarity-based Upsampler for Direct High-Ratio Feature Upsampling [54.05517338122698]
We propose an explicitly controllable query-key feature alignment from both semantic-aware and detail-aware perspectives.
We also develop a fine-grained neighbor selection strategy on HR features, which is simple yet effective for alleviating mosaic artifacts.
Our proposed ReSFU framework consistently achieves satisfactory performance on different segmentation applications.
arXiv Detail & Related papers (2024-07-02T14:12:21Z)
- Self-Supervised Monocular Depth Estimation by Direction-aware Cumulative Convolution Network [80.19054069988559]
We find that self-supervised monocular depth estimation shows a direction sensitivity and environmental dependency.
We propose a new Direction-aware Cumulative Convolution Network (DaCCN), which improves the depth representation in two aspects.
Experiments show that our method achieves significant improvements on three widely used benchmarks.
arXiv Detail & Related papers (2023-08-10T14:32:18Z)
- The Devil is in the Task: Exploiting Reciprocal Appearance-Localization Features for Monocular 3D Object Detection [62.1185839286255]
Low-cost monocular 3D object detection plays a fundamental role in autonomous driving.
We introduce a Dynamic Feature Reflecting Network, named DFR-Net.
We rank 1st among all the monocular 3D object detectors in the KITTI test set.
arXiv Detail & Related papers (2021-12-28T07:31:18Z)
- DFC: Deep Feature Consistency for Robust Point Cloud Registration [0.4724825031148411]
We present a novel learning-based alignment network for complex alignment scenes.
We validate our approach on the 3DMatch dataset and the KITTI odometry dataset.
arXiv Detail & Related papers (2021-11-15T08:27:21Z)
- Unsupervised Scale-consistent Depth Learning from Video [131.3074342883371]
We propose a monocular depth estimator SC-Depth, which requires only unlabelled videos for training.
Thanks to the capability of scale-consistent prediction, we show that our monocular-trained deep networks are readily integrated into the ORB-SLAM2 system.
The proposed hybrid Pseudo-RGBD SLAM shows compelling results in KITTI, and it generalizes well to the KAIST dataset without additional training.
arXiv Detail & Related papers (2021-05-25T02:17:56Z)
- Embedded Self-Distillation in Compact Multi-Branch Ensemble Network for Remote Sensing Scene Classification [17.321718779142817]
We propose a multi-branch ensemble network to enhance the feature representation ability.
We embed a self-distillation (SD) method to transfer knowledge from the ensemble network to its main branch.
Results prove that our proposed ESD-MBENet can achieve better accuracy than previous state-of-the-art (SOTA) complex models.
arXiv Detail & Related papers (2021-04-01T03:08:52Z)
- M3DSSD: Monocular 3D Single Stage Object Detector [82.25793227026443]
We propose a Monocular 3D Single Stage object Detector (M3DSSD) with feature alignment and asymmetric non-local attention.
The proposed M3DSSD achieves significantly better performance than the monocular 3D object detection methods on the KITTI dataset.
arXiv Detail & Related papers (2021-03-24T13:09:11Z)
- PC-RGNN: Point Cloud Completion and Graph Neural Network for 3D Object Detection [57.49788100647103]
LiDAR-based 3D object detection is an important task for autonomous driving.
Current approaches suffer from sparse and partial point clouds of distant and occluded objects.
In this paper, we propose a novel two-stage approach, namely PC-RGNN, dealing with such challenges by two specific solutions.
arXiv Detail & Related papers (2020-12-18T18:06:43Z)
- Monocular 3D Object Detection with Sequential Feature Association and Depth Hint Augmentation [12.55603878441083]
FADNet is presented to address the task of monocular 3D object detection.
A dedicated depth hint module is designed to generate row-wise features named depth hints.
The contributions of this work are validated by experiments and an ablation study on the KITTI benchmark.
arXiv Detail & Related papers (2020-11-30T07:19:14Z)
- Towards Better Generalization: Joint Depth-Pose Learning without PoseNet [36.414471128890284]
We tackle the essential problem of scale inconsistency for self-supervised joint depth-pose learning.
Most existing methods assume that a consistent scale of depth and pose can be learned across all input samples.
We propose a novel system that explicitly disentangles scale from the network estimation.
arXiv Detail & Related papers (2020-04-03T00:28:09Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information provided and is not responsible for any consequences of its use.