PCDepth: Pattern-based Complementary Learning for Monocular Depth
Estimation by Best of Both Worlds
- URL: http://arxiv.org/abs/2402.18925v1
- Date: Thu, 29 Feb 2024 07:31:59 GMT
- Title: PCDepth: Pattern-based Complementary Learning for Monocular Depth
Estimation by Best of Both Worlds
- Authors: Haotian Liu, Sanqing Qu, Fan Lu, Zongtao Bu, Florian Roehrbein, Alois
Knoll, Guang Chen
- Abstract summary: Event cameras record scene dynamics with high temporal resolution, providing rich scene details for monocular depth estimation.
Existing complementary learning approaches for MDE fuse intensity information from images and scene details from event data for better scene understanding.
We propose a Pattern-based Complementary learning architecture for monocular Depth estimation (PCDepth).
- Score: 15.823230141827358
- License: http://creativecommons.org/licenses/by-nc-sa/4.0/
- Abstract: Event cameras can record scene dynamics with high temporal resolution,
providing rich scene details for monocular depth estimation (MDE) even under
low illumination. Therefore, existing complementary learning approaches
for MDE fuse intensity information from images and scene details from event
data for better scene understanding. However, most methods directly fuse the
two modalities at the pixel level, overlooking that the valuable
complementarity mainly resides in high-level patterns that occupy only a few
pixels. For example, event data is likely to complement the contours of scene
objects. In this paper, we
discretize the scene into a set of high-level patterns to explore the
complementarity and propose a Pattern-based Complementary learning architecture
for monocular Depth estimation (PCDepth). Concretely, PCDepth comprises two
primary components: a complementary visual representation learning module,
which discretizes the scene into high-level patterns and integrates
complementary patterns across modalities, and a refined depth estimator, which
reconstructs the scene and predicts depth while maintaining an
efficiency-accuracy balance. Through pattern-based complementary learning,
PCDepth fully exploits both modalities and achieves more accurate predictions
than existing methods,
especially in challenging nighttime scenarios. Extensive experiments on MVSEC
and DSEC datasets verify the effectiveness and superiority of our PCDepth.
Remarkably, compared with the state of the art, PCDepth achieves a 37.9%
accuracy improvement in MVSEC nighttime scenarios.
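As a rough illustration of the pattern-based idea described above, the sketch below pools image-branch and event-branch features into a small set of pattern tokens and exchanges complementary information between them with cross-attention. The module names, token count, and the use of cross-attention are illustrative assumptions, not PCDepth's actual design.

```python
# Hedged sketch of pattern-based complementary fusion: discretize each
# modality's features into pattern tokens, then exchange them across modalities.
import torch
import torch.nn as nn

class PatternTokenizer(nn.Module):
    """Pools a dense feature map into a fixed set of pattern tokens."""
    def __init__(self, channels: int, num_patterns: int):
        super().__init__()
        self.assign = nn.Conv2d(channels, num_patterns, kernel_size=1)

    def forward(self, feat: torch.Tensor) -> torch.Tensor:
        # Soft assignment of each pixel to a pattern: (B, K, H*W).
        weights = self.assign(feat).flatten(2).softmax(dim=-1)
        # Weighted average of pixel features per pattern -> (B, K, C).
        return torch.einsum('bkn,bcn->bkc', weights, feat.flatten(2))

class CrossModalPatternFusion(nn.Module):
    """Lets image patterns attend to event patterns, and vice versa."""
    def __init__(self, channels: int, heads: int = 4):
        super().__init__()
        self.img2evt = nn.MultiheadAttention(channels, heads, batch_first=True)
        self.evt2img = nn.MultiheadAttention(channels, heads, batch_first=True)

    def forward(self, img_tok, evt_tok):
        img_fused, _ = self.img2evt(img_tok, evt_tok, evt_tok)
        evt_fused, _ = self.evt2img(evt_tok, img_tok, img_tok)
        return img_tok + img_fused, evt_tok + evt_fused

if __name__ == "__main__":
    img_feat = torch.randn(2, 64, 32, 48)   # image-branch features
    evt_feat = torch.randn(2, 64, 32, 48)   # event-branch features
    tok = PatternTokenizer(64, num_patterns=16)
    fuse = CrossModalPatternFusion(64)
    img_tok, evt_tok = fuse(tok(img_feat), tok(evt_feat))
    print(img_tok.shape)  # torch.Size([2, 16, 64])
```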
Related papers
- BetterDepth: Plug-and-Play Diffusion Refiner for Zero-Shot Monocular Depth Estimation [25.047835960649167]
BetterDepth is a conditional diffusion-based refiner that takes the prediction from pre-trained MDE models as depth conditioning.
By efficient training on small-scale synthetic datasets, BetterDepth achieves state-of-the-art zero-shot MDE performance.
BetterDepth can improve the performance of other MDE models in a plug-and-play manner without additional re-training.
arXiv Detail & Related papers (2024-07-25T11:16:37Z)
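To make the conditional-refiner idea above concrete, here is a minimal DDPM-style training step in which a toy denoiser predicts the noise added to a ground-truth depth map while conditioned on a frozen pre-trained MDE model's prediction. The denoiser and noise schedule are placeholders, not BetterDepth's architecture.

```python
# Hedged sketch of a conditional diffusion refiner training step.
import torch
import torch.nn as nn

denoiser = nn.Sequential(          # toy denoiser: (noisy depth, condition) -> noise
    nn.Conv2d(2, 32, 3, padding=1), nn.ReLU(),
    nn.Conv2d(32, 1, 3, padding=1),
)
T = 1000
betas = torch.linspace(1e-4, 2e-2, T)
alpha_bar = torch.cumprod(1.0 - betas, dim=0)

def training_step(gt_depth, mde_pred):
    """Corrupt the gt depth, predict the noise given the coarse MDE prediction."""
    t = torch.randint(0, T, (gt_depth.shape[0],))
    a = alpha_bar[t].view(-1, 1, 1, 1)
    noise = torch.randn_like(gt_depth)
    noisy = a.sqrt() * gt_depth + (1 - a).sqrt() * noise
    pred = denoiser(torch.cat([noisy, mde_pred], dim=1))  # depth conditioning
    return nn.functional.mse_loss(pred, noise)

loss = training_step(torch.rand(2, 1, 64, 64), torch.rand(2, 1, 64, 64))
print(float(loss))
```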
- Learning to Adapt CLIP for Few-Shot Monocular Depth Estimation [31.34615135846137]
We propose a few-shot-based method which learns to adapt vision-language models for monocular depth estimation.
Specifically, it assigns different depth bins for different scenes, which can be selected by the model during inference.
With only one image per scene for training, our extensive experimental results on the NYU V2 and KITTI datasets demonstrate that our method outperforms the previous state-of-the-art method by up to 10.6% in terms of MARE.
arXiv Detail & Related papers (2023-11-02T06:56:50Z)
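A self-contained sketch of the depth-bin mechanism above: depth is recovered as a similarity-weighted sum of bin centers, with one bin set kept per scene type and the best-matching set selected at inference. Random tensors stand in for CLIP features; the bin values and the selection rule are assumptions.

```python
# Hedged sketch: per-scene depth bins selected at inference time.
import torch

num_scenes, num_bins, dim = 3, 8, 512
# One bin-center set per scene type (values are illustrative).
scene_bins = torch.tensor([[1, 2, 3, 4, 5, 6, 8, 10],
                           [2, 5, 10, 15, 20, 30, 50, 80],
                           [0.5, 1, 1.5, 2, 3, 4, 6, 9]], dtype=torch.float)
bin_text_emb = torch.randn(num_scenes, num_bins, dim)  # stand-in for CLIP text features
scene_emb = torch.randn(num_scenes, dim)               # stand-in scene prototypes

def predict_depth(patch_emb: torch.Tensor) -> torch.Tensor:
    """patch_emb: (N, dim) CLIP-like patch features -> (N,) depth values."""
    # Pick the scene whose prototype best matches the mean patch feature.
    s = (patch_emb.mean(0) @ scene_emb.T).argmax()
    # Soft assignment of each patch to that scene's depth bins.
    probs = (patch_emb @ bin_text_emb[s].T).softmax(dim=-1)
    return probs @ scene_bins[s]

print(predict_depth(torch.randn(16, dim)))
```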
- Bilevel Fast Scene Adaptation for Low-Light Image Enhancement [50.639332885989255]
Enhancing images in low-light scenes is a challenging but widely studied task in computer vision.
The main obstacle lies in modeling the distribution discrepancy across different scenes.
We introduce a bilevel paradigm to model this latent correspondence.
A bilevel learning framework is constructed to endow the encoder with scene-irrelevant generality across diverse scenes.
arXiv Detail & Related papers (2023-06-02T08:16:21Z)
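The bilevel setup above can be sketched MAML-style: an inner loop adapts scene-specific parameters on one scene, while the outer loop updates a shared encoder so it generalizes across scenes. This is a generic first-order bilevel template with assumed toy networks, not the paper's exact formulation.

```python
# Hedged, MAML-style bilevel sketch with toy linear networks.
import torch
import torch.nn as nn

encoder = nn.Linear(8, 8)                  # scene-irrelevant (outer) parameters
decoder = nn.Linear(8, 1)                  # scene-specific (inner) initialization
outer_opt = torch.optim.SGD(encoder.parameters(), lr=1e-2)

def inner_adapt(x, y, steps=3, lr=1e-1):
    """Adapt a fresh copy of the decoder to one scene (inner problem)."""
    local = nn.Linear(8, 1)
    local.load_state_dict(decoder.state_dict())
    opt = torch.optim.SGD(local.parameters(), lr=lr)
    for _ in range(steps):
        opt.zero_grad()
        loss = nn.functional.mse_loss(local(encoder(x).detach()), y)
        loss.backward()
        opt.step()
    return local

for _ in range(5):                         # outer loop over scenes
    x, y = torch.randn(16, 8), torch.randn(16, 1)
    local = inner_adapt(x, y)
    outer_opt.zero_grad()
    # Outer loss: how well the shared encoder serves the adapted decoder.
    nn.functional.mse_loss(local(encoder(x)), y).backward()
    outer_opt.step()
```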
- Single Image Depth Prediction Made Better: A Multivariate Gaussian Take [163.14849753700682]
We introduce an approach that performs continuous modeling of per-pixel depth.
Our method (named MG) ranks among the top entries on the KITTI depth-prediction benchmark leaderboard.
arXiv Detail & Related papers (2023-03-31T16:01:03Z)
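A minimal sketch of continuous per-pixel depth modeling: a head outputs a mean and a log-variance per pixel and is trained with the Gaussian negative log-likelihood. The paper models a full multivariate Gaussian over the image; the independent per-pixel Gaussian here is a simplifying assumption.

```python
# Hedged sketch: per-pixel Gaussian depth head trained with the NLL.
import torch
import torch.nn as nn

head = nn.Conv2d(32, 2, kernel_size=1)  # channel 0: mean, channel 1: log-variance

def gaussian_nll(feat: torch.Tensor, gt: torch.Tensor) -> torch.Tensor:
    mean, log_var = head(feat).chunk(2, dim=1)
    # NLL of gt under N(mean, exp(log_var)), averaged over pixels.
    return (0.5 * (log_var + (gt - mean) ** 2 / log_var.exp())).mean()

loss = gaussian_nll(torch.randn(2, 32, 24, 32), torch.rand(2, 1, 24, 32))
print(float(loss))
```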
- Dense Depth Distillation with Out-of-Distribution Simulated Images [30.79756881887895]
We study data-free knowledge distillation (KD) for monocular depth estimation (MDE).
KD learns a lightweight model for real-world depth perception by compressing a trained teacher model, without access to training data in the target domain.
We show that our method outperforms the baseline KD by a good margin and even achieves slightly better performance with as few as 1/6 of the training images.
arXiv Detail & Related papers (2022-08-26T07:10:01Z)
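The data-free distillation recipe above reduces to a simple loop: with no target-domain data, train the student on simulated (out-of-distribution) images against the frozen teacher's predictions. Both networks below are toy stand-ins.

```python
# Hedged sketch of data-free distillation on simulated OOD images.
import torch
import torch.nn as nn

teacher = nn.Conv2d(3, 1, 3, padding=1).eval()  # frozen, pre-trained depth model
student = nn.Conv2d(3, 1, 3, padding=1)         # lightweight model to train
opt = torch.optim.Adam(student.parameters(), lr=1e-3)

for _ in range(10):
    sim_images = torch.rand(4, 3, 64, 64)       # simulated OOD images
    with torch.no_grad():
        pseudo_depth = teacher(sim_images)      # teacher supplies the labels
    opt.zero_grad()
    loss = nn.functional.l1_loss(student(sim_images), pseudo_depth)
    loss.backward()
    opt.step()
```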
- EdgeConv with Attention Module for Monocular Depth Estimation [4.239147046986999]
To generate accurate depth maps, it is important for the model to learn structural information about the scene.
We propose a novel Patch-Wise EdgeConv Module (PEM) and an EdgeConv Attention Module (EAM) to address the difficulties of monocular depth estimation.
Our method is evaluated on two popular datasets, the NYU Depth V2 and the KITTI split, achieving state-of-the-art performance.
arXiv Detail & Related papers (2021-06-16T08:15:20Z)
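For context on the entry above, here is a generic EdgeConv layer in the DGCNN sense, the graph operation PEM/EAM presumably build on: each node aggregates an MLP applied to (its feature, neighbor-minus-center difference) over its k nearest neighbors in feature space. The patch-wise and attention components of the paper are omitted.

```python
# Hedged sketch of a generic EdgeConv layer over node features.
import torch
import torch.nn as nn

class EdgeConv(nn.Module):
    def __init__(self, in_ch: int, out_ch: int, k: int = 8):
        super().__init__()
        self.k = k
        self.mlp = nn.Sequential(nn.Linear(2 * in_ch, out_ch), nn.ReLU())

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        """x: (B, N, C) node features -> (B, N, out_ch)."""
        dist = torch.cdist(x, x)                         # (B, N, N) pairwise distances
        idx = dist.topk(self.k, largest=False).indices   # k nearest neighbors
        nbrs = torch.gather(
            x.unsqueeze(1).expand(-1, x.shape[1], -1, -1), 2,
            idx.unsqueeze(-1).expand(-1, -1, -1, x.shape[-1]))
        center = x.unsqueeze(2).expand_as(nbrs)
        edge = torch.cat([center, nbrs - center], dim=-1)  # edge features
        return self.mlp(edge).max(dim=2).values            # max over neighbors

print(EdgeConv(16, 32)(torch.randn(2, 100, 16)).shape)      # (2, 100, 32)
```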
- Boosting Monocular Depth Estimation Models to High-Resolution via Content-Adaptive Multi-Resolution Merging [14.279471205248534]
We show how a consistent scene structure and high-frequency details affect depth estimation performance.
We present a double estimation method that improves the whole-image depth estimation and a patch selection method that adds local details.
We demonstrate that by merging estimations at different resolutions with changing context, we can generate multi-megapixel depth maps with a high level of detail.
arXiv Detail & Related papers (2021-05-28T17:55:15Z)
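The double-estimation idea above can be sketched as follows: run the same depth model at a low resolution for consistent structure and at the native resolution for detail, then merge. The real method merges content-adaptively and adds patch estimates; the fixed blend and stand-in model below are assumptions.

```python
# Hedged sketch of double estimation with a stand-in depth model.
import torch
import torch.nn.functional as F

def fake_depth_model(img: torch.Tensor) -> torch.Tensor:
    return img.mean(dim=1, keepdim=True)          # stand-in for a trained MDE net

def double_estimation(img: torch.Tensor, low: int = 128, blend: float = 0.5):
    h, w = img.shape[-2:]
    coarse = fake_depth_model(F.interpolate(img, size=(low, low),
                                            mode='bilinear', align_corners=False))
    coarse = F.interpolate(coarse, size=(h, w), mode='bilinear', align_corners=False)
    fine = fake_depth_model(img)                  # high-res pass: sharp but less consistent
    return blend * coarse + (1 - blend) * fine    # merged estimate

print(double_estimation(torch.rand(1, 3, 512, 512)).shape)
```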
- Unsupervised Scale-consistent Depth Learning from Video [131.3074342883371]
We propose a monocular depth estimator, SC-Depth, which requires only unlabelled videos for training.
Thanks to its scale-consistent predictions, our monocularly trained deep network integrates readily into the ORB-SLAM2 system.
The proposed hybrid Pseudo-RGBD SLAM shows compelling results in KITTI, and it generalizes well to the KAIST dataset without additional training.
arXiv Detail & Related papers (2021-05-25T02:17:56Z)
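The scale-consistency idea above ties all frames to a single scale by penalizing disagreement between depths of adjacent frames after projecting one into the other's view. The camera warping is elided below; `projected` stands in for the warped depth, and the normalized difference mirrors the style of geometry-consistency loss.

```python
# Hedged sketch of a geometry-consistency term between aligned depth maps.
import torch

def geometry_consistency(d1: torch.Tensor, projected: torch.Tensor) -> torch.Tensor:
    """Normalized absolute difference between aligned depth maps, in [0, 1)."""
    return ((d1 - projected).abs() / (d1 + projected)).mean()

d1 = torch.rand(1, 1, 48, 64) + 0.1          # depth of frame t
projected = torch.rand(1, 1, 48, 64) + 0.1   # frame t+1 depth warped into frame t
print(float(geometry_consistency(d1, projected)))
```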
- S2R-DepthNet: Learning a Generalizable Depth-specific Structural Representation [63.58891781246175]
Humans can infer the 3D geometry of a scene from a sketch rather than a realistic image, which indicates that spatial structure plays a fundamental role in understanding scene depth.
We are the first to explore the learning of a depth-specific structural representation, which captures the essential feature for depth estimation and ignores irrelevant style information.
Our S2R-DepthNet generalizes well to unseen real-world data even though it is trained only on synthetic data.
arXiv Detail & Related papers (2021-04-02T03:55:41Z)
- Depth-conditioned Dynamic Message Propagation for Monocular 3D Object Detection [86.25022248968908]
We learn context- and depth-aware feature representations to address monocular 3D object detection.
We show state-of-the-art results among the monocular-based approaches on the KITTI benchmark dataset.
arXiv Detail & Related papers (2021-03-30T16:20:24Z)
- Robust Consistent Video Depth Estimation [65.53308117778361]
We present an algorithm for estimating consistent dense depth maps and camera poses from a monocular video.
Our algorithm combines two complementary techniques: (1) flexible deformation-splines for low-frequency large-scale alignment and (2) geometry-aware depth filtering for high-frequency alignment of fine depth details.
In contrast to prior approaches, our method does not require camera poses as input and achieves robust reconstruction for challenging hand-held cell phone captures containing a significant amount of noise, shake, motion blur, and rolling shutter deformations.
arXiv Detail & Related papers (2020-12-10T18:59:48Z)
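The low-frequency alignment technique above can be approximated as follows: a coarse grid of per-frame correction parameters is upsampled smoothly (a spline-like deformation) and optimized so that large-scale inconsistencies are fixed without touching fine detail. The grid size and bilinear upsampling are illustrative choices, and the high-frequency depth-filtering step is omitted.

```python
# Hedged sketch of a smooth low-frequency deformation applied to a depth map.
import torch
import torch.nn.functional as F

def apply_deformation(depth: torch.Tensor, coarse_scale: torch.Tensor) -> torch.Tensor:
    """depth: (B, 1, H, W); coarse_scale: (B, 1, gh, gw) low-res multiplier grid."""
    dense = F.interpolate(coarse_scale, size=depth.shape[-2:],
                          mode='bilinear', align_corners=True)
    return depth * dense                      # smooth, low-frequency correction

depth = torch.rand(1, 1, 96, 128)
coarse = torch.nn.Parameter(torch.ones(1, 1, 4, 5))  # optimized per frame
target = torch.rand(1, 1, 96, 128)                   # reference to align against
opt = torch.optim.Adam([coarse], lr=1e-2)
for _ in range(20):
    opt.zero_grad()
    loss = F.l1_loss(apply_deformation(depth, coarse), target)
    loss.backward()
    opt.step()
print(float(loss))
```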