GlocalFuse-Depth: Fusing Transformers and CNNs for All-day
Self-supervised Monocular Depth Estimation
- URL: http://arxiv.org/abs/2302.09884v1
- Date: Mon, 20 Feb 2023 10:20:07 GMT
- Title: GlocalFuse-Depth: Fusing Transformers and CNNs for All-day
Self-supervised Monocular Depth Estimation
- Authors: Zezheng Zhang, Ryan K. Y. Chan and Kenneth K. Y. Wong
- Abstract summary: We propose a two-branch network named GlocalFuse-Depth for self-supervised depth estimation of all-day images.
GlocalFuse-Depth achieves state-of-the-art results for all-day images on the Oxford RobotCar dataset.
- Score: 0.12891210250935148
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: In recent years, self-supervised monocular depth estimation has drawn much
attention, since it requires no depth annotations and has achieved remarkable results
on standard benchmarks. However, most existing methods focus on either daytime or
nighttime images only, so their performance degrades on the other domain because of
the large domain shift between daytime and nighttime images. To address this problem,
in this paper we propose a two-branch network named GlocalFuse-Depth for
self-supervised depth estimation on all-day images. The daytime and nighttime images
of an input image pair are fed into a CNN branch and a Transformer branch,
respectively, so that both fine-grained details and global dependencies can be
captured efficiently. In addition, a novel fusion module is proposed to fuse
multi-dimensional features from the two branches. Extensive experiments demonstrate
that GlocalFuse-Depth achieves state-of-the-art results for all-day images on the
Oxford RobotCar dataset, confirming the superiority of our method.
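As a concrete illustration of the two-branch idea, the following PyTorch sketch wires a CNN branch (daytime image, local detail) and a Transformer branch (nighttime image, global dependencies) into a shared fusion head. All names, layer choices, and dimensions (TwoBranchDepthNet, embed_dim, the concatenation-based fusion) are illustrative assumptions, not the authors' implementation.
```python
import torch
import torch.nn as nn

class TwoBranchDepthNet(nn.Module):
    """Illustrative two-branch encoder: a CNN branch for fine-grained
    local detail and a Transformer branch for global dependencies.
    Names and shapes are assumptions, not the paper's implementation."""

    def __init__(self, embed_dim=256, num_heads=8):
        super().__init__()
        # CNN branch: local, fine-grained features at 1/4 resolution.
        self.cnn_branch = nn.Sequential(
            nn.Conv2d(3, embed_dim, kernel_size=7, stride=4, padding=3),
            nn.ReLU(inplace=True),
            nn.Conv2d(embed_dim, embed_dim, kernel_size=3, padding=1),
        )
        # Transformer branch: long-range dependencies over image patches.
        self.patch_embed = nn.Conv2d(3, embed_dim, kernel_size=4, stride=4)
        layer = nn.TransformerEncoderLayer(
            d_model=embed_dim, nhead=num_heads, batch_first=True)
        self.transformer = nn.TransformerEncoder(layer, num_layers=4)
        # Fusion module: merge the two feature streams channel-wise.
        self.fuse = nn.Conv2d(2 * embed_dim, embed_dim, kernel_size=1)
        # Head predicting a sigmoid-normalised disparity map.
        self.head = nn.Sequential(
            nn.Conv2d(embed_dim, 1, kernel_size=3, padding=1), nn.Sigmoid())

    def forward(self, day_img, night_img):
        local_feat = self.cnn_branch(day_img)            # (B, C, H/4, W/4)
        tokens = self.patch_embed(night_img)             # (B, C, H/4, W/4)
        b, c, h, w = tokens.shape
        glob = self.transformer(tokens.flatten(2).transpose(1, 2))
        global_feat = glob.transpose(1, 2).reshape(b, c, h, w)
        fused = self.fuse(torch.cat([local_feat, global_feat], dim=1))
        return self.head(fused)

# Usage: disp = TwoBranchDepthNet()(day_batch, night_batch)
```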
Related papers
- Exploring Reliable Matching with Phase Enhancement for Night-time Semantic Segmentation [58.180226179087086]
We propose a novel end-to-end optimized approach, named NightFormer, tailored for night-time semantic segmentation.
Specifically, we design a pixel-level texture enhancement module to acquire texture-aware features hierarchically with phase enhancement and amplified attention.
Our proposed method performs favorably against state-of-the-art night-time semantic segmentation methods.
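A hedged sketch of the phase-enhancement idea: the Fourier phase of a feature map retains scene structure even in low light, so one plausible module re-weights the amplitude spectrum while keeping phase intact. This is an assumption about the general technique, not NightFormer's actual module.
```python
import torch
import torch.nn as nn

class PhaseEnhancement(nn.Module):
    """Re-weight the amplitude spectrum with a learned gain while
    preserving the phase, which carries night-robust structure."""

    def __init__(self, channels):
        super().__init__()
        self.amp_gain = nn.Parameter(torch.ones(1, channels, 1, 1))

    def forward(self, feat):
        spec = torch.fft.rfft2(feat, norm="ortho")   # complex spectrum
        amp, phase = spec.abs(), spec.angle()
        enhanced = torch.polar(amp * self.amp_gain, phase)
        out = torch.fft.irfft2(enhanced, s=feat.shape[-2:], norm="ortho")
        return feat + out                            # residual connection
```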
arXiv Detail & Related papers (2024-08-25T13:59:31Z)
- M${^2}$Depth: Self-supervised Two-Frame Multi-camera Metric Depth Estimation [22.018059988585403]
M$2$Depth is designed to predict reliable scale-aware surrounding depth in autonomous driving.
We first construct cost volumes in spatial and temporal domains individually.
We propose a spatial-temporal fusion module that integrates spatial-temporal information to yield a strong volume representation.
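A minimal sketch of fusing the two volumes; the 3D-convolutional fusion below is an assumption, not M$^2$Depth's module.
```python
import torch
import torch.nn as nn

class SpatialTemporalFusion(nn.Module):
    """Fuse a spatial (cross-camera) and a temporal (two-frame) cost
    volume, each holding matching costs over D depth hypotheses."""

    def __init__(self):
        super().__init__()
        self.fuse = nn.Sequential(
            nn.Conv3d(2, 8, kernel_size=3, padding=1),
            nn.ReLU(inplace=True),
            nn.Conv3d(8, 1, kernel_size=3, padding=1),
        )

    def forward(self, spatial_cv, temporal_cv):
        # Each cost volume: (B, D, H, W); stack as two 3D channels.
        x = torch.stack([spatial_cv, temporal_cv], dim=1)  # (B, 2, D, H, W)
        return self.fuse(x).squeeze(1)                     # (B, D, H, W)
```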
arXiv Detail & Related papers (2023-04-14T11:10:07Z)
- The Second Monocular Depth Estimation Challenge [93.1678025923996]
The second edition of the Monocular Depth Estimation Challenge (MDEC) was open to methods using any form of supervision.
The challenge was based on the SYNS-Patches dataset, which features a wide diversity of environments with high-quality dense ground truth.
The top supervised submission improved relative F-Score by 27.62%, while the top self-supervised improved it by 16.61%.
arXiv Detail & Related papers (2023-02-02T18:59:47Z)
- STEPS: Joint Self-supervised Nighttime Image Enhancement and Depth Estimation [12.392842482031558]
We propose a method that jointly learns a nighttime image enhancer and a depth estimator, without using ground truth for either task.
Our method tightly entangles two self-supervised tasks using a newly proposed uncertain pixel masking strategy.
We benchmark the method on two established datasets: nuScenes and RobotCar.
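The masking idea can be sketched as follows: pixels whose enhancement is uncertain (e.g., saturated highlights or near-black regions) are excluded from the photometric loss. The threshold rule is an assumption, not STEPS' exact criterion.
```python
import torch

def masked_photometric_loss(photo_err, uncertainty, thresh=0.5):
    """Average the per-pixel photometric error over confident pixels
    only; `uncertainty` is assumed to be in [0, 1]."""
    mask = (uncertainty < thresh).float()   # keep confident pixels
    # Guard against an all-masked batch.
    return (photo_err * mask).sum() / mask.sum().clamp(min=1.0)
```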
arXiv Detail & Related papers (2022-04-07T17:58:47Z)
- SurroundDepth: Entangling Surrounding Views for Self-Supervised Multi-Camera Depth Estimation [101.55622133406446]
We propose SurroundDepth, a method that incorporates information from multiple surrounding views to predict depth maps across cameras.
Specifically, we employ a joint network to process all the surrounding views and propose a cross-view transformer to effectively fuse the information from multiple views.
In experiments, our method achieves the state-of-the-art performance on the challenging multi-camera depth estimation datasets.
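A cross-view transformer can be sketched as attention from one camera's tokens to the tokens of all other views; the residual fusion and dimensions below are illustrative assumptions, not SurroundDepth's implementation.
```python
import torch
import torch.nn as nn

class CrossViewAttention(nn.Module):
    """Tokens of a target view attend to tokens gathered from the
    surrounding cameras, injecting cross-view context."""

    def __init__(self, dim=256, num_heads=8):
        super().__init__()
        self.attn = nn.MultiheadAttention(dim, num_heads, batch_first=True)

    def forward(self, query_feat, context_feats):
        # query_feat: (B, N, C) target-view tokens.
        # context_feats: (B, M, C) concatenated tokens of other views.
        fused, _ = self.attn(query_feat, context_feats, context_feats)
        return query_feat + fused  # residual fusion across views
```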
arXiv Detail & Related papers (2021-08-17T13:52:19Z)
- Self-supervised Monocular Depth Estimation for All Day Images using Domain Separation [17.066753214406525]
We propose a domain-separated network for self-supervised depth estimation of all-day images.
Our approach achieves state-of-the-art depth estimation results for all-day images on the challenging Oxford RobotCar dataset.
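One common way to realize domain separation, assuming separate shared (illumination-invariant) and private (domain-specific) encoders, is an orthogonality constraint between the two feature streams; this sketch names the general idea, not the paper's exact loss.
```python
import torch

def orthogonality_loss(shared_feat, private_feat):
    """Penalise correlation between shared and private features so
    illumination cues do not leak into the depth-relevant stream."""
    s = shared_feat.flatten(1)   # (B, D)
    p = private_feat.flatten(1)  # (B, D)
    # Squared Frobenius norm of the cross-correlation matrix.
    return (s.transpose(0, 1) @ p).pow(2).mean()
```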
arXiv Detail & Related papers (2021-08-09T06:24:35Z)
- Regularizing Nighttime Weirdness: Efficient Self-supervised Monocular Depth Estimation in the Dark [20.66405067066299]
We introduce Priors-Based Regularization to learn distribution knowledge from unpaired depth maps.
We also leverage a Mapping-Consistent Image Enhancement module to improve image visibility and contrast.
Our framework achieves remarkable improvements and state-of-the-art results on two nighttime datasets.
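As a hedged stand-in for mapping-consistent enhancement: a monotone tone curve (here a fixed gamma correction, whereas the paper's module is learned) lifts dark pixels without reordering intensities, so photometric relations between frames are preserved.
```python
import torch

def enhance_image(img, gamma=0.4):
    """Monotone gamma curve: brightens dark regions while keeping the
    ordering of pixel intensities intact (mapping consistency)."""
    return img.clamp(min=1e-6).pow(gamma)
```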
arXiv Detail & Related papers (2021-03-30T21:22:26Z)
- Sparse Auxiliary Networks for Unified Monocular Depth Prediction and Completion [56.85837052421469]
Estimating scene geometry from data obtained with cost-effective sensors is key for robots and self-driving cars.
In this paper, we study the problem of predicting dense depth from a single RGB image with optional sparse measurements from low-cost active depth sensors.
We introduce Sparse Auxiliary Networks (SANs), a new module enabling monodepth networks to perform both depth prediction and depth completion.
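The unified prediction/completion interface can be sketched as a network that always consumes RGB and optionally routes a sparse depth map through an auxiliary encoder. The architecture below is an assumption for illustration, not the SAN module itself.
```python
import torch
import torch.nn as nn

class MonoDepthWithOptionalSparse(nn.Module):
    """One network, two modes: depth prediction from RGB alone, or
    depth completion when sparse measurements are available."""

    def __init__(self, feat=32):
        super().__init__()
        self.rgb_enc = nn.Conv2d(3, feat, kernel_size=3, padding=1)
        self.sparse_enc = nn.Conv2d(1, feat, kernel_size=3, padding=1)
        self.decoder = nn.Sequential(
            nn.ReLU(inplace=True),
            nn.Conv2d(feat, 1, kernel_size=3, padding=1))

    def forward(self, rgb, sparse_depth=None):
        x = self.rgb_enc(rgb)
        if sparse_depth is not None:     # completion mode
            x = x + self.sparse_enc(sparse_depth)
        return self.decoder(x)           # prediction mode otherwise
```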
arXiv Detail & Related papers (2021-03-30T21:22:26Z)
- Unsupervised Monocular Depth Estimation for Night-time Images using Adversarial Domain Feature Adaptation [17.067988025947024]
We look into the problem of estimating per-pixel depth maps from unconstrained RGB monocular night-time images.
The state-of-the-art day-time depth estimation methods fail miserably when tested with night-time images.
We propose to solve this problem by posing it as a domain adaptation problem where a network trained with day-time images is adapted to work for night-time images.
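Such adaptation is typically implemented with a domain discriminator trained against the night-image encoder; the losses below are a standard GAN-style sketch, not the paper's exact formulation.
```python
import torch
import torch.nn as nn

bce = nn.BCEWithLogitsLoss()

def discriminator_loss(disc, day_feat, night_feat):
    """Train the discriminator to separate day from night features."""
    d_day = disc(day_feat)
    d_night = disc(night_feat.detach())  # freeze encoder gradients here
    return (bce(d_day, torch.ones_like(d_day))
            + bce(d_night, torch.zeros_like(d_night)))

def encoder_adversarial_loss(disc, night_feat):
    """Train the night encoder to make its features look like day."""
    d_night = disc(night_feat)
    return bce(d_night, torch.ones_like(d_night))
```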
arXiv Detail & Related papers (2020-10-03T17:55:16Z)
- DeFeat-Net: General Monocular Depth via Simultaneous Unsupervised Representation Learning [65.94499390875046]
DeFeat-Net is an approach to simultaneously learn a cross-domain dense feature representation.
Our technique is able to outperform the current state-of-the-art with around 10% reduction in all error measures.
arXiv Detail & Related papers (2020-03-30T13:10:32Z)
- Image Fine-grained Inpainting [89.17316318927621]
We present a one-stage model that utilizes dense combinations of dilated convolutions to obtain larger and more effective receptive fields.
To better train this efficient generator, in addition to the frequently used VGG feature matching loss, we design a novel self-guided regression loss.
We also employ a discriminator with local and global branches to ensure local-global contents consistency.
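A dense combination of dilated convolutions can be sketched as parallel 3x3 branches with growing dilation whose outputs are merged; the rates and widths below are assumptions, not the paper's exact block.
```python
import torch
import torch.nn as nn

class DenseDilatedBlock(nn.Module):
    """Parallel dilated convolutions enlarge the receptive field
    without downsampling; a 1x1 convolution merges the branches."""

    def __init__(self, channels=64, rates=(1, 2, 4, 8)):
        super().__init__()
        self.branches = nn.ModuleList(
            nn.Conv2d(channels, channels, 3, padding=r, dilation=r)
            for r in rates)
        self.merge = nn.Conv2d(channels * len(rates), channels, 1)

    def forward(self, x):
        out = torch.cat([branch(x) for branch in self.branches], dim=1)
        return x + self.merge(out)   # residual for stable training
```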
arXiv Detail & Related papers (2020-02-07T03:45:25Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the information provided and is not responsible for any consequences arising from its use.