GlocalFuse-Depth: Fusing Transformers and CNNs for All-day
Self-supervised Monocular Depth Estimation
- URL: http://arxiv.org/abs/2302.09884v1
- Date: Mon, 20 Feb 2023 10:20:07 GMT
- Title: GlocalFuse-Depth: Fusing Transformers and CNNs for All-day
Self-supervised Monocular Depth Estimation
- Authors: Zezheng Zhang, Ryan K. Y. Chan and Kenneth K. Y. Wong
- Abstract summary: We propose a two-branch network named GlocalFuse-Depth for self-supervised depth estimation of all-day images.
GlocalFuse-Depth achieves state-of-the-art results for all-day images on the Oxford RobotCar dataset.
- Score: 0.12891210250935148
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: In recent years, self-supervised monocular depth estimation has drawn much
attention, since it requires no depth annotations and has achieved remarkable results
on standard benchmarks. However, most existing methods focus on either daytime or
nighttime images only, so their performance degrades on the other domain because of
the large domain shift between daytime and nighttime images. To address this problem,
in this paper we propose a two-branch network named GlocalFuse-Depth for
self-supervised depth estimation on all-day images. The daytime and nighttime images
of an input image pair are fed into a CNN branch and a Transformer branch,
respectively, so that both fine-grained details and global dependencies can be
captured efficiently. In addition, a novel fusion module is proposed to fuse
multi-dimensional features from the two branches. Extensive experiments demonstrate
that GlocalFuse-Depth achieves state-of-the-art results for all-day images on the
Oxford RobotCar dataset, confirming the superiority of our method.
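As a concrete illustration of the two-branch idea, the following PyTorch sketch wires a CNN branch (daytime image, local detail) and a Transformer branch (nighttime image, global dependencies) into a shared fusion head. All names, layer choices, and dimensions (TwoBranchDepthNet, embed_dim, the concatenation-based fusion) are illustrative assumptions, not the authors' implementation.
```python
import torch
import torch.nn as nn

class TwoBranchDepthNet(nn.Module):
    """Illustrative two-branch encoder: a CNN branch for fine-grained
    local detail and a Transformer branch for global dependencies.
    Names and shapes are assumptions, not the paper's implementation."""

    def __init__(self, embed_dim=256, num_heads=8):
        super().__init__()
        # CNN branch: local, fine-grained features at 1/4 resolution.
        self.cnn_branch = nn.Sequential(
            nn.Conv2d(3, embed_dim, kernel_size=7, stride=4, padding=3),
            nn.ReLU(inplace=True),
            nn.Conv2d(embed_dim, embed_dim, kernel_size=3, padding=1),
        )
        # Transformer branch: long-range dependencies over image patches.
        self.patch_embed = nn.Conv2d(3, embed_dim, kernel_size=4, stride=4)
        layer = nn.TransformerEncoderLayer(
            d_model=embed_dim, nhead=num_heads, batch_first=True)
        self.transformer = nn.TransformerEncoder(layer, num_layers=4)
        # Fusion module: merge the two feature streams channel-wise.
        self.fuse = nn.Conv2d(2 * embed_dim, embed_dim, kernel_size=1)
        # Head predicting a sigmoid-normalised disparity map.
        self.head = nn.Sequential(
            nn.Conv2d(embed_dim, 1, kernel_size=3, padding=1), nn.Sigmoid())

    def forward(self, day_img, night_img):
        local_feat = self.cnn_branch(day_img)            # (B, C, H/4, W/4)
        tokens = self.patch_embed(night_img)             # (B, C, H/4, W/4)
        b, c, h, w = tokens.shape
        glob = self.transformer(tokens.flatten(2).transpose(1, 2))
        global_feat = glob.transpose(1, 2).reshape(b, c, h, w)
        fused = self.fuse(torch.cat([local_feat, global_feat], dim=1))
        return self.head(fused)

# Usage: disp = TwoBranchDepthNet()(day_batch, night_batch)
```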
Related papers
- Exploring Reliable Matching with Phase Enhancement for Night-time Semantic Segmentation [58.180226179087086]
We propose a novel end-to-end optimized approach, named NightFormer, tailored for night-time semantic segmentation.
Specifically, we design a pixel-level texture enhancement module to acquire texture-aware features hierarchically with phase enhancement and amplified attention.
Our proposed method performs favorably against state-of-the-art night-time semantic segmentation methods.
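A hedged sketch of the phase-enhancement idea: the Fourier phase of a feature map retains scene structure even in low light, so one plausible module re-weights the amplitude spectrum while keeping phase intact. This is an assumption about the general technique, not NightFormer's actual module.
```python
import torch
import torch.nn as nn

class PhaseEnhancement(nn.Module):
    """Re-weight the amplitude spectrum with a learned gain while
    preserving the phase, which carries night-robust structure."""

    def __init__(self, channels):
        super().__init__()
        self.amp_gain = nn.Parameter(torch.ones(1, channels, 1, 1))

    def forward(self, feat):
        spec = torch.fft.rfft2(feat, norm="ortho")   # complex spectrum
        amp, phase = spec.abs(), spec.angle()
        enhanced = torch.polar(amp * self.amp_gain, phase)
        out = torch.fft.irfft2(enhanced, s=feat.shape[-2:], norm="ortho")
        return feat + out                            # residual connection
```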
arXiv Detail & Related papers (2024-08-25T13:59:31Z)
- M${^2}$Depth: Self-supervised Two-Frame Multi-camera Metric Depth Estimation [22.018059988585403]
M$2$Depth is designed to predict reliable scale-aware surrounding depth in autonomous driving.
We first construct cost volumes in spatial and temporal domains individually.
We propose a spatial-temporal fusion module that integrates spatial-temporal information to yield a strong volume representation.
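A minimal sketch of fusing the two volumes; the 3D-convolutional fusion below is an assumption, not M$^2$Depth's module.
```python
import torch
import torch.nn as nn

class SpatialTemporalFusion(nn.Module):
    """Fuse a spatial (cross-camera) and a temporal (two-frame) cost
    volume, each holding matching costs over D depth hypotheses."""

    def __init__(self):
        super().__init__()
        self.fuse = nn.Sequential(
            nn.Conv3d(2, 8, kernel_size=3, padding=1),
            nn.ReLU(inplace=True),
            nn.Conv3d(8, 1, kernel_size=3, padding=1),
        )

    def forward(self, spatial_cv, temporal_cv):
        # Each cost volume: (B, D, H, W); stack as two 3D channels.
        x = torch.stack([spatial_cv, temporal_cv], dim=1)  # (B, 2, D, H, W)
        return self.fuse(x).squeeze(1)                     # (B, D, H, W)
```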
arXiv Detail & Related papers (2023-04-14T11:10:07Z)
- The Second Monocular Depth Estimation Challenge [93.1678025923996]
The second edition of the Monocular Depth Estimation Challenge (MDEC) was open to methods using any form of supervision.
The challenge was based on the SYNS-Patches dataset, which features a wide diversity of environments with high-quality dense ground truth.
The top supervised submission improved relative F-Score by 27.62%, while the top self-supervised improved it by 16.61%.
arXiv Detail & Related papers (2023-02-02T18:59:47Z)
- STEPS: Joint Self-supervised Nighttime Image Enhancement and Depth Estimation [12.392842482031558]
We propose a method that jointly learns a nighttime image enhancer and a depth estimator, without using ground truth for either task.
Our method tightly entangles two self-supervised tasks using a newly proposed uncertain pixel masking strategy.
We benchmark the method on two established datasets: nuScenes and RobotCar.
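The masking idea can be sketched as follows: pixels whose enhancement is uncertain (e.g., saturated highlights or near-black regions) are excluded from the photometric loss. The threshold rule is an assumption, not STEPS' exact criterion.
```python
import torch

def masked_photometric_loss(photo_err, uncertainty, thresh=0.5):
    """Average the per-pixel photometric error over confident pixels
    only; `uncertainty` is assumed to be in [0, 1]."""
    mask = (uncertainty < thresh).float()   # keep confident pixels
    # Guard against an all-masked batch.
    return (photo_err * mask).sum() / mask.sum().clamp(min=1.0)
```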
arXiv Detail & Related papers (2022-04-07T17:58:47Z)
- SurroundDepth: Entangling Surrounding Views for Self-Supervised Multi-Camera Depth Estimation [101.55622133406446]
We propose SurroundDepth, a method that incorporates information from multiple surrounding views to predict depth maps across cameras.
Specifically, we employ a joint network to process all the surrounding views and propose a cross-view transformer to effectively fuse the information from multiple views.
In experiments, our method achieves the state-of-the-art performance on the challenging multi-camera depth estimation datasets.
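A cross-view transformer can be sketched as attention from one camera's tokens to the tokens of all other views; the residual fusion and dimensions below are illustrative assumptions, not SurroundDepth's implementation.
```python
import torch
import torch.nn as nn

class CrossViewAttention(nn.Module):
    """Tokens of a target view attend to tokens gathered from the
    surrounding cameras, injecting cross-view context."""

    def __init__(self, dim=256, num_heads=8):
        super().__init__()
        self.attn = nn.MultiheadAttention(dim, num_heads, batch_first=True)

    def forward(self, query_feat, context_feats):
        # query_feat: (B, N, C) target-view tokens.
        # context_feats: (B, M, C) concatenated tokens of other views.
        fused, _ = self.attn(query_feat, context_feats, context_feats)
        return query_feat + fused  # residual fusion across views
```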
arXiv Detail & Related papers (2021-08-17T13:52:19Z)
- Self-supervised Monocular Depth Estimation for All Day Images using Domain Separation [17.066753214406525]
We propose a domain-separated network for self-supervised depth estimation of all-day images.
Our approach achieves state-of-the-art depth estimation results for all-day images on the challenging Oxford RobotCar dataset.
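One common way to realize domain separation, assuming separate shared (illumination-invariant) and private (domain-specific) encoders, is an orthogonality constraint between the two feature streams; this sketch names the general idea, not the paper's exact loss.
```python
import torch

def orthogonality_loss(shared_feat, private_feat):
    """Penalise correlation between shared and private features so
    illumination cues do not leak into the depth-relevant stream."""
    s = shared_feat.flatten(1)   # (B, D)
    p = private_feat.flatten(1)  # (B, D)
    # Squared Frobenius norm of the cross-correlation matrix.
    return (s.transpose(0, 1) @ p).pow(2).mean()
```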
arXiv Detail & Related papers (2021-08-09T06:24:35Z)
- Regularizing Nighttime Weirdness: Efficient Self-supervised Monocular Depth Estimation in the Dark [20.66405067066299]
We introduce Priors-Based Regularization to learn distribution knowledge from unpaired depth maps.
We also leverage a Mapping-Consistent Image Enhancement module to improve image visibility and contrast.
Our framework achieves remarkable improvements and state-of-the-art results on two nighttime datasets.
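As a hedged stand-in for mapping-consistent enhancement: a monotone tone curve (here a fixed gamma correction, whereas the paper's module is learned) lifts dark pixels without reordering intensities, so photometric relations between frames are preserved.
```python
import torch

def enhance_image(img, gamma=0.4):
    """Monotone gamma curve: brightens dark regions while keeping the
    ordering of pixel intensities intact (mapping consistency)."""
    return img.clamp(min=1e-6).pow(gamma)
```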
arXiv Detail & Related papers (2021-03-30T21:22:26Z)
- Sparse Auxiliary Networks for Unified Monocular Depth Prediction and Completion [56.85837052421469]
Estimating scene geometry from data obtained with cost-effective sensors is key for robots and self-driving cars.
In this paper, we study the problem of predicting dense depth from a single RGB image with optional sparse measurements from low-cost active depth sensors.
We introduce Sparse Auxiliary Networks (SANs), a new module enabling monodepth networks to perform both depth prediction and depth completion.
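The unified prediction/completion interface can be sketched as a network that always consumes RGB and optionally routes a sparse depth map through an auxiliary encoder. The architecture below is an assumption for illustration, not the SAN module itself.
```python
import torch
import torch.nn as nn

class MonoDepthWithOptionalSparse(nn.Module):
    """One network, two modes: depth prediction from RGB alone, or
    depth completion when sparse measurements are available."""

    def __init__(self, feat=32):
        super().__init__()
        self.rgb_enc = nn.Conv2d(3, feat, kernel_size=3, padding=1)
        self.sparse_enc = nn.Conv2d(1, feat, kernel_size=3, padding=1)
        self.decoder = nn.Sequential(
            nn.ReLU(inplace=True),
            nn.Conv2d(feat, 1, kernel_size=3, padding=1))

    def forward(self, rgb, sparse_depth=None):
        x = self.rgb_enc(rgb)
        if sparse_depth is not None:     # completion mode
            x = x + self.sparse_enc(sparse_depth)
        return self.decoder(x)           # prediction mode otherwise
```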
arXiv Detail & Related papers (2021-03-30T21:22:26Z)
- Unsupervised Monocular Depth Estimation for Night-time Images using Adversarial Domain Feature Adaptation [17.067988025947024]
We look into the problem of estimating per-pixel depth maps from unconstrained RGB monocular night-time images.
The state-of-the-art day-time depth estimation methods fail miserably when tested with night-time images.
We propose to solve this problem by posing it as a domain adaptation problem where a network trained with day-time images is adapted to work for night-time images.
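Such adaptation is typically implemented with a domain discriminator trained against the night-image encoder; the losses below are a standard GAN-style sketch, not the paper's exact formulation.
```python
import torch
import torch.nn as nn

bce = nn.BCEWithLogitsLoss()

def discriminator_loss(disc, day_feat, night_feat):
    """Train the discriminator to separate day from night features."""
    d_day = disc(day_feat)
    d_night = disc(night_feat.detach())  # freeze encoder gradients here
    return (bce(d_day, torch.ones_like(d_day))
            + bce(d_night, torch.zeros_like(d_night)))

def encoder_adversarial_loss(disc, night_feat):
    """Train the night encoder to make its features look like day."""
    d_night = disc(night_feat)
    return bce(d_night, torch.ones_like(d_night))
```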
arXiv Detail & Related papers (2020-10-03T17:55:16Z)
- DeFeat-Net: General Monocular Depth via Simultaneous Unsupervised Representation Learning [65.94499390875046]
DeFeat-Net is an approach to simultaneously learn a cross-domain dense feature representation.
Our technique is able to outperform the current state-of-the-art with around 10% reduction in all error measures.
arXiv Detail & Related papers (2020-03-30T13:10:32Z)
- Image Fine-grained Inpainting [89.17316318927621]
We present a one-stage model that utilizes dense combinations of dilated convolutions to obtain larger and more effective receptive fields.
To better train this efficient generator, in addition to the frequently used VGG feature matching loss, we design a novel self-guided regression loss.
We also employ a discriminator with local and global branches to ensure local-global contents consistency.
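A dense combination of dilated convolutions can be sketched as parallel 3x3 branches with growing dilation whose outputs are merged; the rates and widths below are assumptions, not the paper's exact block.
```python
import torch
import torch.nn as nn

class DenseDilatedBlock(nn.Module):
    """Parallel dilated convolutions enlarge the receptive field
    without downsampling; a 1x1 convolution merges the branches."""

    def __init__(self, channels=64, rates=(1, 2, 4, 8)):
        super().__init__()
        self.branches = nn.ModuleList(
            nn.Conv2d(channels, channels, 3, padding=r, dilation=r)
            for r in rates)
        self.merge = nn.Conv2d(channels * len(rates), channels, 1)

    def forward(self, x):
        out = torch.cat([branch(x) for branch in self.branches], dim=1)
        return x + self.merge(out)   # residual for stable training
```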
arXiv Detail & Related papers (2020-02-07T03:45:25Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the information provided and is not responsible for any consequences arising from its use.