Seeing Clearly and Deeply: An RGBD Imaging Approach with a Bio-inspired Monocentric Design
- URL: http://arxiv.org/abs/2510.25314v1
- Date: Wed, 29 Oct 2025 09:27:38 GMT
- Title: Seeing Clearly and Deeply: An RGBD Imaging Approach with a Bio-inspired Monocentric Design
- Authors: Zongxi Yu, Xiaolong Qian, Shaohua Gao, Qi Jiang, Yao Gao, Kailun Yang, Kaiwei Wang,
- Abstract summary: We introduce a novel bio-inspired all-spherical monocentric lens, around which we build the Bionic Monocentric Imaging (BMI) framework.<n>This optical design naturally encodes depth into its depth-varying Point Spread Functions (PSFs) without requiring complex diffractive or freeform elements.<n>In depth estimation, the method attains an Abs Rel of 0.026 and an RMSE of 0.130, markedly outperforming leading software-only approaches.
- Score: 17.925866379655208
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Achieving high-fidelity, compact RGBD imaging presents a dual challenge: conventional compact optics struggle with RGB sharpness across the entire depth-of-field, while software-only Monocular Depth Estimation (MDE) is an ill-posed problem reliant on unreliable semantic priors. While deep optics with elements like DOEs can encode depth, they introduce trade-offs in fabrication complexity and chromatic aberrations, compromising simplicity. To address this, we first introduce a novel bio-inspired all-spherical monocentric lens, around which we build the Bionic Monocentric Imaging (BMI) framework, a holistic co-design. This optical design naturally encodes depth into its depth-varying Point Spread Functions (PSFs) without requiring complex diffractive or freeform elements. We establish a rigorous physically-based forward model to generate a synthetic dataset by precisely simulating the optical degradation process. This simulation pipeline is co-designed with a dual-head, multi-scale reconstruction network that employs a shared encoder to jointly recover a high-fidelity All-in-Focus (AiF) image and a precise depth map from a single coded capture. Extensive experiments validate the state-of-the-art performance of the proposed framework. In depth estimation, the method attains an Abs Rel of 0.026 and an RMSE of 0.130, markedly outperforming leading software-only approaches and other deep optics systems. For image restoration, the system achieves an SSIM of 0.960 and a perceptual LPIPS score of 0.082, thereby confirming a superior balance between image fidelity and depth accuracy. This study illustrates that the integration of bio-inspired, fully spherical optics with a joint reconstruction algorithm constitutes an effective strategy for addressing the intrinsic challenges in high-performance compact RGBD imaging. Source code will be publicly available at https://github.com/ZongxiYu-ZJU/BMI.
Related papers
- UDPNet: Unleashing Depth-based Priors for Robust Image Dehazing [77.10640210751981]
UDPNet is a general framework that leverages depth-based priors from a large-scale pretrained depth estimation model DepthAnything V2.<n>Our proposed solution establishes a new benchmark for depth-aware dehazing across various scenarios.
arXiv Detail & Related papers (2026-01-11T13:29:02Z) - S2ML: Spatio-Spectral Mutual Learning for Depth Completion [56.26679539288063]
Raw depth images captured by RGB-D cameras often suffer from incomplete depth values due to weak reflections, boundary shadows, and artifacts.<n>Existing methods address this problem through depth completion in the image domain, but they overlook the physical characteristics of raw depth images.<n>We propose a Spatio-Spectral Mutual Learning framework (S2ML) to harmonize the advantages of both spatial and frequency domains for depth completion.
arXiv Detail & Related papers (2025-11-08T15:01:55Z) - PFDepth: Heterogeneous Pinhole-Fisheye Joint Depth Estimation via Distortion-aware Gaussian-Splatted Volumetric Fusion [61.6340987158734]
We present the first pinhole-fisheye framework for heterogeneous multi-view depth estimation, PFDepth.<n> PFDepth employs a unified architecture capable of processing arbitrary combinations of pinhole and fisheye cameras with varied intrinsics and extrinsics.<n>We show that PFDepth sets a state-of-the-art performance on KITTI-360 and RealHet datasets over current mainstream depth networks.
arXiv Detail & Related papers (2025-09-30T09:38:59Z) - Learned Off-aperture Encoding for Wide Field-of-view RGBD Imaging [31.931929519577402]
This work explores an additional design choice by positioning a DOE off-aperture, enabling a spatial unmixing of the degrees of freedom.<n> Experimental results reveal that the off-aperture DOE enhances the imaging quality by over 5 dB in PSNR at a FoV of approximately $45circ$ when paired with a simple thin lens.
arXiv Detail & Related papers (2025-07-30T09:49:47Z) - FUSE: Label-Free Image-Event Joint Monocular Depth Estimation via Frequency-Decoupled Alignment and Degradation-Robust Fusion [63.87313550399871]
Image-event joint depth estimation methods leverage complementary modalities for robust perception, yet face challenges in generalizability.<n>We propose Self-supervised Transfer (PST) and FrequencyDe-coupled Fusion module (FreDF)<n>PST establishes cross-modal knowledge transfer through latent space alignment with image foundation models.<n>FreDF explicitly decouples high-frequency edge features from low-frequency structural components, resolving modality-specific frequency mismatches.
arXiv Detail & Related papers (2025-03-25T15:04:53Z) - Aperture Diffraction for Compact Snapshot Spectral Imaging [27.321750056840706]
We demonstrate a compact, cost-effective snapshot spectral imaging system named Aperture Diffraction Imaging Spectrometer (ADIS)
A new optical design that each point in the object space is multiplexed to discrete encoding locations on the mosaic filter sensor is introduced.
The Cascade Shift-Shuffle Spectral Transformer (CSST) with strong perception of the diffraction degeneration is designed to solve a sparsity-constrained inverse problem.
arXiv Detail & Related papers (2023-09-27T16:48:46Z) - End-to-end Learning for Joint Depth and Image Reconstruction from
Diffracted Rotation [10.896567381206715]
We propose a novel end-to-end learning approach for depth from diffracted rotation.
Our approach requires a significantly less complex model and less training data, yet it is superior to existing methods in the task of monocular depth estimation.
arXiv Detail & Related papers (2022-04-14T16:14:37Z) - Universal and Flexible Optical Aberration Correction Using Deep-Prior
Based Deconvolution [51.274657266928315]
We propose a PSF aware plug-and-play deep network, which takes the aberrant image and PSF map as input and produces the latent high quality version via incorporating lens-specific deep priors.
Specifically, we pre-train a base model from a set of diverse lenses and then adapt it to a given lens by quickly refining the parameters.
arXiv Detail & Related papers (2021-04-07T12:00:38Z) - Unpaired Single-Image Depth Synthesis with cycle-consistent Wasserstein
GANs [1.0499611180329802]
Real-time estimation of actual environment depth is an essential module for various autonomous system tasks.
In this study, latest advancements in the field of generative neural networks are leveraged to fully unsupervised single-image depth synthesis.
arXiv Detail & Related papers (2021-03-31T09:43:38Z) - Single-shot Hyperspectral-Depth Imaging with Learned Diffractive Optics [72.9038524082252]
We propose a compact single-shot monocular hyperspectral-depth (HS-D) imaging method.
Our method uses a diffractive optical element (DOE), the point spread function of which changes with respect to both depth and spectrum.
To facilitate learning the DOE, we present a first HS-D dataset by building a benchtop HS-D imager.
arXiv Detail & Related papers (2020-09-01T14:19:35Z) - A Single Stream Network for Robust and Real-time RGB-D Salient Object
Detection [89.88222217065858]
We design a single stream network to use the depth map to guide early fusion and middle fusion between RGB and depth.
This model is 55.5% lighter than the current lightest model and runs at a real-time speed of 32 FPS when processing a $384 times 384$ image.
arXiv Detail & Related papers (2020-07-14T04:40:14Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.