MODEST: Multi-Optics Depth-of-Field Stereo Dataset
- URL: http://arxiv.org/abs/2511.20853v1
- Date: Tue, 25 Nov 2025 20:59:47 GMT
- Title: MODEST: Multi-Optics Depth-of-Field Stereo Dataset
- Authors: Nisarg K. Trivedi, Vinayak A. Belludi, Li-Yun Wang, Pardis Taghavi, Dante Lok,
- Abstract summary: We present the first high-resolution (5472$\times$3648px) stereo DSLR dataset with 18000 images.
For 9 scenes with varying scene complexity, lighting and background, images are captured with two identical camera assemblies at 10 focal lengths (28-70mm) and 5 apertures (f/2.8-f/22).
This full-range optics coverage enables controlled analysis of geometric and optical effects for monocular and stereo depth estimation, shallow depth-of-field rendering, deblurring, 3D scene reconstruction and novel view synthesis.
- Score: 1.2815904071470705
- License: http://creativecommons.org/licenses/by-nc-nd/4.0/
- Abstract: Reliable depth estimation under real optical conditions remains a core challenge for camera vision in systems such as autonomous robotics and augmented reality. Despite recent progress in depth estimation and depth-of-field rendering, research remains constrained by the lack of large-scale, high-fidelity, real stereo DSLR datasets, limiting real-world generalization and evaluation of models trained on synthetic data, as shown extensively in the literature. We present the first high-resolution (5472$\times$3648px) stereo DSLR dataset with 18000 images, systematically varying focal length and aperture across complex real scenes and capturing the optical realism and complexity of professional camera systems. For 9 scenes with varying scene complexity, lighting and background, images are captured with two identical camera assemblies at 10 focal lengths (28-70mm) and 5 apertures (f/2.8-f/22), spanning 50 optical configurations in 2000 images per scene. This full-range optics coverage enables controlled analysis of geometric and optical effects for monocular and stereo depth estimation, shallow depth-of-field rendering, deblurring, 3D scene reconstruction and novel view synthesis. Each focal configuration has a dedicated calibration image set, supporting evaluation of classical and learning-based methods for intrinsic and extrinsic calibration. The dataset features challenging visual elements such as multi-scale optical illusions, reflective surfaces, mirrors, transparent glass walls, fine-grained details, and natural/artificial ambient light variations. This work attempts to bridge the realism gap between synthetic training data and real camera optics, and demonstrates the challenges that current state-of-the-art monocular depth, stereo depth, and depth-of-field methods face. We release the dataset, calibration files, and evaluation code to support reproducible research on real-world optical generalization.
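The capture grid described in the abstract can be enumerated directly. A minimal sketch: the counts and range endpoints (10 focal lengths over 28-70mm, 5 apertures over f/2.8-f/22, 9 scenes, 2000 images per scene) come from the abstract, but the intermediate focal-length and aperture-stop values below are assumed for illustration.

```python
from itertools import product

# Hypothetical enumeration of the MODEST capture grid. Counts and range
# endpoints come from the abstract; the intermediate focal lengths and
# aperture stops below are assumed for illustration.
FOCAL_LENGTHS_MM = [28, 32, 36, 40, 45, 50, 55, 60, 65, 70]  # 10 focal lengths, 28-70mm
F_NUMBERS = [2.8, 4.0, 5.6, 8.0, 22.0]                       # 5 apertures, f/2.8-f/22

def optical_configurations():
    """All (focal_length_mm, f_number) pairs: 10 x 5 = 50 per scene."""
    return list(product(FOCAL_LENGTHS_MM, F_NUMBERS))

def total_images(num_scenes=9, images_per_scene=2000):
    """9 scenes x 2000 images per scene = 18000 images total."""
    return num_scenes * images_per_scene
```

This confirms the abstract's arithmetic: 50 optical configurations per scene and 18000 images overall.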
Related papers
- Charge: A Comprehensive Novel View Synthesis Benchmark and Dataset to Bind Them All [21.211645353347908]
This paper presents a new dataset for Novel View Synthesis, generated from a high-quality, animated film with stunning realism and intricate detail.
Our dataset captures a variety of dynamic scenes, complete with detailed textures, lighting, and motion.
It is ideal for training and evaluating cutting-edge 4D scene reconstruction and novel view generation models.
arXiv Detail & Related papers (2025-12-15T18:33:08Z)
- Reflect3r: Single-View 3D Stereo Reconstruction Aided by Mirror Reflections [55.248092751290834]
Mirror reflections are common in everyday environments and can provide stereo information within a single capture.
We exploit this property by treating the reflection as an auxiliary view and designing a transformation that constructs a physically valid virtual camera.
This enables a multi-view stereo setup from a single image, simplifying the imaging process.
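The virtual-camera idea above can be sketched with the standard plane-reflection transform (a generic construction, not Reflect3r's exact parameterization; the mirror plane's unit normal and offset are assumed known):

```python
import numpy as np

def mirror_transform(n, d):
    """4x4 reflection across the plane {x : n.x = d}, with n a unit normal.

    Observing a scene through its mirror image is equivalent to observing
    it from the real camera reflected across this plane.
    """
    n = np.asarray(n, dtype=float)
    n = n / np.linalg.norm(n)
    M = np.eye(4)
    M[:3, :3] = np.eye(3) - 2.0 * np.outer(n, n)  # Householder reflection
    M[:3, 3] = 2.0 * d * n
    return M

def virtual_camera_center(camera_center, n, d):
    """Reflect a camera center across the mirror plane (homogeneous coords)."""
    c = np.append(np.asarray(camera_center, dtype=float), 1.0)
    return (mirror_transform(n, d) @ c)[:3]
```

Note that the rotation block has determinant -1 (a reflection flips handedness), which any real pipeline must account for when forming the virtual camera's pose.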
arXiv Detail & Related papers (2025-09-24T23:00:22Z)
- LuxDiT: Lighting Estimation with Video Diffusion Transformer [66.60450792095901]
Estimating scene lighting from a single image or video remains a longstanding challenge in computer vision and graphics.
We propose LuxDiT, a novel data-driven approach that fine-tunes a video diffusion transformer to generate HDR environment maps conditioned on visual input.
arXiv Detail & Related papers (2025-09-03T19:59:20Z)
- Efficient Depth- and Spatially-Varying Image Simulation for Defocus Deblur [16.9629875455607]
Deep learning models trained on existing open-source datasets often face domain gaps and do not perform well in real-world settings.
We propose an efficient and scalable dataset approach that does not rely on fine-tuning with real-world data.
Our method simultaneously models depth-dependent defocus and spatially varying optical aberrations, addressing both computational complexity and the scarcity of high-quality RGB-D datasets.
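Depth-dependent defocus of this kind is conventionally characterized by the thin-lens circle of confusion. A minimal sketch of that standard geometric-optics formula (not the paper's exact simulation pipeline):

```python
def circle_of_confusion(focal_mm, f_number, focus_dist_mm, obj_dist_mm):
    """Thin-lens circle-of-confusion diameter (mm) on the sensor.

    An object at obj_dist, with the lens focused at focus_dist, blurs to
    a disc whose diameter grows with the aperture (focal/f_number) and
    with the defocus distance |obj_dist - focus_dist|.
    """
    aperture_mm = focal_mm / f_number  # entrance-pupil diameter
    return (aperture_mm * focal_mm * abs(obj_dist_mm - focus_dist_mm)
            / (obj_dist_mm * (focus_dist_mm - focal_mm)))
```

In-focus objects yield zero blur, and stopping down (larger f-number) shrinks the blur disc, which is exactly the depth/aperture coupling a defocus simulator must reproduce.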
arXiv Detail & Related papers (2025-07-01T02:03:04Z)
- Illuminating Darkness: Learning to Enhance Low-light Images In-the-Wild [47.39277249268179]
We introduce the Low-Light Smartphone dataset (LSD), a large-scale, high-resolution (4K+) dataset collected in the wild.
LSD contains 6,425 precisely aligned low and normal-light image pairs, selected from over 8,000 dynamic indoor and outdoor scenes.
We propose TFFormer, a hybrid model that encodes luminance and chrominance separately to reduce color-structure entanglement.
arXiv Detail & Related papers (2025-03-10T04:01:56Z)
- IDArb: Intrinsic Decomposition for Arbitrary Number of Input Views and Illuminations [64.07859467542664]
Capturing geometric and material information from images remains a fundamental challenge in computer vision and graphics.
Traditional optimization-based methods often require hours of computational time to reconstruct geometry, material properties, and environmental lighting from dense multi-view inputs.
We introduce IDArb, a diffusion-based model designed to perform intrinsic decomposition on an arbitrary number of images under varying illuminations.
arXiv Detail & Related papers (2024-12-16T18:52:56Z)
- Acquisition of Spatially-Varying Reflectance and Surface Normals via Polarized Reflectance Fields [15.653977591138682]
Accurately measuring the geometry and spatially-varying reflectance of real-world objects is a complex task.
We propose a novel approach using polarized reflectance field capture and a comprehensive statistical analysis algorithm.
We showcase the captured shapes and reflectance of diverse objects with a wide material range, spanning from highly diffuse to highly glossy.
arXiv Detail & Related papers (2024-12-13T00:39:55Z)
- Helvipad: A Real-World Dataset for Omnidirectional Stereo Depth Estimation [83.841877607646]
We introduce Helvipad, a real-world dataset for omnidirectional stereo depth estimation.
The dataset includes accurate depth and disparity labels by projecting 3D point clouds onto equirectangular images.
We benchmark leading stereo depth estimation models for both standard and omnidirectional images.
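Projecting 3D points onto equirectangular images follows the standard longitude/latitude mapping. A minimal sketch under one common axis convention (longitude measured from +z, latitude up toward +y; the dataset's actual convention may differ):

```python
import math

def project_equirectangular(x, y, z, width, height):
    """Project a 3D point (camera coordinates) to equirectangular pixel coords.

    Longitude spans the image width, latitude the height; image center
    corresponds to the optical axis (+z) under this convention.
    """
    r = math.sqrt(x * x + y * y + z * z)
    lon = math.atan2(x, z)   # [-pi, pi], angle around the vertical axis
    lat = math.asin(y / r)   # [-pi/2, pi/2], elevation above the horizon
    u = (lon / (2.0 * math.pi) + 0.5) * width
    v = (0.5 - lat / math.pi) * height
    return u, v
```

A point straight ahead lands at the image center, and a point straight up maps to the top row, which is a quick sanity check for any chosen convention.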
arXiv Detail & Related papers (2024-11-27T13:34:41Z)
- Incorporating dense metric depth into neural 3D representations for view synthesis and relighting [25.028859317188395]
In robotic applications, dense metric depth can often be measured directly using stereo and illumination can be controlled.
In this work we demonstrate a method to incorporate dense metric depth into the training of neural 3D representations.
We also discuss a multi-flash stereo camera system developed to capture the necessary data for our pipeline and show results on relighting and view synthesis.
arXiv Detail & Related papers (2024-09-04T20:21:13Z)
- Deep Learning Methods for Calibrated Photometric Stereo and Beyond [86.57469194387264]
Photometric stereo recovers the surface normals of an object from multiple images with varying shading cues.
Deep learning methods have shown strong performance on photometric stereo for non-Lambertian surfaces.
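The classical calibrated baseline these deep methods improve on is Woodham-style Lambertian photometric stereo, which reduces to a per-pixel linear least-squares problem. A minimal sketch, assuming known unit light directions and no shadows or specularities:

```python
import numpy as np

def lambertian_photometric_stereo(intensities, light_dirs):
    """Classic calibrated photometric stereo for one Lambertian pixel.

    Given m intensities observed under m known directional lights,
    solve L @ g = i in the least-squares sense, where g = albedo * normal.
    Returns (albedo, unit_normal).
    """
    L = np.asarray(light_dirs, dtype=float)    # (m, 3) unit light directions
    i = np.asarray(intensities, dtype=float)   # (m,) observed intensities
    g, *_ = np.linalg.lstsq(L, i, rcond=None)  # scaled normal vector
    albedo = np.linalg.norm(g)
    return albedo, g / albedo
```

Three non-coplanar lights suffice in the noiseless case; more lights overdetermine the system, which the least-squares solve handles naturally. Non-Lambertian effects violate the linear model, which is where the surveyed deep methods come in.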
arXiv Detail & Related papers (2022-12-16T11:27:44Z)
- LUCES: A Dataset for Near-Field Point Light Source Photometric Stereo [30.31403197697561]
We introduce LUCES, the first real-world 'dataset for near-fieLd point light soUrCe photomEtric Stereo', covering 14 objects of varying materials.
A device with 52 LEDs was designed to light each object, positioned 10 to 30 centimeters from the camera.
We evaluate the performance of the latest near-field Photometric Stereo algorithms on the proposed dataset.
arXiv Detail & Related papers (2021-04-27T12:30:42Z)
- Neural Reflectance Fields for Appearance Acquisition [61.542001266380375]
We present Neural Reflectance Fields, a novel deep scene representation that encodes volume density, normal and reflectance properties at any 3D point in a scene.
We combine this representation with a physically-based differentiable ray marching framework that can render images from a neural reflectance field under any viewpoint and light.
arXiv Detail & Related papers (2020-08-09T22:04:36Z)