MODEST: Multi-Optics Depth-of-Field Stereo Dataset
- URL: http://arxiv.org/abs/2511.20853v1
- Date: Tue, 25 Nov 2025 20:59:47 GMT
- Title: MODEST: Multi-Optics Depth-of-Field Stereo Dataset
- Authors: Nisarg K. Trivedi, Vinayak A. Belludi, Li-Yun Wang, Pardis Taghavi, Dante Lok,
- Abstract summary: We present the first high-resolution (5472$\times$3648px) stereo DSLR dataset with 18000 images.
For 9 scenes with varying scene complexity, lighting and background, images are captured with two identical camera assemblies at 10 focal lengths (28-70mm) and 5 apertures (f/2.8-f/22).
This full-range optics coverage enables controlled analysis of geometric and optical effects for monocular and stereo depth estimation, shallow depth-of-field rendering, deblurring, 3D scene reconstruction and novel view synthesis.
- Score: 1.2815904071470705
- License: http://creativecommons.org/licenses/by-nc-nd/4.0/
- Abstract: Reliable depth estimation under real optical conditions remains a core challenge for camera vision in systems such as autonomous robotics and augmented reality. Despite recent progress in depth estimation and depth-of-field rendering, research remains constrained by the lack of large-scale, high-fidelity, real stereo DSLR datasets, limiting real-world generalization and evaluation of models trained on synthetic data, as shown extensively in the literature. We present the first high-resolution (5472$\times$3648px) stereo DSLR dataset with 18000 images, systematically varying focal length and aperture across complex real scenes and capturing the optical realism and complexity of professional camera systems. For 9 scenes with varying scene complexity, lighting and background, images are captured with two identical camera assemblies at 10 focal lengths (28-70mm) and 5 apertures (f/2.8-f/22), spanning 50 optical configurations in 2000 images per scene. This full-range optics coverage enables controlled analysis of geometric and optical effects for monocular and stereo depth estimation, shallow depth-of-field rendering, deblurring, 3D scene reconstruction and novel view synthesis. Each focal configuration has a dedicated calibration image set, supporting evaluation of classical and learning-based methods for intrinsic and extrinsic calibration. The dataset features challenging visual elements such as multi-scale optical illusions, reflective surfaces, mirrors, transparent glass walls, fine-grained details, and natural/artificial ambient light variations. This work attempts to bridge the realism gap between synthetic training data and real camera optics, and demonstrates the challenges that current state-of-the-art monocular depth, stereo depth, and depth-of-field methods face. We release the dataset, calibration files, and evaluation code to support reproducible research on real-world optical generalization.
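The capture grid described in the abstract can be enumerated directly. A minimal sketch: the counts and range endpoints (10 focal lengths over 28-70mm, 5 apertures over f/2.8-f/22, 9 scenes, 2000 images per scene) come from the abstract, but the intermediate focal-length and aperture-stop values below are assumed for illustration.

```python
from itertools import product

# Hypothetical enumeration of the MODEST capture grid. Counts and range
# endpoints come from the abstract; the intermediate focal lengths and
# aperture stops below are assumed for illustration.
FOCAL_LENGTHS_MM = [28, 32, 36, 40, 45, 50, 55, 60, 65, 70]  # 10 focal lengths, 28-70mm
F_NUMBERS = [2.8, 4.0, 5.6, 8.0, 22.0]                       # 5 apertures, f/2.8-f/22

def optical_configurations():
    """All (focal_length_mm, f_number) pairs: 10 x 5 = 50 per scene."""
    return list(product(FOCAL_LENGTHS_MM, F_NUMBERS))

def total_images(num_scenes=9, images_per_scene=2000):
    """9 scenes x 2000 images per scene = 18000 images total."""
    return num_scenes * images_per_scene
```

This confirms the abstract's arithmetic: 50 optical configurations per scene and 18000 images overall.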
Related papers
- Charge: A Comprehensive Novel View Synthesis Benchmark and Dataset to Bind Them All [21.211645353347908]
This paper presents a new dataset for Novel View Synthesis, generated from a high-quality, animated film with stunning realism and intricate detail.
Our dataset captures a variety of dynamic scenes, complete with detailed textures, lighting, and motion.
It is ideal for training and evaluating cutting-edge 4D scene reconstruction and novel view generation models.
arXiv Detail & Related papers (2025-12-15T18:33:08Z)
- Reflect3r: Single-View 3D Stereo Reconstruction Aided by Mirror Reflections [55.248092751290834]
Mirror reflections are common in everyday environments and can provide stereo information within a single capture.
We exploit this property by treating the reflection as an auxiliary view and designing a transformation that constructs a physically valid virtual camera.
This enables a multi-view stereo setup from a single image, simplifying the imaging process.
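The virtual-camera idea above can be sketched with the standard plane-reflection transform (a generic construction, not Reflect3r's exact parameterization; the mirror plane's unit normal and offset are assumed known):

```python
import numpy as np

def mirror_transform(n, d):
    """4x4 reflection across the plane {x : n.x = d}, with n a unit normal.

    Observing a scene through its mirror image is equivalent to observing
    it from the real camera reflected across this plane.
    """
    n = np.asarray(n, dtype=float)
    n = n / np.linalg.norm(n)
    M = np.eye(4)
    M[:3, :3] = np.eye(3) - 2.0 * np.outer(n, n)  # Householder reflection
    M[:3, 3] = 2.0 * d * n
    return M

def virtual_camera_center(camera_center, n, d):
    """Reflect a camera center across the mirror plane (homogeneous coords)."""
    c = np.append(np.asarray(camera_center, dtype=float), 1.0)
    return (mirror_transform(n, d) @ c)[:3]
```

Note that the rotation block has determinant -1 (a reflection flips handedness), which any real pipeline must account for when forming the virtual camera's pose.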
arXiv Detail & Related papers (2025-09-24T23:00:22Z)
- LuxDiT: Lighting Estimation with Video Diffusion Transformer [66.60450792095901]
Estimating scene lighting from a single image or video remains a longstanding challenge in computer vision and graphics.
We propose LuxDiT, a novel data-driven approach that fine-tunes a video diffusion transformer to generate HDR environment maps conditioned on visual input.
arXiv Detail & Related papers (2025-09-03T19:59:20Z)
- Efficient Depth- and Spatially-Varying Image Simulation for Defocus Deblur [16.9629875455607]
Deep learning models trained on existing open-source datasets often face domain gaps and do not perform well in real-world settings.
We propose an efficient and scalable dataset approach that does not rely on fine-tuning with real-world data.
Our method simultaneously models depth-dependent defocus and spatially varying optical aberrations, addressing both computational complexity and the scarcity of high-quality RGB-D datasets.
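Depth-dependent defocus of this kind is conventionally characterized by the thin-lens circle of confusion. A minimal sketch of that standard geometric-optics formula (not the paper's exact simulation pipeline):

```python
def circle_of_confusion(focal_mm, f_number, focus_dist_mm, obj_dist_mm):
    """Thin-lens circle-of-confusion diameter (mm) on the sensor.

    An object at obj_dist, with the lens focused at focus_dist, blurs to
    a disc whose diameter grows with the aperture (focal/f_number) and
    with the defocus distance |obj_dist - focus_dist|.
    """
    aperture_mm = focal_mm / f_number  # entrance-pupil diameter
    return (aperture_mm * focal_mm * abs(obj_dist_mm - focus_dist_mm)
            / (obj_dist_mm * (focus_dist_mm - focal_mm)))
```

In-focus objects yield zero blur, and stopping down (larger f-number) shrinks the blur disc, which is exactly the depth/aperture coupling a defocus simulator must reproduce.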
arXiv Detail & Related papers (2025-07-01T02:03:04Z)
- Illuminating Darkness: Learning to Enhance Low-light Images In-the-Wild [47.39277249268179]
We introduce the Low-Light Smartphone dataset (LSD), a large-scale, high-resolution (4K+) dataset collected in the wild.
LSD contains 6,425 precisely aligned low and normal-light image pairs, selected from over 8,000 dynamic indoor and outdoor scenes.
We propose TFFormer, a hybrid model that encodes luminance and chrominance separately to reduce color-structure entanglement.
arXiv Detail & Related papers (2025-03-10T04:01:56Z)
- IDArb: Intrinsic Decomposition for Arbitrary Number of Input Views and Illuminations [64.07859467542664]
Capturing geometric and material information from images remains a fundamental challenge in computer vision and graphics.
Traditional optimization-based methods often require hours of computational time to reconstruct geometry, material properties, and environmental lighting from dense multi-view inputs.
We introduce IDArb, a diffusion-based model designed to perform intrinsic decomposition on an arbitrary number of images under varying illuminations.
arXiv Detail & Related papers (2024-12-16T18:52:56Z)
- Acquisition of Spatially-Varying Reflectance and Surface Normals via Polarized Reflectance Fields [15.653977591138682]
Accurately measuring the geometry and spatially-varying reflectance of real-world objects is a complex task.
We propose a novel approach using polarized reflectance field capture and a comprehensive statistical analysis algorithm.
We showcase the captured shapes and reflectance of diverse objects with a wide material range, spanning from highly diffuse to highly glossy.
arXiv Detail & Related papers (2024-12-13T00:39:55Z)
- Helvipad: A Real-World Dataset for Omnidirectional Stereo Depth Estimation [83.841877607646]
We introduce Helvipad, a real-world dataset for omnidirectional stereo depth estimation.
The dataset includes accurate depth and disparity labels by projecting 3D point clouds onto equirectangular images.
We benchmark leading stereo depth estimation models for both standard and omnidirectional images.
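Projecting 3D points onto equirectangular images follows the standard longitude/latitude mapping. A minimal sketch under one common axis convention (longitude measured from +z, latitude up toward +y; the dataset's actual convention may differ):

```python
import math

def project_equirectangular(x, y, z, width, height):
    """Project a 3D point (camera coordinates) to equirectangular pixel coords.

    Longitude spans the image width, latitude the height; image center
    corresponds to the optical axis (+z) under this convention.
    """
    r = math.sqrt(x * x + y * y + z * z)
    lon = math.atan2(x, z)   # [-pi, pi], angle around the vertical axis
    lat = math.asin(y / r)   # [-pi/2, pi/2], elevation above the horizon
    u = (lon / (2.0 * math.pi) + 0.5) * width
    v = (0.5 - lat / math.pi) * height
    return u, v
```

A point straight ahead lands at the image center, and a point straight up maps to the top row, which is a quick sanity check for any chosen convention.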
arXiv Detail & Related papers (2024-11-27T13:34:41Z)
- Incorporating dense metric depth into neural 3D representations for view synthesis and relighting [25.028859317188395]
In robotic applications, dense metric depth can often be measured directly using stereo and illumination can be controlled.
In this work we demonstrate a method to incorporate dense metric depth into the training of neural 3D representations.
We also discuss a multi-flash stereo camera system developed to capture the necessary data for our pipeline and show results on relighting and view synthesis.
arXiv Detail & Related papers (2024-09-04T20:21:13Z)
- Deep Learning Methods for Calibrated Photometric Stereo and Beyond [86.57469194387264]
Photometric stereo recovers the surface normals of an object from multiple images with varying shading cues.
Deep learning methods have shown strong performance on photometric stereo for non-Lambertian surfaces.
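The classical calibrated baseline these deep methods improve on is Woodham-style Lambertian photometric stereo, which reduces to a per-pixel linear least-squares problem. A minimal sketch, assuming known unit light directions and no shadows or specularities:

```python
import numpy as np

def lambertian_photometric_stereo(intensities, light_dirs):
    """Classic calibrated photometric stereo for one Lambertian pixel.

    Given m intensities observed under m known directional lights,
    solve L @ g = i in the least-squares sense, where g = albedo * normal.
    Returns (albedo, unit_normal).
    """
    L = np.asarray(light_dirs, dtype=float)    # (m, 3) unit light directions
    i = np.asarray(intensities, dtype=float)   # (m,) observed intensities
    g, *_ = np.linalg.lstsq(L, i, rcond=None)  # scaled normal vector
    albedo = np.linalg.norm(g)
    return albedo, g / albedo
```

Three non-coplanar lights suffice in the noiseless case; more lights overdetermine the system, which the least-squares solve handles naturally. Non-Lambertian effects violate the linear model, which is where the surveyed deep methods come in.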
arXiv Detail & Related papers (2022-12-16T11:27:44Z)
- LUCES: A Dataset for Near-Field Point Light Source Photometric Stereo [30.31403197697561]
We introduce LUCES, the first real-world 'dataset for near-fieLd point light soUrCe photomEtric Stereo', covering 14 objects of varying materials.
A device with 52 LEDs was designed to light each object, positioned 10 to 30 centimeters from the camera.
We evaluate the performance of the latest near-field Photometric Stereo algorithms on the proposed dataset.
arXiv Detail & Related papers (2021-04-27T12:30:42Z)
- Neural Reflectance Fields for Appearance Acquisition [61.542001266380375]
We present Neural Reflectance Fields, a novel deep scene representation that encodes volume density, normal and reflectance properties at any 3D point in a scene.
We combine this representation with a physically-based differentiable ray marching framework that can render images from a neural reflectance field under any viewpoint and light.
arXiv Detail & Related papers (2020-08-09T22:04:36Z)