Wid3R: Wide Field-of-View 3D Reconstruction via Camera Model Conditioning
- URL: http://arxiv.org/abs/2602.05321v1
- Date: Thu, 05 Feb 2026 05:42:03 GMT
- Title: Wid3R: Wide Field-of-View 3D Reconstruction via Camera Model Conditioning
- Authors: Dongki Jung, Jaehoon Choi, Adil Qureshi, Somi Jeong, Dinesh Manocha, Suyong Yeon
- Abstract summary: We present Wid3R, a feed-forward neural network for visual geometry reconstruction that supports wide field-of-view camera models. Our approach leverages a ray representation with spherical harmonics and a novel camera model token within the network, enabling distortion-aware 3D reconstruction. It demonstrates strong zero-shot robustness and consistently outperforms prior methods, achieving improvements of up to +77.33 on Stanford2D3D.
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: We present Wid3R, a feed-forward neural network for visual geometry reconstruction that supports wide field-of-view camera models. Prior methods typically assume that input images are rectified or captured with pinhole cameras, since both their architectures and training datasets are tailored to perspective images only. These assumptions limit their applicability in real-world scenarios that use fisheye or panoramic cameras and often require careful calibration and undistortion. In contrast, Wid3R is a generalizable multi-view 3D estimation method that can model wide field-of-view camera types. Our approach leverages a ray representation with spherical harmonics and a novel camera model token within the network, enabling distortion-aware 3D reconstruction. Furthermore, Wid3R is the first multi-view foundation model to support feed-forward 3D reconstruction directly from 360° imagery. It demonstrates strong zero-shot robustness and consistently outperforms prior methods, achieving improvements of up to +77.33 on Stanford2D3D.
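The abstract names the ingredients (a spherical-harmonics ray representation and a camera model token) without spelling out the architecture. As a rough picture of how they could fit together, the sketch below computes unit ray directions for an equirectangular image, embeds them with degree-1 real spherical harmonics, and prepends a learned per-camera-model token to a transformer's patch tokens; all names, sizes, and the SH degree are illustrative assumptions, not details from the paper.

```python
# Hypothetical sketch of camera-model conditioning: per-pixel ray directions
# embedded with low-order real spherical harmonics, plus a learned
# camera-model token prepended to the patch tokens.
import torch
import torch.nn as nn

def equirect_rays(h, w):
    """Unit ray directions for an equirectangular (360°) image of size h x w."""
    v, u = torch.meshgrid(torch.arange(h), torch.arange(w), indexing="ij")
    lon = (u.float() / w - 0.5) * 2 * torch.pi      # longitude in [-pi, pi)
    lat = (0.5 - v.float() / h) * torch.pi          # latitude  in [-pi/2, pi/2]
    x = torch.cos(lat) * torch.sin(lon)
    y = torch.sin(lat)
    z = torch.cos(lat) * torch.cos(lon)
    return torch.stack([x, y, z], dim=-1)           # (h, w, 3)

def sh_embed(dirs):
    """Real spherical harmonics up to degree 1 (4 coefficients per ray)."""
    x, y, z = dirs.unbind(-1)
    c0, c1 = 0.28209479177387814, 0.4886025119029199
    return torch.stack([torch.full_like(x, c0), c1 * y, c1 * z, c1 * x], dim=-1)

class CameraModelToken(nn.Module):
    """One learned token per supported camera model (pinhole/fisheye/360°)."""
    def __init__(self, n_models=3, dim=768):
        super().__init__()
        self.tokens = nn.Parameter(torch.randn(n_models, dim) * 0.02)

    def forward(self, patch_tokens, model_id):
        tok = self.tokens[model_id].expand(patch_tokens.shape[0], 1, -1)
        return torch.cat([tok, patch_tokens], dim=1)  # prepend, CLS-style
```

A degree-1 embedding gives 4 coefficients per ray; degree L gives (L+1)^2, trading compactness against the angular resolution of the ray encoding.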
Related papers
- PAD3R: Pose-Aware Dynamic 3D Reconstruction from Casual Videos [25.79551555341372]
PAD3R is a method for reconstructing deformable 3D objects from casually captured, unposed monocular videos. At its core, our approach trains a personalized, object-centric pose estimator, supervised by a pre-trained image-to-3D model. By combining generative priors and differentiable rendering, PAD3R reconstructs high-fidelity, articulated 3D representations of objects in a category-agnostic way.
arXiv Detail & Related papers (2025-09-29T17:59:33Z)
- Pow3R: Empowering Unconstrained 3D Reconstruction with Camera and Scene Priors [18.149244316089284]
We present Pow3R, a novel large 3D vision regression model that is highly versatile in the input modalities it accepts. Our experiments on 3D reconstruction, depth completion, multi-view depth prediction, multi-view stereo, and multi-view pose estimation tasks yield state-of-the-art results.
arXiv Detail & Related papers (2025-03-21T17:12:30Z)
- UniK3D: Universal Camera Monocular 3D Estimation [62.06785782635153]
We present UniK3D, the first generalizable method for monocular 3D estimation able to model any camera. Our method introduces a spherical 3D representation which allows for better disentanglement of camera and scene geometry. A comprehensive zero-shot evaluation on 13 diverse datasets demonstrates the state-of-the-art performance of UniK3D across 3D, depth, and camera metrics.
arXiv Detail & Related papers (2025-03-20T17:49:23Z)
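The "spherical 3D representation" in the UniK3D summary can be pictured as factoring every camera-frame point into a unit ray direction (determined by the camera model alone) and a radial distance (determined by the scene alone). A minimal numpy sketch of that factorization, with hypothetical function names that are not UniK3D's API:

```python
# Illustrative spherical parameterization: direction encodes the camera,
# radius encodes the scene; recomposition is exact.
import numpy as np

def to_spherical(points):
    """Factor camera-frame points (N, 3) into unit directions and radii."""
    radii = np.linalg.norm(points, axis=-1, keepdims=True)
    dirs = points / np.clip(radii, 1e-8, None)
    return dirs, radii.squeeze(-1)

def from_spherical(dirs, radii):
    """Recompose 3D points; swapping `dirs` swaps the camera model only."""
    return dirs * radii[..., None]

pts = np.random.randn(5, 3)
d, r = to_spherical(pts)
assert np.allclose(from_spherical(d, r), pts)
```

Because the two factors are independent, substituting a different camera's ray directions changes the projection model without touching the estimated scene geometry.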
- FreeSplatter: Pose-free Gaussian Splatting for Sparse-view 3D Reconstruction [69.63414788486578]
FreeSplatter is a scalable feed-forward framework that generates high-quality 3D Gaussians from uncalibrated sparse-view images. Our approach employs a streamlined transformer architecture where self-attention blocks facilitate information exchange. We develop two specialized variants (for object-centric and scene-level reconstruction) trained on comprehensive datasets.
arXiv Detail & Related papers (2024-12-12T18:52:53Z)
- DUSt3R: Geometric 3D Vision Made Easy [8.471330244002564]
We introduce DUSt3R, a novel paradigm for Dense and Unconstrained Stereo 3D Reconstruction of arbitrary image collections. We show that this formulation smoothly unifies the monocular and binocular reconstruction cases. Our formulation directly provides a 3D model of the scene as well as depth information, and, interestingly, we can seamlessly recover from it pixel matches as well as relative and absolute camera poses.
arXiv Detail & Related papers (2023-12-21T18:52:14Z)
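The DUSt3R summary notes that cameras can be recovered from the predicted 3D model. One standard, generic way to obtain a relative pose from two pointmaps covering the same pixels is rigid Procrustes (Kabsch) alignment; the sketch below shows that textbook algorithm as an illustration, not necessarily DUSt3R's exact recovery procedure.

```python
# Kabsch/Procrustes alignment of corresponding 3D points; a generic
# technique, shown only to illustrate pose recovery from pointmaps.
import numpy as np

def rigid_align(src, dst):
    """Return R (3x3), t (3,) minimizing ||(src @ R.T + t) - dst||."""
    mu_s, mu_d = src.mean(0), dst.mean(0)
    H = (src - mu_s).T @ (dst - mu_d)
    U, _, Vt = np.linalg.svd(H)
    S = np.diag([1.0, 1.0, np.sign(np.linalg.det(Vt.T @ U.T))])
    R = Vt.T @ S @ U.T                     # proper rotation (det = +1)
    t = mu_d - R @ mu_s
    return R, t

# Toy check: recover a known rotation from synthetic correspondences.
rng = np.random.default_rng(0)
src = rng.normal(size=(100, 3))
R_true = np.linalg.qr(rng.normal(size=(3, 3)))[0]
R_true *= np.sign(np.linalg.det(R_true))   # force det = +1
dst = src @ R_true.T + np.array([0.1, -0.2, 0.3])
R, t = rigid_align(src, dst)
assert np.allclose(R, R_true, atol=1e-6)
```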
- Vision Transformer for NeRF-Based View Synthesis from a Single Input Image [49.956005709863355]
We propose to leverage both the global and local features to form an expressive 3D representation.
To synthesize a novel view, we train a multilayer perceptron (MLP) network conditioned on the learned 3D representation to perform volume rendering.
Our method can render novel views from only a single input image and generalize across multiple object categories using a single model.
arXiv Detail & Related papers (2022-07-12T17:52:04Z)
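The volume-rendering step this summary refers to is usually the standard NeRF compositing rule, C = sum_i T_i * (1 - exp(-sigma_i * delta_i)) * c_i with T_i the accumulated transmittance. A small PyTorch sketch of an MLP conditioned on a learned feature `z`, followed by that compositing; shapes and names are illustrative assumptions, not the paper's:

```python
# Minimal conditioned radiance field plus alpha compositing along rays.
import torch
import torch.nn as nn

class ConditionedField(nn.Module):
    def __init__(self, feat_dim=256, hidden=128):
        super().__init__()
        self.mlp = nn.Sequential(
            nn.Linear(3 + feat_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, 4),            # (r, g, b, sigma) per sample
        )

    def forward(self, xyz, z):
        out = self.mlp(torch.cat([xyz, z], dim=-1))
        rgb = torch.sigmoid(out[..., :3])
        sigma = torch.relu(out[..., 3])
        return rgb, sigma

def composite(rgb, sigma, deltas):
    """Alpha-composite (rays, samples) densities/colors into per-ray RGB."""
    alpha = 1.0 - torch.exp(-sigma * deltas)               # (rays, samples)
    trans = torch.cumprod(1.0 - alpha + 1e-10, dim=-1)
    trans = torch.cat([torch.ones_like(trans[..., :1]), trans[..., :-1]], dim=-1)
    weights = alpha * trans
    return (weights[..., None] * rgb).sum(dim=-2)          # (rays, 3)
```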
- Pixel2Mesh++: 3D Mesh Generation and Refinement from Multi-View Images [82.32776379815712]
We study the problem of shape generation in 3D mesh representation from a small number of color images with or without camera poses.
We adopt a graph convolution network to further improve the shape quality by leveraging cross-view information.
Our model is robust to the quality of the initial mesh and the error of camera pose, and can be combined with a differentiable renderer for test-time optimization.
arXiv Detail & Related papers (2022-04-21T03:42:31Z)
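As a rough picture of the graph-convolutional refinement described above: each mesh vertex mixes its own features with those of its neighbors through the mesh adjacency and predicts a coordinate offset. A hypothetical PyTorch sketch, with the cross-view image feature pooling abstracted into `feats`:

```python
# Toy graph-convolutional vertex refinement over a mesh.
import torch
import torch.nn as nn

class GraphConv(nn.Module):
    def __init__(self, in_dim, out_dim):
        super().__init__()
        self.w_self = nn.Linear(in_dim, out_dim)
        self.w_nbr = nn.Linear(in_dim, out_dim)

    def forward(self, x, adj):
        # adj: (V, V) row-normalized mesh adjacency matrix
        return torch.relu(self.w_self(x) + self.w_nbr(adj @ x))

class VertexRefiner(nn.Module):
    def __init__(self, feat_dim=64, hidden=128):
        super().__init__()
        self.gc1 = GraphConv(3 + feat_dim, hidden)
        self.gc2 = GraphConv(hidden, hidden)
        self.head = nn.Linear(hidden, 3)

    def forward(self, verts, feats, adj):
        h = self.gc1(torch.cat([verts, feats], dim=-1), adj)
        h = self.gc2(h, adj)
        return verts + self.head(h)        # refined vertex positions
```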
- Lightweight Multi-View 3D Pose Estimation through Camera-Disentangled Representation [57.11299763566534]
We present a solution to recover 3D pose from multi-view images captured with spatially calibrated cameras.
We exploit 3D geometry to fuse input images into a unified latent representation of pose, which is disentangled from camera view-points.
Our architecture then conditions the learned representation on camera projection operators to produce accurate per-view 2D detections.
arXiv Detail & Related papers (2020-04-05T12:52:29Z)
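The last entry's idea, a view-independent pose representation turned into per-view 2D detections by conditioning on camera projection operators, reduces in its simplest form to projecting shared 3D joints through each camera's 3x4 matrix. A toy numpy sketch with placeholder cameras (all matrices here are illustrative, not the paper's setup):

```python
# Project a shared 3D pose latent into each view via its projection matrix.
import numpy as np

def project(joints_3d, P):
    """Apply a 3x4 projection matrix to (J, 3) joints -> (J, 2) pixels."""
    homo = np.hstack([joints_3d, np.ones((joints_3d.shape[0], 1))])
    uvw = homo @ P.T
    return uvw[:, :2] / uvw[:, 2:3]

joints = np.random.rand(17, 3)                     # shared latent: 17 joints
# Four identical placeholder cameras; a real rig uses calibrated extrinsics.
cams = [np.hstack([np.eye(3), np.zeros((3, 1))]) for _ in range(4)]
detections = [project(joints + [0, 0, 3], P) for P in cams]  # per-view 2D
```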