G-CUT3R: Guided 3D Reconstruction with Camera and Depth Prior Integration
- URL: http://arxiv.org/abs/2508.11379v2
- Date: Mon, 29 Sep 2025 07:37:18 GMT
- Title: G-CUT3R: Guided 3D Reconstruction with Camera and Depth Prior Integration
- Authors: Ramil Khafizov, Artem Komarichev, Ruslan Rakhimov, Peter Wonka, Evgeny Burnaev
- Abstract summary: We introduce G-CUT3R, a novel feed-forward approach for guided 3D scene reconstruction. Unlike existing feed-forward methods that rely solely on input images, our method leverages auxiliary data, such as depth, camera calibrations, or camera positions.
- Score: 57.67450930037339
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: We introduce G-CUT3R, a novel feed-forward approach for guided 3D scene reconstruction that enhances the CUT3R model by integrating prior information. Unlike existing feed-forward methods that rely solely on input images, our method leverages auxiliary data, such as depth, camera calibrations, or camera positions, commonly available in real-world scenarios. We propose a lightweight modification to CUT3R, incorporating a dedicated encoder for each modality to extract features, which are fused with RGB image tokens via zero convolution. This flexible design enables seamless integration of any combination of prior information during inference. Evaluated across multiple benchmarks, including 3D reconstruction and other multi-view tasks, our approach demonstrates significant performance improvements, showing its ability to effectively utilize available priors while maintaining compatibility with varying input modalities.
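The fusion mechanism described in the abstract is worth a sketch. For token sequences, a zero convolution is effectively a zero-initialized 1x1 projection (ControlNet-style): the prior branch contributes nothing at initialization, and its influence is learned during training. The minimal PyTorch sketch below assumes token-shaped features and stand-in single-layer encoders; the modality names, dimensions, and encoder structure are illustrative assumptions, not the paper's published implementation.

```python
import torch
import torch.nn as nn


class ZeroLinear(nn.Linear):
    """Zero-initialized projection: on token sequences, a "zero convolution"
    (1x1 conv with zero-initialized weights) reduces to this."""

    def __init__(self, dim_in: int, dim_out: int):
        super().__init__(dim_in, dim_out)
        nn.init.zeros_(self.weight)
        nn.init.zeros_(self.bias)


class PriorFusion(nn.Module):
    """Adds per-modality prior features to RGB image tokens through
    zero-initialized projections, so training starts from the unmodified
    image-only backbone and the priors' influence grows gradually."""

    def __init__(self, dim: int, modalities=("depth", "intrinsics", "pose")):
        super().__init__()
        # Stand-in per-modality encoders; the actual model uses a dedicated
        # encoder per modality, whose structure is not specified here.
        self.encoders = nn.ModuleDict({m: nn.Linear(dim, dim) for m in modalities})
        self.zero_proj = nn.ModuleDict({m: ZeroLinear(dim, dim) for m in modalities})

    def forward(self, rgb_tokens: torch.Tensor, priors: dict) -> torch.Tensor:
        # priors maps a modality name to features of shape (B, N, dim);
        # any subset may be supplied, matching the abstract's claim that
        # any combination of priors can be used at inference.
        fused = rgb_tokens
        for name, feat in priors.items():
            fused = fused + self.zero_proj[name](self.encoders[name](feat))
        return fused


# Usage: fuse a depth prior into 196 RGB tokens of width 768.
fusion = PriorFusion(dim=768)
rgb = torch.randn(2, 196, 768)
depth_feat = torch.randn(2, 196, 768)
out = fusion(rgb, {"depth": depth_feat})  # equals rgb at init (zero projections)
```

Because the projections start at zero, adding or dropping a prior modality never perturbs the pretrained RGB pathway at initialization, which is what makes this kind of design plug-in friendly.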
Related papers
- TIGaussian: Disentangle Gaussians for Spatial-Awared Text-Image-3D Alignment [58.46706158310462]
TIGaussian harnesses 3D Gaussian Splatting (3DGS) characteristics to strengthen cross-modality alignment. Our multi-branch 3DGS tokenizer decouples the intrinsic properties of 3DGS structures into compact latent representations. A text-3D projection module adaptively maps 3D features to text embedding space for better text-3D alignment.
arXiv Detail & Related papers (2026-01-27T06:30:32Z)
- C3G: Learning Compact 3D Representations with 2K Gaussians [55.04010158339562]
Recent approaches use per-pixel 3D Gaussian Splatting for reconstruction, followed by a 2D-to-3D feature lifting stage for scene understanding. We propose C3G, a novel feed-forward framework that estimates compact 3D Gaussians only at essential spatial locations.
arXiv Detail & Related papers (2025-12-03T17:59:05Z)
- RobustGS: Unified Boosting of Feedforward 3D Gaussian Splatting under Low-Quality Conditions [67.48495052903534]
We propose a general and efficient multi-view feature enhancement module, RobustGS. It substantially improves the robustness of feedforward 3DGS methods under various adverse imaging conditions. The RobustGS module can be seamlessly integrated into existing pretrained pipelines in a plug-and-play manner.
arXiv Detail & Related papers (2025-08-05T04:50:29Z)
- Pow3R: Empowering Unconstrained 3D Reconstruction with Camera and Scene Priors [18.149244316089284]
We present Pow3R, a novel large 3D vision regression model that is highly versatile in the input modalities it accepts. Our experiments on 3D reconstruction, depth completion, multi-view depth prediction, multi-view stereo, and multi-view pose estimation tasks yield state-of-the-art results.
arXiv Detail & Related papers (2025-03-21T17:12:30Z)
- Adapt3R: Adaptive 3D Scene Representation for Domain Transfer in Imitation Learning [28.80962812015936]
Imitation Learning can train robots to perform complex and diverse manipulation tasks, but learned policies are brittle to observations outside the training distribution. We propose Adapt3R, a general-purpose 3D observation encoder which synthesizes data from calibrated RGBD cameras into a vector that can be used as conditioning for arbitrary IL algorithms. We show across 93 simulated and 6 real tasks that, when trained end-to-end with a variety of IL algorithms, Adapt3R maintains these algorithms' learning capacity while enabling zero-shot transfer to novel embodiments and camera poses.
arXiv Detail & Related papers (2025-03-06T18:17:09Z)
- PreF3R: Pose-Free Feed-Forward 3D Gaussian Splatting from Variable-length Image Sequence [3.61512056914095]
We present PreF3R, Pose-Free Feed-forward 3D Reconstruction from an image sequence of variable length.
PreF3R removes the need for camera calibration and reconstructs the 3D Gaussian field within a canonical coordinate frame directly from a sequence of unposed images.
arXiv Detail & Related papers (2024-11-25T19:16:29Z)
- vFusedSeg3D: 3rd Place Solution for 2024 Waymo Open Dataset Challenge in Semantic Segmentation [0.0]
vFusedSeg3D uses the rich semantic content of camera images and the accurate depth sensing of LiDAR to generate a strong and comprehensive environmental understanding.
Our novel feature fusion technique combines geometric features from LiDAR point clouds with semantic features from camera images.
This multi-modal approach significantly improves performance, yielding a state-of-the-art mIoU of 72.46% on the validation set.
arXiv Detail & Related papers (2024-08-09T11:34:19Z)
- Anyview: Generalizable Indoor 3D Object Detection with Variable Frames [60.48134767838629]
We present a novel 3D detection framework named AnyView for practical applications. Our method achieves both great generalizability and high detection accuracy with a simple and clean architecture.
arXiv Detail & Related papers (2023-10-09T02:15:45Z)
- Towards Scalable Multi-View Reconstruction of Geometry and Materials [27.660389147094715]
We propose a novel method for joint recovery of camera pose, object geometry and spatially-varying Bidirectional Reflectance Distribution Function (svBRDF) of 3D scenes.
The inputs are high-resolution RGBD images captured by a mobile, hand-held capture system with point lights for active illumination.
arXiv Detail & Related papers (2023-06-06T15:07:39Z)
- High-fidelity 3D GAN Inversion by Pseudo-multi-view Optimization [51.878078860524795]
We present a high-fidelity 3D generative adversarial network (GAN) inversion framework that can synthesize photo-realistic novel views.
Our approach enables high-fidelity 3D rendering from a single image, which is promising for various applications of AI-generated 3D content.
arXiv Detail & Related papers (2022-11-28T18:59:52Z)
- Learning Efficient Photometric Feature Transform for Multi-view Stereo [37.26574529243778]
We learn to convert the per-pixel photometric information at each view into spatially distinctive and view-invariant low-level features.
Our framework automatically adapts to and makes efficient use of the geometric information available in different forms of input data.
arXiv Detail & Related papers (2021-03-27T02:53:15Z)
This list is automatically generated from the titles and abstracts of the papers on this site.