Related papers: MoE3D: A Mixture-of-Experts Module for 3D Reconstruction

Related papers

PEAR: Pixel-aligned Expressive humAn mesh Recovery [32.39994094033293]
Reconstructing detailed 3D human meshes from a single in-the-wild image remains a fundamental challenge in computer vision.<n>Existing SMPLX-based methods often suffer from slow inference, produce only coarse body poses, and exhibit misalignments or unnatural artifacts in fine-grained regions such as the face and hands.<n>We propose PEAR, a fast and robust framework for pixel-aligned expressive human mesh recovery.
arXiv Detail & Related papers (2026-01-30T08:12:54Z)
Emergent Extreme-View Geometry in 3D Foundation Models [26.53730031068199]
3D foundation models (3DFMs) have recently transformed 3D vision, enabling joint prediction of depths, poses, and point maps directly from images.<n>We study their internal representations and find that 3DFMs exhibit an emergent understanding of extreme-view geometry, despite never being trained for such conditions.<n>We introduce a lightweight alignment scheme that refines their internal 3D representation by tuning only a small subset of backbone bias terms, leaving all decoder heads frozen.<n>We contribute MegaUnScene, a new benchmark of Internet scenes unseen by existing 3DFMs, with dedicated test splits for both relative pose estimation and dense 3D reconstruction
arXiv Detail & Related papers (2025-11-27T18:40:03Z)
MoRE: 3D Visual Geometry Reconstruction Meets Mixture-of-Experts [50.37005070020306]
MoRE is a dense 3D visual foundation model based on a Mixture-of-Experts (MoE) architecture.<n>MoRE incorporates a confidence-based depth refinement module that stabilizes and refines geometric estimation.<n>It integrates dense semantic features with globally aligned 3D backbone representations for high-fidelity surface normal prediction.
arXiv Detail & Related papers (2025-10-31T06:54:27Z)
Pseudo Depth Meets Gaussian: A Feed-forward RGB SLAM Baseline [64.42938561167402]
We propose an online 3D reconstruction method using 3D Gaussian-based SLAM, combined with a feed-forward recurrent prediction module.<n>This approach replaces slow test-time optimization with fast network inference, significantly improving tracking speed.<n>Our method achieves performance on par with the state-of-the-art SplaTAM, while reducing tracking time by more than 90%.
arXiv Detail & Related papers (2025-08-06T16:16:58Z)
An Evaluation of DUSt3R/MASt3R/VGGT 3D Reconstruction on Photogrammetric Aerial Blocks [17.58704237530163]
3D computer vision algorithms continue to advance in handling sparse, unordered image sets.<n>Recently developed foundational models for 3D reconstruction have attracted attention due to their ability to handle very sparse image overlaps.<n>This paper conducts a comprehensive evaluation of the pre-trained DUSt3R/MASt3R/VGGT models on the aerial blocks of the UseGeo dataset for pose estimation and dense 3D reconstruction.
arXiv Detail & Related papers (2025-07-20T03:09:04Z)
FreeSplat++: Generalizable 3D Gaussian Splatting for Efficient Indoor Scene Reconstruction [50.534213038479926]
FreeSplat++ is an alternative approach to large-scale indoor whole-scene reconstruction.<n>Our method with depth-regularized per-scene fine-tuning demonstrates substantial improvements in reconstruction accuracy and a notable reduction in training time.
arXiv Detail & Related papers (2025-03-29T06:22:08Z)
SparseFlex: High-Resolution and Arbitrary-Topology 3D Shape Modeling [79.56581753856452]
SparseFlex is a novel sparse-structured isosurface representation that enables differentiable mesh reconstruction at resolutions up to $10243$ directly from rendering losses.<n>By enabling high-resolution, differentiable mesh reconstruction and generation with rendering losses, SparseFlex significantly advances the state-of-the-art in 3D shape representation and modeling.
arXiv Detail & Related papers (2025-03-27T17:46:42Z)
HORT: Monocular Hand-held Objects Reconstruction with Transformers [61.36376511119355]
Reconstructing hand-held objects in 3D from monocular images is a significant challenge in computer vision.<n>We propose a transformer-based model to efficiently reconstruct dense 3D point clouds of hand-held objects.<n>Our method achieves state-of-the-art accuracy with much faster inference speed, while generalizing well to in-the-wild images.
arXiv Detail & Related papers (2025-03-27T09:45:09Z)
Difix3D+: Improving 3D Reconstructions with Single-Step Diffusion Models [65.90387371072413]
We introduce Difix3D+, a novel pipeline designed to enhance 3D reconstruction and novel-view synthesis.<n>At the core of our approach is Difix, a single-step image diffusion model trained to enhance and remove artifacts in rendered novel views.
arXiv Detail & Related papers (2025-03-03T17:58:33Z)
GaussianBody: Clothed Human Reconstruction via 3d Gaussian Splatting [14.937297984020821]
We propose a novel clothed human reconstruction method called GaussianBody, based on 3D Gaussian Splatting. Applying the static 3D Gaussian Splatting model to the dynamic human reconstruction problem is non-trivial due to complicated non-rigid deformations and rich cloth details. We show that our method can achieve state-of-the-art photorealistic novel-view rendering results with high-quality details for dynamic clothed human bodies.
arXiv Detail & Related papers (2024-01-18T04:48:13Z)
DMV3D: Denoising Multi-View Diffusion using 3D Large Reconstruction Model [86.37536249046943]
textbfDMV3D is a novel 3D generation approach that uses a transformer-based 3D large reconstruction model to denoise multi-view diffusion. Our reconstruction model incorporates a triplane NeRF representation and can denoise noisy multi-view images via NeRF reconstruction and rendering.
arXiv Detail & Related papers (2023-11-15T18:58:41Z)
Robust Geometry-Preserving Depth Estimation Using Differentiable Rendering [93.94371335579321]
We propose a learning framework that trains models to predict geometry-preserving depth without requiring extra data or annotations. Comprehensive experiments underscore our framework's superior generalization capabilities. Our innovative loss functions empower the model to autonomously recover domain-specific scale-and-shift coefficients.
arXiv Detail & Related papers (2023-09-18T12:36:39Z)
Exploiting Multiple Priors for Neural 3D Indoor Reconstruction [15.282699095607594]
We propose a novel neural implicit modeling method that leverages multiple regularization strategies to achieve better reconstructions of large indoor environments. Experimental results show that our approach produces state-of-the-art 3D reconstructions in challenging indoor scenarios.
arXiv Detail & Related papers (2023-09-13T15:23:43Z)
FrozenRecon: Pose-free 3D Scene Reconstruction with Frozen Depth Models [67.96827539201071]
We propose a novel test-time optimization approach for 3D scene reconstruction. Our method achieves state-of-the-art cross-dataset reconstruction on five zero-shot testing datasets.
arXiv Detail & Related papers (2023-08-10T17:55:02Z)
MobileBrick: Building LEGO for 3D Reconstruction on Mobile Devices [78.20154723650333]
High-quality 3D ground-truth shapes are critical for 3D object reconstruction evaluation. We introduce a novel multi-view RGBD dataset captured using a mobile device. We obtain precise 3D ground-truth shape without relying on high-end 3D scanners.
arXiv Detail & Related papers (2023-03-03T14:02:50Z)
Monocular 3D Object Reconstruction with GAN Inversion [122.96094885939146]
MeshInversion is a novel framework to improve the reconstruction of textured 3D meshes. It exploits the generative prior of a 3D GAN pre-trained for 3D textured mesh synthesis. Our framework obtains faithful 3D reconstructions with consistent geometry and texture across both observed and unobserved parts.
arXiv Detail & Related papers (2022-07-20T17:47:22Z)
Distribution-Aware Single-Stage Models for Multi-Person 3D Pose Estimation [29.430404703883084]
We present a novel Distribution-Aware Single-stage (DAS) model for tackling the challenging multi-person 3D pose estimation problem. The proposed DAS model simultaneously localizes person positions and their corresponding body joints in the 3D camera space in a one-pass manner. Comprehensive experiments on benchmarks CMU Panoptic and MuPoTS-3D demonstrate the superior efficiency of the proposed DAS model.
arXiv Detail & Related papers (2022-03-15T07:30:27Z)
DensePose 3D: Lifting Canonical Surface Maps of Articulated Objects to the Third Dimension [71.71234436165255]
We contribute DensePose 3D, a method that can learn such reconstructions in a weakly supervised fashion from 2D image annotations only. Because it does not require 3D scans, DensePose 3D can be used for learning a wide range of articulated categories such as different animal species. We show significant improvements compared to state-of-the-art non-rigid structure-from-motion baselines on both synthetic and real data on categories of humans and animals.
arXiv Detail & Related papers (2021-08-31T18:33:55Z)

This list is automatically generated from the titles and abstracts of the papers in this site.