Fast-SAM3D: 3Dfy Anything in Images but Faster
- URL: http://arxiv.org/abs/2602.05293v1
- Date: Thu, 05 Feb 2026 04:27:59 GMT
- Title: Fast-SAM3D: 3Dfy Anything in Images but Faster
- Authors: Weilun Feng, Mingqiang Wu, Zhiliang Chen, Chuanguang Yang, Haotong Qin, Yuqi Li, Xiaokun Liu, Guoxin Fan, Zhulin An, Libo Huang, Yulun Zhang, Michele Magno, Yongjun Xu,
- Abstract summary: SAM3D enables scalable, open-world 3D reconstruction from complex scenes, yet its deployment is hindered by prohibitive inference latency. We present \textbf{Fast-SAM3D}, a training-free framework that aligns computation with instantaneous generation complexity.
- Score: 65.17322167628367
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: SAM3D enables scalable, open-world 3D reconstruction from complex scenes, yet its deployment is hindered by prohibitive inference latency. In this work, we conduct the \textbf{first systematic investigation} into its inference dynamics, revealing that generic acceleration strategies are brittle in this context. We demonstrate that these failures stem from neglecting the pipeline's inherent multi-level \textbf{heterogeneity}: the kinematic distinctiveness between shape and layout, the intrinsic sparsity of texture refinement, and the spectral variance across geometries. To address this, we present \textbf{Fast-SAM3D}, a training-free framework that dynamically aligns computation with instantaneous generation complexity. Our approach integrates three heterogeneity-aware mechanisms: (1) \textit{Modality-Aware Step Caching} to decouple structural evolution from sensitive layout updates; (2) \textit{Joint Spatiotemporal Token Carving} to concentrate refinement on high-entropy regions; and (3) \textit{Spectral-Aware Token Aggregation} to adapt decoding resolution. Extensive experiments demonstrate that Fast-SAM3D delivers up to \textbf{2.67$\times$} end-to-end speedup with negligible fidelity loss, establishing a new Pareto frontier for efficient single-view 3D generation. Our code is released at https://github.com/wlfeng0509/Fast-SAM3D.
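The step-caching idea from the abstract can be illustrated with a minimal, self-contained sketch: decouple a slowly-evolving branch (shape) from a sensitive one (layout), recomputing the slow branch only every few denoising steps and reusing its cached output in between. All names here (`shape_branch`, `layout_branch`, `denoise`) are illustrative stand-ins, not the actual Fast-SAM3D API.

```python
def shape_branch(x, t):
    # Stand-in for an expensive structural prediction.
    return x * 0.9

def layout_branch(x, t):
    # Stand-in for a cheap, layout-sensitive update.
    return 0.1 * t

def denoise(x, steps, cache_interval=3):
    """Run the expensive shape branch only every `cache_interval`
    steps and reuse its cached output in between, while the
    layout-sensitive branch is evaluated at every step."""
    shape, calls = None, 0
    for t in range(steps, 0, -1):
        if shape is None or (steps - t) % cache_interval == 0:
            shape = shape_branch(x, t)  # recompute occasionally, then cache
            calls += 1
        x = shape + layout_branch(x, t)  # layout is updated at every step
    return x, calls

result, shape_calls = denoise(1.0, steps=9, cache_interval=3)
# The shape branch runs only ~steps/cache_interval times.
```

This toy loop shows the control flow only; the paper's mechanism additionally decides *when* to refresh the cache based on the generation dynamics rather than a fixed interval.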
Related papers
- StdGEN++: A Comprehensive System for Semantic-Decomposed 3D Character Generation [57.06461272772509]
StdGEN++ is a novel and comprehensive system for generating high-fidelity, semantically decomposed 3D characters from diverse inputs. It achieves state-of-the-art performance, significantly outperforming existing methods in geometric accuracy and semantic disentanglement. The resulting structural independence unlocks advanced downstream capabilities, including non-destructive editing, physics-compliant animation, and gaze tracking.
arXiv Detail & Related papers (2026-01-12T15:41:27Z) - WorldWarp: Propagating 3D Geometry with Asynchronous Video Diffusion [78.20778143251171]
WorldWarp is a framework that couples a 3D structural anchor with a 2D generative refiner. WorldWarp maintains consistency across video chunks by dynamically updating the 3D cache at every step. It achieves state-of-the-art fidelity by ensuring that 3D logic guides structure while diffusion logic perfects texture.
arXiv Detail & Related papers (2025-12-22T18:53:50Z) - Fast3Dcache: Training-free 3D Geometry Synthesis Acceleration [16.87269278147738]
We propose Fast3Dcache, a training-free geometry-aware caching framework for 3D diffusion inference. Our method achieves up to a 27.12% speed-up and a 54.8% reduction in FLOPs, with minimal degradation in geometric quality as measured by Chamfer Distance (2.48%) and F-Score (1.95%).
arXiv Detail & Related papers (2025-11-27T15:13:32Z) - LoG3D: Ultra-High-Resolution 3D Shape Modeling via Local-to-Global Partitioning [26.88556500272625]
We propose a novel 3D variational autoencoder framework built upon unsigned distance fields (UDFs). Our core innovation is a local-to-global architecture that processes the UDF by partitioning it into uniform subvolumes, UBlocks. Experiments demonstrate state-of-the-art performance in both reconstruction accuracy and generative quality, yielding superior surface smoothness and geometric flexibility.
arXiv Detail & Related papers (2025-11-13T07:34:43Z) - UniSplat: Unified Spatio-Temporal Fusion via 3D Latent Scaffolds for Dynamic Driving Scene Reconstruction [26.278318116942526]
We present UniSplat, a feed-forward framework that learns robust dynamic scene reconstruction through unified latent-temporal fusion. Experiments on real-world datasets demonstrate that UniSplat achieves state-of-the-art novel view synthesis, while providing robust and high-quality renderings for viewpoints outside the original camera coverage.
arXiv Detail & Related papers (2025-11-06T17:49:39Z) - Seeing 3D Through 2D Lenses: 3D Few-Shot Class-Incremental Learning via Cross-Modal Geometric Rectification [59.17489431187807]
We propose a framework that enhances 3D geometric fidelity by leveraging CLIP's hierarchical spatial semantics. Our method significantly improves 3D few-shot class-incremental learning, achieving superior geometric coherence and robustness to texture bias.
arXiv Detail & Related papers (2025-09-18T13:45:08Z) - Ultra3D: Efficient and High-Fidelity 3D Generation with Part Attention [54.15345846343084]
We propose Ultra3D, an efficient 3D generation framework that significantly accelerates sparse voxel modeling without compromising quality. Part Attention is a geometry-aware localized attention mechanism that restricts attention computation within semantically consistent part regions. Experiments demonstrate that Ultra3D supports high-resolution 3D generation at 1024 resolution and achieves state-of-the-art performance in both visual fidelity and user preference.
arXiv Detail & Related papers (2025-07-23T17:57:16Z) - STDR: Spatio-Temporal Decoupling for Real-Time Dynamic Scene Rendering [15.873329633980015]
Existing 3DGS-based methods for dynamic reconstruction often suffer from spatio-temporal coupling. We propose STDR (Spatio-Temporal Decoupling for Real-time rendering), a plug-and-play module that learns spatio-temporal probability distributions for each scene.
arXiv Detail & Related papers (2025-05-28T14:26:41Z) - EvolvingGS: High-Fidelity Streamable Volumetric Video via Evolving 3D Gaussian Representation [14.402479944396665]
We introduce EvolvingGS, a two-stage strategy that first deforms the Gaussian model to align with the target frame, and then refines it with minimal point addition/subtraction. Owing to the flexibility of the incrementally evolving representation, our method outperforms existing approaches in terms of both per-frame and temporal quality metrics. Our method significantly advances the state-of-the-art in dynamic scene reconstruction, particularly for extended sequences with complex human performances.
arXiv Detail & Related papers (2025-03-07T06:01:07Z) - T-3DGS: Removing Transient Objects for 3D Scene Reconstruction [83.05271859398779]
Transient objects in video sequences can significantly degrade the quality of 3D scene reconstructions. We propose T-3DGS, a novel framework that robustly filters out transient distractors during 3D reconstruction using Gaussian Splatting.
arXiv Detail & Related papers (2024-11-29T07:45:24Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of its content (including all information) and is not responsible for any consequences.