Fast3Dcache: Training-free 3D Geometry Synthesis Acceleration
- URL: http://arxiv.org/abs/2511.22533v1
- Date: Thu, 27 Nov 2025 15:13:32 GMT
- Title: Fast3Dcache: Training-free 3D Geometry Synthesis Acceleration
- Authors: Mengyu Yang, Yanming Yang, Chenyi Xu, Chenxi Song, Yufan Zuo, Tong Zhao, Ruibo Li, Chi Zhang
- Abstract summary: We propose Fast3Dcache, a training-free geometry-aware caching framework for 3D diffusion inference. Our method achieves up to a 27.12% speed-up and a 54.8% reduction in FLOPs, with minimal degradation in geometric quality as measured by Chamfer Distance (2.48%) and F-Score (1.95%).
- Score: 16.87269278147738
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Diffusion models have achieved impressive generative quality across modalities like 2D images, videos, and 3D shapes, but their inference remains computationally expensive due to the iterative denoising process. While recent caching-based methods effectively reuse redundant computations to speed up 2D and video generation, directly applying these techniques to 3D diffusion models can severely disrupt geometric consistency. In 3D synthesis, even minor numerical errors in cached latent features accumulate, causing structural artifacts and topological inconsistencies. To overcome this limitation, we propose Fast3Dcache, a training-free geometry-aware caching framework that accelerates 3D diffusion inference while preserving geometric fidelity. Our method introduces a Predictive Caching Scheduler Constraint (PCSC) to dynamically determine cache quotas according to voxel stabilization patterns and a Spatiotemporal Stability Criterion (SSC) to select stable features for reuse based on velocity-magnitude and acceleration criteria. Comprehensive experiments show that Fast3Dcache accelerates inference significantly, achieving up to a 27.12% speed-up and a 54.8% reduction in FLOPs, with minimal degradation in geometric quality as measured by Chamfer Distance (2.48%) and F-Score (1.95%).
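The Spatiotemporal Stability Criterion described above can be pictured as a first/second-difference test on a feature's trajectory across timesteps. The sketch below is a minimal illustration of that idea only; the function name, thresholds, and flat-vector feature representation are assumptions, not the paper's actual implementation:

```python
import numpy as np

def should_reuse_cache(feat_prev2, feat_prev, feat_curr,
                       vel_thresh=0.05, acc_thresh=0.02):
    """Reuse the cached feature only if both its velocity (first
    difference across timesteps) and acceleration (second difference)
    fall below their thresholds, i.e. the feature has stabilized."""
    velocity = feat_curr - feat_prev                    # first difference
    acceleration = velocity - (feat_prev - feat_prev2)  # second difference
    return bool(np.linalg.norm(velocity) < vel_thresh
                and np.linalg.norm(acceleration) < acc_thresh)
```

A feature whose last three snapshots are nearly identical passes the test and is served from cache; one that is still moving fails it and is recomputed.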
Related papers
- Frequency-Aware Error-Bounded Caching for Accelerating Diffusion Transformers [11.772150619675527]
Diffusion Transformers (DiTs) have emerged as the dominant architecture for high-quality image and video generation. Existing caching methods accelerate DiTs by reusing intermediate computations across timesteps, but they share a common limitation: treating the denoising process as uniform across time, depth, and feature dimensions. We propose SpectralCache, a unified caching framework comprising Timestep-Aware Dynamic Scheduling (TADS), Cumulative Error Budgets (CEB), and Frequency-Decomposed Caching (FDC).
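The error-budget idea behind this entry reduces to a simple scheduling rule: keep reusing cached activations while the accumulated approximation error stays under a budget, and recompute (resetting the error) once the budget would be exceeded. A hedged sketch of that general pattern; the function name, error model, and budget value are illustrative assumptions, not SpectralCache's actual algorithm:

```python
def schedule_cache(per_step_errors, budget):
    """Return one decision per denoising step (True = reuse cache) such
    that the error accumulated since the last full computation never
    exceeds the budget; a recomputation resets the accumulator."""
    decisions, accumulated = [], 0.0
    for err in per_step_errors:
        if accumulated + err <= budget:
            decisions.append(True)   # reuse cached activation
            accumulated += err
        else:
            decisions.append(False)  # recompute this step, reset error
            accumulated = 0.0
    return decisions
```

With a tighter budget the scheduler recomputes more often; with a looser one it caches more aggressively, trading fidelity for speed.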
arXiv Detail & Related papers (2026-03-05T15:58:06Z)
- Fast-SAM3D: 3Dfy Anything in Images but Faster [65.17322167628367]
SAM3D enables scalable, open-world 3D reconstruction from complex scenes, yet its deployment is hindered by prohibitive inference latency. We present Fast-SAM3D, a training-free framework that aligns computation with instantaneous generation complexity.
arXiv Detail & Related papers (2026-02-05T04:27:59Z)
- ITS3D: Inference-Time Scaling for Text-Guided 3D Diffusion Models [88.04431808574581]
ITS3D is a framework that formulates text-guided 3D generation as an optimization problem over the Gaussian noise input, searching for the most effective one. We introduce three techniques for improved stability, efficiency, and exploration capability. Experiments demonstrate that ITS3D enhances text-to-3D generation quality.
arXiv Detail & Related papers (2025-11-27T13:46:16Z)
- LightCache: Memory-Efficient, Training-Free Acceleration for Video Generation [40.968338980157846]
Training-free acceleration has emerged as an advanced research area in video generation based on diffusion models. In this paper, we decompose the inference process into the encoding, denoising, and decoding stages. We propose stage-specific strategies for reducing memory consumption.
arXiv Detail & Related papers (2025-10-06T20:54:44Z)
- Predictive Feature Caching for Training-free Acceleration of Molecular Geometry Generation [67.20779609022108]
Flow matching models generate high-fidelity molecular geometries but incur significant computational costs during inference. This work discusses a training-free caching strategy that accelerates molecular geometry generation. Experiments on the GEOM-Drugs dataset demonstrate that caching achieves a twofold reduction in wall-clock inference time.
arXiv Detail & Related papers (2025-10-06T09:49:14Z)
- A Continuous-Time Consistency Model for 3D Point Cloud Generation [0.6308539010172308]
We introduce ConTiCoM-3D, a continuous-time consistency model that synthesizes 3D shapes directly in point space. The method integrates a TrigFlow-inspired continuous noise schedule with a Chamfer Distance-based geometric loss. Experiments on the ShapeNet benchmark show that ConTiCoM-3D matches or outperforms state-of-the-art diffusion and latent consistency models in both quality and efficiency.
arXiv Detail & Related papers (2025-09-01T14:11:59Z)
- Ultron: Enabling Temporal Geometry Compression of 3D Mesh Sequences using Temporal Correspondence and Mesh Deformation [2.0914328542137346]
Existing 3D model compression methods primarily focus on static models and do not consider inter-frame information.
This paper proposes a method to compress mesh sequences with arbitrary topology using temporal correspondence and mesh deformation.
arXiv Detail & Related papers (2024-09-08T16:34:19Z)
- StableDreamer: Taming Noisy Score Distillation Sampling for Text-to-3D [88.66678730537777]
We present StableDreamer, a methodology incorporating three advances.
First, we formalize the equivalence of the SDS generative prior and a simple supervised L2 reconstruction loss.
Second, our analysis shows that while image-space diffusion contributes to geometric precision, latent-space diffusion is crucial for vivid color rendition.
arXiv Detail & Related papers (2023-12-02T02:27:58Z)
- Fast-SNARF: A Fast Deformer for Articulated Neural Fields [92.68788512596254]
We propose a new articulation module for neural fields, Fast-SNARF, which finds accurate correspondences between canonical space and posed space.
Fast-SNARF is a drop-in replacement for our previous work, SNARF, and significantly improves its computational efficiency.
Because learning of deformation maps is a crucial component in many 3D human avatar methods, we believe that this work represents a significant step towards the practical creation of 3D virtual humans.
arXiv Detail & Related papers (2022-11-28T17:55:34Z)
- A Real-time Action Representation with Temporal Encoding and Deep Compression [115.3739774920845]
We propose a new real-time convolutional architecture, called Temporal Convolutional 3D Network (T-C3D), for action representation.
T-C3D learns video action representations in a hierarchical multi-granularity manner while obtaining a high process speed.
Our method achieves clear improvements over state-of-the-art real-time methods on the UCF101 action recognition benchmark: 5.4% higher accuracy and 2x faster inference, with a model requiring less than 5 MB of storage.
arXiv Detail & Related papers (2020-06-17T06:30:43Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the information presented and is not responsible for any consequences of its use.