Sphinx: Efficiently Serving Novel View Synthesis using Regression-Guided Selective Refinement
- URL: http://arxiv.org/abs/2511.18672v1
- Date: Mon, 24 Nov 2025 01:09:23 GMT
- Authors: Yuchen Xia, Souvik Kundu, Mosharaf Chowdhury, Nishil Talati
- Abstract summary: We present Sphinx, a training-free hybrid inference framework that achieves diffusion-level fidelity at a significantly lower compute. Sphinx achieves an average 1.8x speedup over diffusion model inference with negligible perceptual degradation of less than 5%.
- Score: 9.67064002183396
- License: http://creativecommons.org/licenses/by-nc-sa/4.0/
- Abstract: Novel View Synthesis (NVS) is the task of generating new images of a scene from viewpoints that were not part of the original input. Diffusion-based NVS can generate high-quality, temporally consistent images but remains computationally prohibitive. Conversely, regression-based NVS offers suboptimal generation quality despite requiring significantly lower compute, leaving the design objective of a high-quality, inference-efficient NVS framework an open challenge. To close this critical gap, we present Sphinx, a training-free hybrid inference framework that achieves diffusion-level fidelity at significantly lower compute. Sphinx uses regression-based fast initialization to guide and reduce the denoising workload for the diffusion model. Additionally, it integrates selective refinement with adaptive noise scheduling, allocating more compute to uncertain regions and frames. This enables Sphinx to provide flexible navigation of the performance-quality trade-off, allowing adaptation to latency and fidelity requirements for dynamically changing inference scenarios. Our evaluation shows that Sphinx achieves an average 1.8x speedup over diffusion model inference with negligible perceptual degradation of less than 5%, establishing a new Pareto frontier between quality and latency in NVS serving.
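The abstract's idea of regression-guided initialization with selective refinement can be illustrated with a toy sketch. This is not the authors' implementation: the function names (`alpha_bar`, `start_step`, `hybrid_refine`), the cosine noise schedule, the uncertainty threshold, and the `min_frac`/`max_frac` bounds are all assumptions made for illustration, and `denoise_step` stands in for one step of a real diffusion model.

```python
import numpy as np

def alpha_bar(t, num_steps):
    """Assumed cosine schedule: cumulative signal level, 1.0 at t=0 (no noise)."""
    return np.cos(0.5 * np.pi * t / num_steps) ** 2

def start_step(uncertainty, num_steps, min_frac=0.2, max_frac=0.8):
    """Map mean uncertainty in [0, 1] to the step at which denoising starts."""
    u = float(np.clip(uncertainty.mean(), 0.0, 1.0))
    return int(round((min_frac + (max_frac - min_frac) * u) * num_steps))

def hybrid_refine(coarse, uncertainty, denoise_step,
                  num_steps=50, threshold=0.5, seed=0):
    """Refine only uncertain pixels of a regression output (hypothetical sketch).

    coarse:       fast regression estimate of the novel view
    uncertainty:  per-pixel uncertainty in [0, 1], same shape as coarse
    denoise_step: callable (x, t) -> x, one reverse-diffusion step
    Returns the refined image and the number of denoising steps run.
    """
    mask = uncertainty > threshold
    if not mask.any():
        # Nothing is uncertain: keep the regression output, skip diffusion.
        return coarse.copy(), 0
    # Adaptive noise level: more uncertain regions start from a noisier state.
    t0 = start_step(uncertainty[mask], num_steps)
    a = alpha_bar(t0, num_steps)
    rng = np.random.default_rng(seed)
    # Noise the regression output to step t0 instead of starting from pure noise.
    x = np.sqrt(a) * coarse + np.sqrt(1.0 - a) * rng.standard_normal(coarse.shape)
    for t in range(t0, 0, -1):
        x = denoise_step(x, t)
    # For simplicity the whole frame is denoised and then masked; a real
    # system would restrict computation to the masked region to save work.
    return np.where(mask, x, coarse), t0
```

With `num_steps=50` and fully uncertain masked pixels, this sketch runs 40 denoising steps rather than 50, and it skips diffusion entirely when no pixel exceeds the threshold.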
Related papers
- BADiff: Bandwidth Adaptive Diffusion Model [55.10134744772338]
Traditional diffusion models produce high-fidelity images by performing a fixed number of denoising steps, regardless of downstream transmission limitations.
In practical cloud-to-device scenarios, limited bandwidth often necessitates heavy compression, leading to loss of fine textures and wasted computation.
We introduce a joint end-to-end training strategy where the diffusion model is conditioned on a target quality level derived from the available bandwidth.
arXiv Detail & Related papers (2025-10-24T11:50:03Z)
- Diffusion Models for Solving Inverse Problems via Posterior Sampling with Piecewise Guidance [52.705112811734566]
A novel diffusion-based framework is introduced for solving inverse problems using a piecewise guidance scheme.
The proposed method is problem-agnostic and readily adaptable to a variety of inverse problems.
The framework achieves a reduction in inference time of 25% for inpainting with both random and center masks, and 23% and 24% for 4× and 8× super-resolution tasks, respectively.
arXiv Detail & Related papers (2025-07-22T19:35:14Z)
- EC-Diff: Fast and High-Quality Edge-Cloud Collaborative Inference for Diffusion Models [57.059991285047296]
A hybrid edge-cloud collaborative framework was recently proposed to realize fast inference and high-quality generation.
Excessive cloud denoising prolongs inference time, while insufficient steps cause semantic ambiguity, leading to inconsistency in edge model output.
We propose EC-Diff that accelerates cloud inference through gradient-based noise estimation.
Our method significantly enhances generation quality compared to edge inference, while achieving up to an average 2× speedup in inference compared to cloud inference.
arXiv Detail & Related papers (2025-07-16T07:23:14Z)
- FxTS-Net: Fixed-Time Stable Learning Framework for Neural ODEs [0.48123217909844934]
We propose a new method for training Neural ODEs using fixed-time stability (FxTS) Lyapunov conditions.
Our framework, called FxTS-Net, is based on the novel FxTS loss (FxTS-Loss) designed on Lyapunov functions.
We find that FxTS-Net provides better prediction performance and better robustness under input perturbation.
arXiv Detail & Related papers (2024-11-14T01:37:24Z)
- Spatial Annealing for Efficient Few-shot Neural Rendering [73.49548565633123]
We introduce an accurate and efficient few-shot neural rendering method named Spatial Annealing regularized NeRF (SANeRF).
By adding merely one line of code, SANeRF delivers superior rendering quality and much faster reconstruction speed compared to current few-shot neural rendering methods.
arXiv Detail & Related papers (2024-06-12T02:48:52Z)
- SGCNeRF: Few-Shot Neural Rendering via Sparse Geometric Consistency Guidance [136.15885067858298]
This study presents a novel feature-matching-based sparse geometry regularization module, enhanced by a spatially consistent geometry filtering mechanism and a frequency-guided geometric regularization strategy.
Our experiments demonstrate that SGCNeRF achieves superior geometry-consistent outcomes and also surpasses FreeNeRF, with improvements of 0.7 dB in PSNR on LLFF and DTU.
arXiv Detail & Related papers (2024-04-01T08:37:57Z)
- FRDiff: Feature Reuse for Universal Training-free Acceleration of Diffusion Models [16.940023904740585]
We introduce an advanced acceleration technique that leverages the temporal redundancy inherent in diffusion models.
Reusing feature maps with high temporal similarity opens up a new opportunity to save computation resources without compromising output quality.
arXiv Detail & Related papers (2023-12-06T14:24:26Z)
- VQ-NeRF: Vector Quantization Enhances Implicit Neural Representations [25.88881764546414]
VQ-NeRF is an efficient pipeline for enhancing implicit neural representations via vector quantization.
We present an innovative multi-scale NeRF sampling scheme that concurrently optimizes the NeRF model at both compressed and original scales.
We incorporate a semantic loss function to improve the geometric fidelity and semantic coherence of our 3D reconstructions.
arXiv Detail & Related papers (2023-10-23T01:41:38Z)
- Conditional Denoising Diffusion for Sequential Recommendation [62.127862728308045]
Two prominent generative models, Generative Adversarial Networks (GANs) and Variational AutoEncoders (VAEs)
GANs suffer from unstable optimization, while VAEs are prone to posterior collapse and over-smoothed generations.
We present a conditional denoising diffusion model, which includes a sequence encoder, a cross-attentive denoising decoder, and a step-wise diffuser.
arXiv Detail & Related papers (2023-04-22T15:32:59Z)
- NeRF in detail: Learning to sample for view synthesis [104.75126790300735]
Neural radiance fields (NeRF) methods have demonstrated impressive novel view synthesis.
In this work we address a clear limitation of the vanilla coarse-to-fine approach -- that it is based on a heuristic and not trained end-to-end for the task at hand.
We introduce a differentiable module that learns to propose samples and their importance for the fine network, and consider and compare multiple alternatives for its neural architecture.
arXiv Detail & Related papers (2021-06-09T17:59:10Z)
- Stragglers Are Not Disaster: A Hybrid Federated Learning Algorithm with Delayed Gradients [21.63719641718363]
Federated learning (FL) is a new machine learning framework which trains a joint model across a large amount of decentralized computing devices.
This paper presents a novel FL algorithm, namely Hybrid Federated Learning (HFL), to achieve a learning balance in efficiency and effectiveness.
arXiv Detail & Related papers (2021-02-12T02:27:44Z)
This list is automatically generated from the titles and abstracts of the papers in this site.