Image Generation with a Sphere Encoder
- URL: http://arxiv.org/abs/2602.15030v1
- Date: Mon, 16 Feb 2026 18:59:57 GMT
- Title: Image Generation with a Sphere Encoder
- Authors: Kaiyu Yue, Menglin Jia, Ji Hou, Tom Goldstein
- Abstract summary: Sphere is an efficient generative framework capable of producing images in a single forward pass. Our approach works by learning an encoder that maps natural images uniformly onto a spherical latent space, and a decoder that maps random latent vectors back to the image space.
- Score: 52.086777706390706
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: We introduce the Sphere Encoder, an efficient generative framework capable of producing images in a single forward pass and of competing with many-step diffusion models while using fewer than five steps. Our approach works by learning an encoder that maps natural images uniformly onto a spherical latent space, and a decoder that maps random latent vectors back to the image space. Trained solely through image reconstruction losses, the model generates an image by simply decoding a random point on the sphere. Our architecture naturally supports conditional generation, and looping the encoder/decoder a few times can further enhance image quality. Across several datasets, the sphere encoder approach yields performance competitive with state-of-the-art diffusion models, at a small fraction of the inference cost. Project page is available at https://sphere-encoder.github.io .
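The generation recipe described in the abstract (decode a uniformly random point on the sphere, optionally loop the encoder/decoder a few times to refine) can be sketched as follows. This is a minimal illustration, not the authors' implementation: the `encoder` and `decoder` callables are hypothetical stand-ins, and the uniform sampling relies on the standard fact that a normalized isotropic Gaussian vector is uniform on the unit sphere.

```python
import numpy as np

def sample_sphere(dim: int, rng: np.random.Generator) -> np.ndarray:
    """Draw a point uniformly on the unit (dim-1)-sphere.

    Normalizing an isotropic Gaussian vector yields a uniform
    distribution on the sphere, which is how a random latent
    for the decoder could be drawn."""
    v = rng.standard_normal(dim)
    return v / np.linalg.norm(v)

def generate(decoder, encoder, dim: int, loops: int = 0, seed: int = 0):
    """Hypothetical generation loop: decode one random spherical latent
    (single forward pass), then optionally run a few encode/decode
    round trips, re-projecting onto the sphere each time."""
    rng = np.random.default_rng(seed)
    z = sample_sphere(dim, rng)
    img = decoder(z)                    # one-pass generation
    for _ in range(loops):              # optional refinement loops
        z = encoder(img)
        z = z / np.linalg.norm(z)       # project latent back onto the sphere
        img = decoder(z)
    return img
```

With `loops=0` this reduces to the single-forward-pass case; the paper reports that a small number of loops (fewer than five decoder calls in total) can already match many-step diffusion baselines.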
Related papers
- Geometry-Preserving Encoder/Decoder in Latent Generative Models [15.766401356353084]
We introduce a novel encoder/decoder framework with theoretical properties distinct from those of the VAE. We demonstrate the significant advantages of this geometry-preserving encoder in the training process of both the encoder and decoder.
arXiv Detail & Related papers (2025-01-16T23:14:34Z) - SCALAR-NeRF: SCAlable LARge-scale Neural Radiance Fields for Scene Reconstruction [66.69049158826677]
We introduce SCALAR-NeRF, a novel framework tailored for scalable large-scale neural scene reconstruction.
We structure the neural representation as an encoder-decoder architecture, where the encoder processes 3D point coordinates to produce encoded features.
We propose an effective and efficient methodology to fuse the outputs from these local models to attain the final reconstruction.
arXiv Detail & Related papers (2023-11-28T10:18:16Z) - Reuse and Diffuse: Iterative Denoising for Text-to-Video Generation [92.55296042611886]
We propose a framework called "Reuse and Diffuse", dubbed VidRD, to produce more frames following the frames already generated by an LDM.
We also propose a set of strategies for composing video-text data that involve diverse content from multiple existing datasets.
arXiv Detail & Related papers (2023-09-07T08:12:58Z) - Towards Accurate Image Coding: Improved Autoregressive Image Generation with Dynamic Vector Quantization [73.52943587514386]
Existing vector quantization (VQ) based autoregressive models follow a two-stage generation paradigm.
We propose a novel two-stage framework: (1) Dynamic-Quantization VAE (DQ-VAE), which encodes image regions into variable-length codes based on their information densities for accurate representation.
arXiv Detail & Related papers (2023-05-19T14:56:05Z) - Closed-Loop Transcription via Convolutional Sparse Coding [29.75613581643052]
Autoencoders often use generic deep networks as the encoder or decoder, which are difficult to interpret.
In this work, we make the explicit assumption that the image distribution is generated from a multistage convolutional sparse coding (CSC) model.
Our method enjoys several side benefits, including more structured and interpretable representations, more stable convergence, and scalability to large datasets.
arXiv Detail & Related papers (2023-02-18T14:40:07Z) - Spherical Image Inpainting with Frame Transformation and Data-driven
Prior Deep Networks [13.406134708071345]
In this work, we focus on the challenging task of spherical image inpainting with a deep-learning-based regularizer.
We employ a fast directional spherical Haar framelet transform and develop a novel optimization framework based on a sparsity assumption of the framelet transform.
We show that the proposed algorithms can effectively restore damaged spherical images and outperform approaches that rely purely on a deep learning denoiser or a plug-and-play model.
arXiv Detail & Related papers (2022-09-29T07:51:27Z) - Free-Form Image Inpainting via Contrastive Attention Network [64.05544199212831]
In image inpainting tasks, masks of arbitrary shape can appear anywhere in an image, forming complex patterns. It is difficult for encoders to learn sufficiently powerful representations in such complex situations.
We propose a self-supervised Siamese inference network to improve the robustness and generalization.
arXiv Detail & Related papers (2020-10-29T14:46:05Z) - Swapping Autoencoder for Deep Image Manipulation [94.33114146172606]
We propose the Swapping Autoencoder, a deep model designed specifically for image manipulation.
The key idea is to encode an image with two independent components and enforce that any swapped combination maps to a realistic image.
Experiments on multiple datasets show that our model produces better results and is substantially more efficient compared to recent generative models.
arXiv Detail & Related papers (2020-07-01T17:59:57Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information provided and is not responsible for any consequences.