Guiding Diffusion with Deep Geometric Moments: Balancing Fidelity and Variation
- URL: http://arxiv.org/abs/2505.12486v1
- Date: Sun, 18 May 2025 16:19:27 GMT
- Title: Guiding Diffusion with Deep Geometric Moments: Balancing Fidelity and Variation
- Authors: Sangmin Jung, Utkarsh Nath, Yezhou Yang, Giulia Pedrielli, Joydeep Biswas, Amy Zhang, Hassan Ghasemzadeh, Pavan Turaga,
- Abstract summary: We introduce Deep Geometric Moments (DGM) as a novel form of guidance that encapsulates the subject's visual features and nuances through a learned geometric prior.<n>Our experiments demonstrate that DGM effectively balance control and diversity in diffusion-based image generation, allowing a flexible control mechanism for steering the diffusion process.
- Score: 35.428991756584935
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Text-to-image generation models have achieved remarkable capabilities in synthesizing images, but often struggle to provide fine-grained control over the output. Existing guidance approaches, such as segmentation maps and depth maps, introduce spatial rigidity that restricts the inherent diversity of diffusion models. In this work, we introduce Deep Geometric Moments (DGM) as a novel form of guidance that encapsulates the subject's visual features and nuances through a learned geometric prior. DGMs focus specifically on the subject itself compared to DINO or CLIP features, which suffer from overemphasis on global image features or semantics. Unlike ResNets, which are sensitive to pixel-wise perturbations, DGMs rely on robust geometric moments. Our experiments demonstrate that DGM effectively balance control and diversity in diffusion-based image generation, allowing a flexible control mechanism for steering the diffusion process.
Related papers
- GASS: Geometry-Aware Spherical Sampling for Disentangled Diversity Enhancement in Text-to-Image Generation [32.63174739701972]
Despite high semantic alignment, modern text-to-image (T2I) generative models still struggle to synthesize diverse images from a given prompt.<n>We introduce Geometry-Aware Spherical Sampling (GASS) to enhance diversity by explicitly controlling both prompt-dependent and prompt-independent sources of variation.<n>Our experiments on different frozen T2I backbones and benchmarks demonstrate the effectiveness of disentangled diversity enhancement with minimal impact on image fidelity and semantic alignment.
arXiv Detail & Related papers (2026-02-19T09:41:32Z) - GeoMVD: Geometry-Enhanced Multi-View Generation Model Based on Geometric Information Extraction [15.701540201818192]
Multi-view image generation holds significant application value in computer vision.<n>Existing methods, which rely on extending single images, face notable computational challenges in maintaining cross-view consistency.<n>We propose the Geometry-guided Multi-View Diffusion Model, which incorporates mechanisms for extracting multi-view geometric information.
arXiv Detail & Related papers (2025-11-15T13:17:18Z) - FreSca: Scaling in Frequency Space Enhances Diffusion Models [55.75504192166779]
This paper explores frequency-based control within latent diffusion models.<n>We introduce FreSca, a novel framework that decomposes noise difference into low- and high-frequency components.<n>FreSca operates without any model retraining or architectural change, offering model- and task-agnostic control.
arXiv Detail & Related papers (2025-04-02T22:03:11Z) - Merging and Splitting Diffusion Paths for Semantically Coherent Panoramas [33.334956022229846]
We propose the Merge-Attend-Diffuse operator, which can be plugged into different types of pretrained diffusion models used in a joint diffusion setting.
Specifically, we merge the diffusion paths, reprogramming self- and cross-attention to operate on the aggregated latent space.
Our method maintains compatibility with the input prompt and visual quality of the generated images while increasing their semantic coherence.
arXiv Detail & Related papers (2024-08-28T09:22:32Z) - SphereDiffusion: Spherical Geometry-Aware Distortion Resilient Diffusion Model [63.685132323224124]
Controllable spherical panoramic image generation holds substantial applicative potential across a variety of domains.
In this paper, we introduce a novel framework of SphereDiffusion to address these unique challenges.
Experiments on Structured3D dataset show that SphereDiffusion significantly improves the quality of controllable spherical image generation and relatively reduces around 35% FID on average.
arXiv Detail & Related papers (2024-03-15T06:26:46Z) - GeoPos: A Minimal Positional Encoding for Enhanced Fine-Grained Details in Image Synthesis Using Convolutional Neural Networks [0.0]
The enduring inability of image generative models to recreate intricate geometric features has been an ongoing problem for nearly a decade.<n>In this paper, we demonstrate how this problem can be mitigated by augmenting convolution layers geometric capabilities.<n>We show this drastically improves quality of images generated by Diffusion Models, GANs, and Variational AutoEncoders (VAE)
arXiv Detail & Related papers (2024-01-03T19:27:20Z) - Few-shot Image Generation via Information Transfer from the Built
Geodesic Surface [2.617962830559083]
We propose a method called Information Transfer from the Built Geodesic Surface (ITBGS)
With the FAGS module, a pseudo-source domain is created by projecting image features from the training dataset into the Pre-Shape Space.
We demonstrate that the proposed method consistently achieves optimal or comparable results across a diverse range of semantically distinct datasets.
arXiv Detail & Related papers (2024-01-03T13:57:09Z) - Curved Diffusion: A Generative Model With Optical Geometry Control [56.24220665691974]
The influence of different optical systems on the final scene appearance is frequently overlooked.
This study introduces a framework that intimately integrates a textto-image diffusion model with the particular lens used in image rendering.
arXiv Detail & Related papers (2023-11-29T13:06:48Z) - SSMG: Spatial-Semantic Map Guided Diffusion Model for Free-form
Layout-to-Image Generation [68.42476385214785]
We propose a novel Spatial-Semantic Map Guided (SSMG) diffusion model that adopts the feature map, derived from the layout, as guidance.
SSMG achieves superior generation quality with sufficient spatial and semantic controllability compared to previous works.
We also propose the Relation-Sensitive Attention (RSA) and Location-Sensitive Attention (LSA) mechanisms.
arXiv Detail & Related papers (2023-08-20T04:09:12Z) - Controlling Text-to-Image Diffusion by Orthogonal Finetuning [74.21549380288631]
We introduce a principled finetuning method -- Orthogonal Finetuning (OFT) for adapting text-to-image diffusion models to downstream tasks.
Unlike existing methods, OFT can provably preserve hyperspherical energy which characterizes the pairwise neuron relationship on the unit hypersphere.
We empirically show that our OFT framework outperforms existing methods in generation quality and convergence speed.
arXiv Detail & Related papers (2023-06-12T17:59:23Z) - GM-NeRF: Learning Generalizable Model-based Neural Radiance Fields from
Multi-view Images [79.39247661907397]
We introduce an effective framework Generalizable Model-based Neural Radiance Fields to synthesize free-viewpoint images.
Specifically, we propose a geometry-guided attention mechanism to register the appearance code from multi-view 2D images to a geometry proxy.
arXiv Detail & Related papers (2023-03-24T03:32:02Z) - The Geometry of Deep Generative Image Models and its Applications [0.0]
Generative adversarial networks (GANs) have emerged as a powerful unsupervised method to model the statistical patterns of real-world data sets.
These networks are trained to map random inputs in their latent space to new samples representative of the learned data.
The structure of the latent space is hard to intuit due to its high dimensionality and the non-linearity of the generator.
arXiv Detail & Related papers (2021-01-15T07:57:33Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.