Omegance: A Single Parameter for Various Granularities in Diffusion-Based Synthesis
- URL: http://arxiv.org/abs/2411.17769v2
- Date: Mon, 21 Jul 2025 10:35:17 GMT
- Title: Omegance: A Single Parameter for Various Granularities in Diffusion-Based Synthesis
- Authors: Xinyu Hou, Zongsheng Yue, Xiaoming Li, Chen Change Loy
- Abstract summary: We show that we only need a single parameter $\omega$ to effectively control granularity in diffusion-based synthesis. This simple approach does not require model retraining or architectural modifications and incurs negligible computational overhead. The method demonstrates impressive performance across various image and video synthesis tasks and is adaptable to advanced diffusion models.
- Score: 55.00448838152145
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: In this work, we show that we only need a single parameter $\omega$ to effectively control granularity in diffusion-based synthesis. This parameter is incorporated during the denoising steps of the diffusion model's reverse process. This simple approach does not require model retraining or architectural modifications and incurs negligible computational overhead, yet enables precise control over the level of details in the generated outputs. Moreover, spatial masks or denoising schedules with varying $\omega$ values can be applied to achieve region-specific or timestep-specific granularity control. External control signals or reference images can guide the creation of precise $\omega$ masks, allowing targeted granularity adjustments. Despite its simplicity, the method demonstrates impressive performance across various image and video synthesis tasks and is adaptable to advanced diffusion models. The code is available at https://github.com/itsmag11/Omegance.
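To make the mechanism concrete, here is a minimal sketch of one plausible placement of $\omega$ inside a DDIM-style reverse step: rescaling the predicted noise. The helper name and the exact formulation are assumptions of this sketch, not the released implementation at the repository above.

```python
import torch

def omega_ddim_step(x_t, eps_pred, alpha_t, alpha_prev, omega=1.0):
    """One DDIM-style reverse step with the predicted noise rescaled by omega.

    Assumed form for illustration: omega < 1 tends to suppress fine detail,
    omega > 1 to amplify it. `omega` can be a scalar or a spatial mask
    broadcastable to `eps_pred` for region-specific granularity control.
    """
    eps = omega * eps_pred                                     # granularity knob
    x0_hat = (x_t - (1 - alpha_t) ** 0.5 * eps) / alpha_t ** 0.5
    return alpha_prev ** 0.5 * x0_hat + (1 - alpha_prev) ** 0.5 * eps

# Toy usage with random tensors standing in for a real latent and noise prediction.
x_t = torch.randn(1, 4, 64, 64)
eps = torch.randn_like(x_t)
x_prev = omega_ddim_step(x_t, eps, alpha_t=0.5, alpha_prev=0.6, omega=0.9)
```

A per-timestep $\omega$ schedule amounts to passing a different scalar at each step; a region-specific mask amounts to passing a tensor of the latent's spatial shape.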
Related papers
- Not All Parameters Matter: Masking Diffusion Models for Enhancing Generation Ability [34.888135351211616]
We present a simple yet effective method, termed MaskUNet, that enhances generation quality with a negligible number of additional parameters.
We offer two fine-tuning strategies: a training-based approach and a training-free approach, including tailored networks and optimization functions.
In zero-shot inference on the COCO dataset, MaskUNet achieves the best FID score and further demonstrates its effectiveness in downstream task evaluations.
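As a rough illustration of the parameter-masking idea, the sketch below gates a frozen layer's weights with a learnable binary mask; the paper's actual masking granularity, training recipe, and UNet integration may differ.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class MaskedLinear(nn.Module):
    """A frozen linear layer whose weights are gated by a learnable mask
    (hypothetical sketch in the spirit of parameter masking)."""
    def __init__(self, base: nn.Linear):
        super().__init__()
        self.base = base
        for p in self.base.parameters():
            p.requires_grad_(False)                  # keep pretrained weights frozen
        self.mask_logits = nn.Parameter(torch.zeros_like(base.weight))

    def forward(self, x):
        soft = torch.sigmoid(self.mask_logits)       # soft mask in [0, 1]
        hard = (soft > 0.5).float()
        mask = hard + soft - soft.detach()           # straight-through binarization
        return F.linear(x, self.base.weight * mask, self.base.bias)

layer = MaskedLinear(nn.Linear(16, 16))              # only mask_logits is trainable
out = layer(torch.randn(2, 16))
```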
arXiv Detail & Related papers (2025-05-06T01:14:20Z)
- FreSca: Scaling in Frequency Space Enhances Diffusion Models [55.75504192166779]
This paper explores frequency-based control within latent diffusion models.
We introduce FreSca, a novel framework that decomposes the noise difference into low- and high-frequency components.
FreSca operates without any model retraining or architectural change, offering model- and task-agnostic control.
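A minimal sketch of such a frequency split, assuming a simple radial cutoff in the 2D FFT domain; the gains and cutoff below are illustrative, not the paper's settings.

```python
import torch

def frequency_scale(delta_eps, low_gain=1.0, high_gain=1.2, cutoff=0.25):
    """Split a noise-difference tensor into low/high-frequency bands via FFT
    and rescale each band independently (hedged sketch of frequency control)."""
    f = torch.fft.fftshift(torch.fft.fft2(delta_eps), dim=(-2, -1))
    h, w = delta_eps.shape[-2:]
    yy = torch.linspace(-0.5, 0.5, h).view(-1, 1)
    xx = torch.linspace(-0.5, 0.5, w).view(1, -1)
    low = ((yy ** 2 + xx ** 2).sqrt() <= cutoff).to(f.dtype)   # radial low-pass mask
    f = f * (low_gain * low + high_gain * (1 - low))           # per-band gains
    return torch.fft.ifft2(torch.fft.ifftshift(f, dim=(-2, -1))).real
```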
arXiv Detail & Related papers (2025-04-02T22:03:11Z)
- Enabling Versatile Controls for Video Diffusion Models [18.131652071161266]
VCtrl is a novel framework designed to enable fine control over pre-trained video diffusion models.
Comprehensive experiments and human evaluations demonstrate VCtrl effectively enhances controllability and generation quality.
arXiv Detail & Related papers (2025-03-21T09:48:00Z)
- Go-with-the-Flow: Motion-Controllable Video Diffusion Models Using Real-Time Warped Noise [19.422355461775343]
We enhance video diffusion models by allowing motion control via structured latent noise sampling.
We propose a novel noise warping algorithm, fast enough to run in real time, that replaces random temporal Gaussianity with correlated warped noise.
The efficiency of our algorithm enables us to fine-tune modern video diffusion base models using warped noise with minimal overhead.
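A simplified sketch of warping the previous frame's noise along a flow field to obtain temporally correlated noise; the paper's real-time algorithm preserves Gaussian statistics far more carefully than the crude re-whitening here.

```python
import torch
import torch.nn.functional as F

def warp_noise(prev_noise, flow):
    """Warp the previous frame's Gaussian noise along an optical-flow field
    so consecutive frames share correlated noise (hedged sketch).

    prev_noise: (1, C, H, W); flow: (1, 2, H, W) in pixels, channels (x, y).
    """
    _, _, h, w = prev_noise.shape
    yy, xx = torch.meshgrid(torch.arange(h), torch.arange(w), indexing="ij")
    grid = torch.stack((xx, yy), dim=-1).float()         # (H, W, 2), x-y order
    grid = grid + flow[0].permute(1, 2, 0)               # displace by flow
    grid[..., 0] = grid[..., 0] / (w - 1) * 2 - 1        # normalize to [-1, 1]
    grid[..., 1] = grid[..., 1] / (h - 1) * 2 - 1
    warped = F.grid_sample(prev_noise, grid.unsqueeze(0), align_corners=True)
    return warped / warped.std()                         # rough re-whitening
```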
arXiv Detail & Related papers (2025-01-14T18:59:10Z)
- Mask Factory: Towards High-quality Synthetic Data Generation for Dichotomous Image Segmentation [70.95380821618711]
Dichotomous Image Segmentation (DIS) tasks require highly precise annotations.
Current generative models and techniques struggle with the issues of scene deviations, noise-induced errors, and limited training sample variability.
We introduce a novel approach, which provides a scalable solution for generating diverse and precise datasets.
arXiv Detail & Related papers (2024-12-26T06:37:25Z)
- MaskControl: Spatio-Temporal Control for Masked Motion Synthesis [38.16884934336603]
We propose MaskControl, the first approach to introduce controllability to the generative masked motion model.
First, a Logits Regularizer implicitly perturbs logits at training time to align the distribution of motion tokens with the controlled joint positions.
Second, Logit Optimization explicitly reshapes the token distribution, forcing the generated motion to accurately align with the controlled joint positions.
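A hedged sketch of the logit-optimization idea: treat the token logits as free variables and descend on the joint-position error. The differentiable `decode` callable, mapping token probabilities to joint coordinates, is a stand-in for this sketch, not the paper's interface.

```python
import torch

def optimize_logits(logits, decode, target_joints, steps=50, lr=0.1):
    """Nudge motion-token logits so the decoded motion matches controlled
    joint positions (illustrative sketch of explicit logit reshaping)."""
    logits = logits.clone().requires_grad_(True)
    opt = torch.optim.Adam([logits], lr=lr)
    for _ in range(steps):
        probs = torch.softmax(logits, dim=-1)            # soft token distribution
        loss = ((decode(probs) - target_joints) ** 2).mean()
        opt.zero_grad()
        loss.backward()
        opt.step()
    return logits.detach()
```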
arXiv Detail & Related papers (2024-10-14T17:50:27Z)
- Unified Auto-Encoding with Masked Diffusion [15.264296748357157]
We propose a unified self-supervised objective, dubbed Unified Masked Diffusion (UMD).
UMD combines patch-based and noise-based corruption techniques within a single auto-encoding framework.
It achieves strong performance in downstream generative and representation learning tasks.
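A toy sketch of combining the two corruptions in one objective: add Gaussian noise at a chosen level, then zero out random patches. The schedule and masking details here are assumptions, not the paper's.

```python
import torch

def umd_corrupt(x, t_alpha, mask_ratio=0.5, patch=8):
    """Apply noise-based and patch-based corruption together (hedged sketch).

    x: (B, C, H, W) with H, W divisible by `patch`;
    t_alpha: scalar in (0, 1] controlling the noise level.
    """
    noisy = t_alpha ** 0.5 * x + (1 - t_alpha) ** 0.5 * torch.randn_like(x)
    b, _, h, w = x.shape
    keep = (torch.rand(b, 1, h // patch, w // patch) > mask_ratio).float()
    keep = keep.repeat_interleave(patch, -2).repeat_interleave(patch, -1)
    return noisy * keep                                  # zero out masked patches
```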
arXiv Detail & Related papers (2024-06-25T16:24:34Z)
- SCEdit: Efficient and Controllable Image Diffusion Generation via Skip Connection Editing [18.508719350413802]
We propose an efficient generative tuning framework, dubbed SCEdit, which integrates and edits skip connections.
SCEdit substantially reduces training parameters, memory usage, and computational expense.
Experiments conducted on text-to-image generation and controllable image synthesis tasks demonstrate the superiority of our method in terms of efficiency and performance.
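The sketch below shows one plausible form of such a skip-connection editor: a small residual adapter inserted on a UNet skip path, with only the adapter trained. The module design and insertion points are illustrative, not SCEdit's exact architecture.

```python
import torch.nn as nn

class SkipEditor(nn.Module):
    """Lightweight residual adapter placed on a UNet skip connection
    (hedged sketch of skip-connection editing)."""
    def __init__(self, channels, hidden=None):
        super().__init__()
        hidden = hidden or max(channels // 4, 1)
        self.edit = nn.Sequential(
            nn.Conv2d(channels, hidden, 1), nn.GELU(),
            nn.Conv2d(hidden, channels, 1),
        )

    def forward(self, skip_feat):
        return skip_feat + self.edit(skip_feat)          # residual edit of the skip
```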
arXiv Detail & Related papers (2023-12-18T17:54:14Z)
- Faster Diffusion: Rethinking the Role of the Encoder for Diffusion Model Inference [95.42299246592756]
We study the UNet encoder and empirically analyze its features.
We find that encoder features change minimally, whereas the decoder features exhibit substantial variations across different time-steps.
We validate our approach on other tasks: text-to-video, personalized generation and reference-guided generation.
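That observation suggests caching encoder features and recomputing them only at key timesteps. A minimal sketch, assuming the UNet is split into hypothetical `unet_encoder`/`unet_decoder` callables and `t` is an integer step index:

```python
def cached_unet_forward(unet_encoder, unet_decoder, x_t, t, cache, refresh_every=5):
    """Reuse encoder features across adjacent timesteps (hedged sketch of
    encoder-reuse inference; the paper's key-timestep schedule may differ).
    `cache` is a dict persisting across sampling steps."""
    if "enc" not in cache or t % refresh_every == 0:
        cache["enc"] = unet_encoder(x_t, t)      # recompute at key timesteps only
    return unet_decoder(x_t, t, cache["enc"])    # decoder still runs every step
```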
arXiv Detail & Related papers (2023-12-15T08:46:43Z)
- Readout Guidance: Learning Control from Diffusion Features [96.22155562120231]
We present Readout Guidance, a method for controlling text-to-image diffusion models with learned signals.
Readout Guidance uses readout heads, lightweight networks trained to extract signals from the features of a pre-trained, frozen diffusion model at every timestep.
These readouts can encode single-image properties, such as pose, depth, and edges; or higher-order properties that relate multiple images, such as correspondence and appearance similarity.
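A hedged sketch of the two pieces: a small readout head over frozen diffusion features, and a guidance step that nudges the latent along the gradient of the readout error. The head architecture and the `feature_fn` interface are illustrative assumptions.

```python
import torch
import torch.nn as nn

class ReadoutHead(nn.Module):
    """Small head predicting a property (e.g., a depth map) from frozen
    diffusion features (illustrative architecture)."""
    def __init__(self, feat_dim, out_channels=1):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(feat_dim, 64, 3, padding=1), nn.SiLU(),
            nn.Conv2d(64, out_channels, 3, padding=1),
        )

    def forward(self, feats):
        return self.net(feats)

def readout_guidance_step(x_t, feature_fn, head, target, scale=1.0):
    """One guidance step: move x_t to reduce the readout error against a
    user-provided target. `feature_fn` maps x_t to frozen-model features."""
    x_t = x_t.detach().requires_grad_(True)
    loss = ((head(feature_fn(x_t)) - target) ** 2).mean()
    grad, = torch.autograd.grad(loss, x_t)
    return x_t.detach() - scale * grad
```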
arXiv Detail & Related papers (2023-12-04T18:59:32Z)
- Cocktail: Mixing Multi-Modality Controls for Text-Conditional Image Generation [79.8881514424969]
Text-conditional diffusion models are able to generate high-fidelity images with diverse contents.
However, linguistic representations often describe the envisioned imagery only ambiguously.
We propose Cocktail, a pipeline to mix various modalities into one embedding.
arXiv Detail & Related papers (2023-06-01T17:55:32Z)
- Harnessing the Spatial-Temporal Attention of Diffusion Models for High-Fidelity Text-to-Image Synthesis [59.10787643285506]
Diffusion-based models have achieved state-of-the-art performance on text-to-image synthesis tasks.
One critical limitation of these models is the low fidelity of generated images with respect to the text description.
We propose a new text-to-image algorithm that adds explicit control over spatial-temporal cross-attention in diffusion models.
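As a rough illustration of explicit cross-attention control, the sketch below boosts the attention paid to one text token and renormalizes; the paper's spatio-temporal scheme is considerably richer than this single-token reweighting.

```python
import torch

def reweight_cross_attention(q, k, v, token_idx, gain=2.0):
    """Scale the attention paid to one text token (hedged sketch).

    q: (B, Nq, D); k, v: (B, Nt, D); token_idx indexes the Nt text tokens.
    """
    scores = q @ k.transpose(-2, -1) / q.shape[-1] ** 0.5
    attn = torch.softmax(scores, dim=-1)
    boost = torch.ones(attn.shape[-1])
    boost[token_idx] = gain                              # emphasize chosen token
    attn = attn * boost
    attn = attn / attn.sum(dim=-1, keepdim=True)         # renormalize rows
    return attn @ v
```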
arXiv Detail & Related papers (2023-04-07T23:49:34Z)
- Enhanced Controllability of Diffusion Models via Feature Disentanglement and Realism-Enhanced Sampling Methods [27.014858633903867]
We present a training framework for feature disentanglement of Diffusion Models (FDiff).
We propose two sampling methods that can boost the realism of our Diffusion Models and also enhance the controllability.
arXiv Detail & Related papers (2023-02-28T07:43:00Z)
- Spatial Steerability of GANs via Self-Supervision from Discriminator [123.27117057804732]
We propose a self-supervised approach to improve the spatial steerability of GANs without searching for steerable directions in the latent space.
Specifically, we design randomly sampled Gaussian heatmaps to be encoded into the intermediate layers of generative models as spatial inductive bias.
During inference, users can interact with the spatial heatmaps in an intuitive manner, enabling them to edit the output image by adjusting the scene layout, moving, or removing objects.
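A minimal sketch of rendering one such heatmap; during training the centers and scales would be randomly sampled, while at inference a user edits them to rearrange the layout. The encoding into intermediate generator layers is omitted here.

```python
import torch

def gaussian_heatmap(h, w, center, sigma):
    """Render a single Gaussian heatmap to serve as a spatial inductive bias
    (hedged sketch; the encoder into the generator is not shown)."""
    yy = torch.arange(h).view(-1, 1).float()
    xx = torch.arange(w).view(1, -1).float()
    cy, cx = center
    return torch.exp(-((yy - cy) ** 2 + (xx - cx) ** 2) / (2 * sigma ** 2))

heat = gaussian_heatmap(64, 64, center=(32.0, 20.0), sigma=6.0)  # (64, 64)
```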
arXiv Detail & Related papers (2023-01-20T07:36:29Z)
- Perceptron Synthesis Network: Rethinking the Action Scale Variances in Videos [48.57686258913474]
Video action recognition has been partially addressed by CNNs that stack fixed-size 3D kernels.
We propose to learn the optimal-scale kernels from the data.
An action perceptron synthesizer is proposed to generate the kernels from a bag of fixed-size kernels.
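A toy sketch of synthesizing a 3D kernel as a learned mixture over a fixed kernel bag; the actual synthesizer is input-conditioned, which this version omits.

```python
import torch
import torch.nn as nn

class KernelSynthesizer(nn.Module):
    """Synthesize a 3D conv kernel as a mixture of a fixed kernel bag
    (hedged sketch; mixture weights here are global, not input-dependent)."""
    def __init__(self, bag_size=4, c_out=64, c_in=64, kt=3, k=3):
        super().__init__()
        self.bag = nn.Parameter(torch.randn(bag_size, c_out, c_in, kt, k, k))
        self.mix = nn.Parameter(torch.zeros(bag_size))

    def forward(self):
        w = torch.softmax(self.mix, dim=0)                        # mixture weights
        return (w.view(-1, 1, 1, 1, 1, 1) * self.bag).sum(dim=0)  # (c_out, c_in, kt, k, k)

weight = KernelSynthesizer()()     # usable with torch.nn.functional.conv3d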
arXiv Detail & Related papers (2020-07-22T14:22:29Z)