Enhancing AI Face Realism: Cost-Efficient Quality Improvement in Distilled Diffusion Models with a Fully Synthetic Dataset
- URL: http://arxiv.org/abs/2505.02255v2
- Date: Thu, 08 May 2025 19:05:22 GMT
- Title: Enhancing AI Face Realism: Cost-Efficient Quality Improvement in Distilled Diffusion Models with a Fully Synthetic Dataset
- Authors: Jakub Wasala, Bartlomiej Wrzalski, Kornelia Noculak, Yuliia Tarasenko, Oliwer Krupa, Jan Kocon, Grzegorz Chodak,
- Abstract summary: This study presents a novel approach to enhance the cost-to-quality ratio of image generation with diffusion models.<n>We generate a synthetic paired dataset and train a fast image-to-image translation head.<n>Our results show that the pipeline, which combines a distilled version of a large generative model with our enhancement layer, delivers similar photorealistic portraits to the baseline version.
- Score: 2.6177855932435623
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: This study presents a novel approach to enhance the cost-to-quality ratio of image generation with diffusion models. We hypothesize that differences between distilled (e.g. FLUX.1-schnell) and baseline (e.g. FLUX.1-dev) models are consistent and, therefore, learnable within a specialized domain, like portrait generation. We generate a synthetic paired dataset and train a fast image-to-image translation head. Using two sets of low- and high-quality synthetic images, our model is trained to refine the output of a distilled generator (e.g., FLUX.1-schnell) to a level comparable to a baseline model like FLUX.1-dev, which is more computationally intensive. Our results show that the pipeline, which combines a distilled version of a large generative model with our enhancement layer, delivers similar photorealistic portraits to the baseline version with up to an 82% decrease in computational cost compared to FLUX.1-dev. This study demonstrates the potential for improving the efficiency of AI solutions involving large-scale image generation.
Related papers
- Ambient Diffusion Omni: Training Good Models with Bad Data [45.821861121026394]
We show how to use low-quality, synthetic, and out-of-distribution images to improve the quality of a diffusion model.<n>We present Ambient Omni, a principled framework to train diffusion models that can extract signal from all available images.
arXiv Detail & Related papers (2025-06-10T22:37:39Z) - One Diffusion Step to Real-World Super-Resolution via Flow Trajectory Distillation [60.54811860967658]
FluxSR is a novel one-step diffusion Real-ISR based on flow matching models.<n>First, we introduce Flow Trajectory Distillation (FTD) to distill a multi-step flow matching model into a one-step Real-ISR.<n>Second, to improve image realism and address high-frequency artifact issues in generated images, we propose TV-LPIPS as a perceptual loss.
arXiv Detail & Related papers (2025-02-04T04:11:29Z) - OFTSR: One-Step Flow for Image Super-Resolution with Tunable Fidelity-Realism Trade-offs [20.652907645817713]
OFTSR is a flow-based framework for one-step image super-resolution that can produce outputs with tunable levels of fidelity and realism.<n>We demonstrate that OFTSR achieves state-of-the-art performance for one-step image super-resolution, while having the ability to flexibly tune the fidelity-realism trade-off.
arXiv Detail & Related papers (2024-12-12T17:14:58Z) - Distilling Diffusion Models into Conditional GANs [90.76040478677609]
We distill a complex multistep diffusion model into a single-step conditional GAN student model.
For efficient regression loss, we propose E-LatentLPIPS, a perceptual loss operating directly in diffusion model's latent space.
We demonstrate that our one-step generator outperforms cutting-edge one-step diffusion distillation models.
arXiv Detail & Related papers (2024-05-09T17:59:40Z) - One-Step Diffusion Distillation via Deep Equilibrium Models [64.11782639697883]
We introduce a simple yet effective means of distilling diffusion models directly from initial noise to the resulting image.
Our method enables fully offline training with just noise/image pairs from the diffusion model.
We demonstrate that the DEQ architecture is crucial to this capability, as GET matches a $5times$ larger ViT in terms of FID scores.
arXiv Detail & Related papers (2023-12-12T07:28:40Z) - SwiftBrush: One-Step Text-to-Image Diffusion Model with Variational Score Distillation [1.5892730797514436]
Text-to-image diffusion models often suffer from slow iterative sampling processes.
We present a novel image-free distillation scheme named $textbfSwiftBrush$.
SwiftBrush achieves an FID score of $textbf16.67$ and a CLIP score of $textbf0.29$ on the COCO-30K benchmark.
arXiv Detail & Related papers (2023-12-08T18:44:09Z) - On Distillation of Guided Diffusion Models [94.95228078141626]
We propose an approach to distilling classifier-free guided diffusion models into models that are fast to sample from.
For standard diffusion models trained on the pixelspace, our approach is able to generate images visually comparable to that of the original model.
For diffusion models trained on the latent-space (e.g., Stable Diffusion), our approach is able to generate high-fidelity images using as few as 1 to 4 denoising steps.
arXiv Detail & Related papers (2022-10-06T18:03:56Z) - DiVAE: Photorealistic Images Synthesis with Denoising Diffusion Decoder [73.1010640692609]
We propose a VQ-VAE architecture model with a diffusion decoder (DiVAE) to work as the reconstructing component in image synthesis.
Our model achieves state-of-the-art results and generates more photorealistic images specifically.
arXiv Detail & Related papers (2022-06-01T10:39:12Z) - High-Fidelity Synthesis with Disentangled Representation [60.19657080953252]
We propose an Information-Distillation Generative Adrial Network (ID-GAN) for disentanglement learning and high-fidelity synthesis.
Our method learns disentangled representation using VAE-based models, and distills the learned representation with an additional nuisance variable to the separate GAN-based generator for high-fidelity synthesis.
Despite the simplicity, we show that the proposed method is highly effective, achieving comparable image generation quality to the state-of-the-art methods using the disentangled representation.
arXiv Detail & Related papers (2020-01-13T14:39:40Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.