From Reflection to Perfection: Scaling Inference-Time Optimization for Text-to-Image Diffusion Models via Reflection Tuning
- URL: http://arxiv.org/abs/2504.16080v1
- Date: Tue, 22 Apr 2025 17:58:07 GMT
- Title: From Reflection to Perfection: Scaling Inference-Time Optimization for Text-to-Image Diffusion Models via Reflection Tuning
- Authors: Le Zhuo, Liangbing Zhao, Sayak Paul, Yue Liao, Renrui Zhang, Yi Xin, Peng Gao, Mohamed Elhoseiny, Hongsheng Li
- Abstract summary: ReflectionFlow is an inference-time framework enabling text-to-image diffusion models to iteratively reflect upon and refine their outputs. To facilitate reflection-level scaling, we construct GenRef, a large-scale dataset comprising 1 million triplets, each containing a reflection, a flawed image, and an enhanced image.
- Score: 64.7863715647187
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Recent text-to-image diffusion models achieve impressive visual quality through extensive scaling of training data and model parameters, yet they often struggle with complex scenes and fine-grained details. Inspired by the self-reflection capabilities emergent in large language models, we propose ReflectionFlow, an inference-time framework enabling diffusion models to iteratively reflect upon and refine their outputs. ReflectionFlow introduces three complementary inference-time scaling axes: (1) noise-level scaling to optimize latent initialization; (2) prompt-level scaling for precise semantic guidance; and most notably, (3) reflection-level scaling, which explicitly provides actionable reflections to iteratively assess and correct previous generations. To facilitate reflection-level scaling, we construct GenRef, a large-scale dataset comprising 1 million triplets, each containing a reflection, a flawed image, and an enhanced image. Leveraging this dataset, we efficiently perform reflection tuning on the state-of-the-art diffusion transformer FLUX.1-dev by jointly modeling multimodal inputs within a unified framework. Experimental results show that ReflectionFlow significantly outperforms naive noise-level scaling methods, offering a scalable and compute-efficient solution toward higher-quality image synthesis on challenging tasks.
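The abstract describes an iterative reflect-and-refine loop spanning three scaling axes. The Python sketch below shows one way such a loop could be organized under those assumptions; the helper callables (generate, reflect, refine, score) and all parameter names are hypothetical placeholders for the paper's components, not the released implementation.

```python
# Minimal sketch of a reflect-and-refine inference loop in the spirit of the
# abstract. All callables here are hypothetical stand-ins, not the paper's API.
from dataclasses import dataclass
from typing import Callable, List, Optional

@dataclass
class Candidate:
    image: object              # generated image (tensor, PIL.Image, ...)
    prompt: str                # prompt variant that produced it
    reflection: Optional[str]  # textual critique used for the last refinement
    score: float               # verifier score used for selection

def reflection_flow(
    prompt_variants: List[str],                    # prompt-level scaling
    generate: Callable[[str, int], object],        # (prompt, seed) -> image
    reflect: Callable[[str, object], str],         # (prompt, image) -> critique text
    refine: Callable[[str, object, str], object],  # (prompt, flawed image, critique) -> image
    score: Callable[[str, object], float],         # (prompt, image) -> quality score
    num_seeds: int = 4,                            # noise-level scaling
    num_rounds: int = 3,                           # reflection-level scaling
) -> Candidate:
    # Noise- and prompt-level scaling: sample several initial generations.
    pool = [
        Candidate(generate(p, seed), p, None, 0.0)
        for p in prompt_variants
        for seed in range(num_seeds)
    ]
    for cand in pool:
        cand.score = score(cand.prompt, cand.image)
    best = max(pool, key=lambda c: c.score)

    # Reflection-level scaling: critique the current best image, ask the tuned
    # model to correct the flagged flaws, and keep the refinement only if the
    # verifier prefers it.
    for _ in range(num_rounds):
        critique = reflect(best.prompt, best.image)
        improved = refine(best.prompt, best.image, critique)
        improved_score = score(best.prompt, improved)
        if improved_score > best.score:
            best = Candidate(improved, best.prompt, critique, improved_score)
    return best
```

In this reading, noise-level scaling widens the initial candidate pool, prompt-level scaling enumerates prompt rewrites, and reflection-level scaling spends the remaining compute on critique-conditioned refinement of the current best sample.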
Related papers
- Dereflection Any Image with Diffusion Priors and Diversified Data [86.15504914121226]
We propose a comprehensive solution with an efficient data preparation pipeline and a generalizable model for robust reflection removal. First, we introduce a dataset named Diverse Reflection Removal (DRR) created by randomly rotating reflective mediums in target scenes. Second, we propose a diffusion-based framework with one-step diffusion for deterministic outputs and fast inference.
arXiv Detail & Related papers (2025-03-21T17:48:14Z)
- Utilizing Multi-step Loss for Single Image Reflection Removal [0.9208007322096532]
Distorted images can negatively impact tasks like object detection and image segmentation. We present a novel approach for image reflection removal using a single image.
arXiv Detail & Related papers (2024-12-11T17:57:25Z)
- Oscillation Inversion: Understand the structure of Large Flow Model through the Lens of Inversion Method [60.88467353578118]
We show that a fixed-point-inspired iterative approach to invert real-world images does not achieve convergence, instead oscillating between distinct clusters (a generic sketch of this fixed-point iteration appears after this list).
We introduce a simple and fast distribution transfer technique that facilitates image enhancement, stroke-based recoloring, as well as visual prompt-guided image editing.
arXiv Detail & Related papers (2024-11-17T17:45:37Z)
- OrientDream: Streamlining Text-to-3D Generation with Explicit Orientation Control [66.03885917320189]
OrientDream is a camera orientation conditioned framework for efficient and multi-view consistent 3D generation from textual prompts.
Our strategy emphasizes the implementation of an explicit camera orientation conditioned feature in the pre-training of a 2D text-to-image diffusion module.
Our experiments reveal that our method not only produces high-quality NeRF models with consistent multi-view properties but also achieves optimization speeds significantly faster than existing methods.
arXiv Detail & Related papers (2024-06-14T13:16:18Z)
- FIRM: Flexible Interactive Reflection reMoval [75.38207315080624]
This paper presents FIRM, a novel framework for Flexible Interactive image Reflection reMoval. The proposed framework requires only 10% of the guidance time needed by previous interactive methods. Results on public real-world reflection removal datasets show that our method achieves state-of-the-art reflection removal performance.
arXiv Detail & Related papers (2024-06-03T17:34:37Z)
- Zero-Reference Lighting Estimation Diffusion Model for Low-Light Image Enhancement [2.9873893715462185]
We propose a novel zero-reference lighting estimation diffusion model for low-light image enhancement called Zero-LED. It utilizes the stable convergence ability of diffusion models to bridge the gap between low-light domains and real normal-light domains. It successfully alleviates the dependence on paired training data via zero-reference learning.
arXiv Detail & Related papers (2024-03-05T11:39:17Z)
- Towards Bidirectional Arbitrary Image Rescaling: Joint Optimization and Cycle Idempotence [76.93002743194974]
We propose a method to treat arbitrary rescaling, both upscaling and downscaling, as one unified process.
The proposed model is able to learn upscaling and downscaling simultaneously and achieve bidirectional arbitrary image rescaling.
It is shown to be robust in the cycle idempotence test, i.e., free of severe degradation in reconstruction accuracy when the downscaling-to-upscaling cycle is applied repeatedly (a minimal sketch of this test appears after this list).
arXiv Detail & Related papers (2022-03-02T07:42:15Z)
- ReflectNet -- A Generative Adversarial Method for Single Image Reflection Suppression [0.6980076213134382]
We propose a single image reflection removal method based on context understanding modules and adversarial training.
Our proposed reflection removal method outperforms state-of-the-art methods in terms of PSNR and SSIM on the SIR benchmark dataset.
arXiv Detail & Related papers (2021-05-11T17:33:40Z)
- Two-Stage Single Image Reflection Removal with Reflection-Aware Guidance [78.34235841168031]
We present a novel two-stage network with reflection-aware guidance (RAGNet) for single image reflection removal (SIRR).
RAG can be used (i) to mitigate the effect of reflection from the observation, and (ii) to generate a mask in partial convolution for mitigating the effect of deviating from the linear combination hypothesis.
Experiments on five commonly used datasets demonstrate the quantitative and qualitative superiority of our RAGNet in comparison to the state-of-the-art SIRR methods.
arXiv Detail & Related papers (2020-12-02T03:14:57Z)
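For the Oscillation Inversion entry above, the sketch below illustrates the kind of fixed-point iteration used to invert one step of a deterministic flow sampler. The `velocity` callable is a hypothetical stand-in for the flow model; the residual trace is what distinguishes convergence from the oscillation between clusters reported in that paper.

```python
# Generic fixed-point iteration for inverting one flow/ODE sampling step.
# `velocity` is a hypothetical model stand-in; this is a sketch, not the
# paper's implementation.
from typing import Callable, List, Tuple
import numpy as np

def invert_step_fixed_point(
    x_next: np.ndarray,                                    # sample at time t + dt
    t: float,
    dt: float,
    velocity: Callable[[np.ndarray, float], np.ndarray],   # v(x, t)
    num_iters: int = 20,
) -> Tuple[np.ndarray, List[float]]:
    # Solve the implicit relation x_t = x_next - dt * v(x_t, t) by iterating
    # x^{k+1} = x_next - dt * v(x^k, t), starting from x^0 = x_next.
    x = x_next.copy()
    residuals: List[float] = []
    for _ in range(num_iters):
        x_new = x_next - dt * velocity(x, t)
        residuals.append(float(np.linalg.norm(x_new - x)))
        x = x_new
    # A monotonically shrinking residual indicates convergence; a residual that
    # settles into a repeating pattern indicates oscillation between clusters.
    return x, residuals
```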
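For the bidirectional arbitrary image rescaling entry, this is a minimal sketch of a cycle idempotence test: apply the downscale-then-upscale cycle several times and track PSNR against the original. The `downscale` and `upscale` callables are hypothetical stand-ins for the rescaling model.

```python
# Cycle idempotence test sketch: repeated downscale -> upscale cycles should
# keep PSNR roughly flat for a cycle-idempotent model. The rescaling callables
# are hypothetical placeholders.
from typing import Callable, List
import numpy as np

def psnr(a: np.ndarray, b: np.ndarray, peak: float = 1.0) -> float:
    mse = float(np.mean((a - b) ** 2))
    return float("inf") if mse == 0 else 10.0 * np.log10(peak ** 2 / mse)

def cycle_idempotence_test(
    image: np.ndarray,                                     # original image in [0, 1]
    downscale: Callable[[np.ndarray, float], np.ndarray],  # (image, scale) -> smaller image
    upscale: Callable[[np.ndarray, float], np.ndarray],    # (image, scale) -> larger image
    scale: float = 4.0,
    num_cycles: int = 5,
) -> List[float]:
    # Record PSNR against the original after each full cycle; severe decay
    # across cycles signals a lack of cycle idempotence.
    scores: List[float] = []
    current = image
    for _ in range(num_cycles):
        current = upscale(downscale(current, scale), scale)
        scores.append(psnr(image, current))
    return scores
```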