CoRe^2: Collect, Reflect and Refine to Generate Better and Faster
- URL: http://arxiv.org/abs/2503.09662v1
- Date: Wed, 12 Mar 2025 15:15:25 GMT
- Title: CoRe^2: Collect, Reflect and Refine to Generate Better and Faster
- Authors: Shitong Shao, Zikai Zhou, Dian Xie, Yuetong Fang, Tian Ye, Lichen Bai, Zeke Xie,
- Abstract summary: We introduce a novel plug-and-play inference paradigm, CoRe2, which comprises three subprocesses: Collect, Reflect, and Refine.<n>CoRe2 employs weak-to-strong guidance to refine the conditional output, thereby improving the model's capacity to generate high-frequency and realistic content.<n>It has exhibited significant performance improvements on HPD v2, Pick-of-Pic, Drawbench, GenEval, and T2I-Compbench.
- Score: 11.230943283470522
- License: http://creativecommons.org/licenses/by-nc-nd/4.0/
- Abstract: Making text-to-image (T2I) generative model sample both fast and well represents a promising research direction. Previous studies have typically focused on either enhancing the visual quality of synthesized images at the expense of sampling efficiency or dramatically accelerating sampling without improving the base model's generative capacity. Moreover, nearly all inference methods have not been able to ensure stable performance simultaneously on both diffusion models (DMs) and visual autoregressive models (ARMs). In this paper, we introduce a novel plug-and-play inference paradigm, CoRe^2, which comprises three subprocesses: Collect, Reflect, and Refine. CoRe^2 first collects classifier-free guidance (CFG) trajectories, and then use collected data to train a weak model that reflects the easy-to-learn contents while reducing number of function evaluations during inference by half. Subsequently, CoRe^2 employs weak-to-strong guidance to refine the conditional output, thereby improving the model's capacity to generate high-frequency and realistic content, which is difficult for the base model to capture. To the best of our knowledge, CoRe^2 is the first to demonstrate both efficiency and effectiveness across a wide range of DMs, including SDXL, SD3.5, and FLUX, as well as ARMs like LlamaGen. It has exhibited significant performance improvements on HPD v2, Pick-of-Pic, Drawbench, GenEval, and T2I-Compbench. Furthermore, CoRe^2 can be seamlessly integrated with the state-of-the-art Z-Sampling, outperforming it by 0.3 and 0.16 on PickScore and AES, while achieving 5.64s time saving using SD3.5.Code is released at https://github.com/xie-lab-ml/CoRe/tree/main.
Related papers
- Morse: Dual-Sampling for Lossless Acceleration of Diffusion Models [14.618774364317053]
We present Morse, a dual-sampling framework for accelerating diffusion models losslessly.<n>Specifically, Morse involves two models called Dash and Dot that interact with each other.<n>By chaining the outputs of the Dash and Dot models run in a time-interleaved fashion, Morse exhibits the merit of flexibly attaining desired image generation performance.
arXiv Detail & Related papers (2025-06-23T02:43:21Z) - AugmentGest: Can Random Data Cropping Augmentation Boost Gesture Recognition Performance? [49.64902130083662]
This paper proposes a comprehensive data augmentation framework that integrates geometric transformations, random variations, rotation, zooming and intensity-based transformations.<n>The proposed augmentation strategy is evaluated on three models: multi-stream e2eET, FPPR point cloud-based hand gesture recognition (HGR), and DD-Network.
arXiv Detail & Related papers (2025-06-08T16:43:05Z) - Flow-GRPO: Training Flow Matching Models via Online RL [75.70017261794422]
We propose Flow-GRPO, the first method integrating online reinforcement learning (RL) into flow matching models.<n>Our approach uses two key strategies: (1) an ODE-to-SDE conversion that transforms a deterministic Ordinary Equation (ODE) into an equivalent Differential Equation (SDE) that matches the original model's marginal distribution at all timesteps; and (2) a Denoising Reduction strategy that reduces training denoising steps while retaining the original inference timestep number.
arXiv Detail & Related papers (2025-05-08T17:58:45Z) - Context-guided Responsible Data Augmentation with Diffusion Models [29.41191005466334]
We propose a text-to-image (T2I) data augmentation method, named DiffCoRe-Mix, that computes a set of generative counterparts for a training sample.
To preserve key semantic axes, we also filter out undesired generative samples in our augmentation process.
We extensively evaluate our technique on ImageNet-1K,Tiny ImageNet-200, CIFAR-100, Flowers102, CUB-Birds, Stanford Cars, and Caltech datasets.
arXiv Detail & Related papers (2025-03-12T00:12:27Z) - Weak-to-Strong Diffusion with Reflection [56.39451539396458]
We propose Weak-to-Strong Diffusion (W2SD) to bridge the gap between an ideal model and a strong model.
W2SD steers latent variables along sampling trajectories toward regions of the real data distribution.
Extensive experiments demonstrate that W2SD significantly improves human preference, aesthetic quality, and prompt adherence.
arXiv Detail & Related papers (2025-02-01T16:00:08Z) - SPAR3D: Stable Point-Aware Reconstruction of 3D Objects from Single Images [49.7344030427291]
We study the problem of single-image 3D object reconstruction.<n>Recent works have diverged into two directions: regression-based modeling and generative modeling.<n>We present SPAR3D, a novel two-stage approach aiming to take the best of both directions.
arXiv Detail & Related papers (2025-01-08T18:52:03Z) - Efficient Generative Modeling with Residual Vector Quantization-Based Tokens [5.949779668853557]
ResGen is an efficient RVQ-based discrete diffusion model that generates high-fidelity samples without compromising sampling speed.<n>We validate the efficacy and generalizability of the proposed method on two challenging tasks: conditional image generation on ImageNet 256x256 and zero-shot text-to-speech synthesis.<n>As we scale the depth of RVQ, our generative models exhibit enhanced generation fidelity or faster sampling speeds compared to similarly sized baseline models.
arXiv Detail & Related papers (2024-12-13T15:31:17Z) - Adversarial Diffusion Compression for Real-World Image Super-Resolution [16.496532580598007]
Real-world image super-resolution aims to reconstruct high-resolution images from degraded low-resolution inputs.<n>One-step diffusion networks like OSEDiff and S3Diff alleviate this issue but still incur high computational costs.<n>This paper proposes a novel Real-ISR method, AdcSR, by distilling the one-step diffusion network OSEDiff into a streamlined diffusion-GAN model.
arXiv Detail & Related papers (2024-11-20T15:13:36Z) - Clover-2: Accurate Inference for Regressive Lightweight Speculative Decoding [8.046705062670096]
Regressive lightweight speculative decoding has garnered attention for its notable efficiency improvements in text generation tasks.
Clover-2 is an RNN-based draft model designed to achieve comparable accuracy to that of attention decoder layer models.
arXiv Detail & Related papers (2024-08-01T03:43:32Z) - A-SDM: Accelerating Stable Diffusion through Redundancy Removal and
Performance Optimization [54.113083217869516]
In this work, we first explore the computational redundancy part of the network.
We then prune the redundancy blocks of the model and maintain the network performance.
Thirdly, we propose a global-regional interactive (GRI) attention to speed up the computationally intensive attention part.
arXiv Detail & Related papers (2023-12-24T15:37:47Z) - SqueezeLLM: Dense-and-Sparse Quantization [80.32162537942138]
Main bottleneck for generative inference with LLMs is memory bandwidth, rather than compute, for single batch inference.
We introduce SqueezeLLM, a post-training quantization framework that enables lossless compression to ultra-low precisions of up to 3-bit.
Our framework incorporates two novel ideas: (i) sensitivity-based non-uniform quantization, which searches for the optimal bit precision assignment based on second-order information; and (ii) the Dense-and-Sparse decomposition that stores outliers and sensitive weight values in an efficient sparse format.
arXiv Detail & Related papers (2023-06-13T08:57:54Z) - ReDi: Efficient Learning-Free Diffusion Inference via Trajectory
Retrieval [68.7008281316644]
ReDi is a learning-free Retrieval-based Diffusion sampling framework.
We show that ReDi improves the model inference efficiency by 2x speedup.
arXiv Detail & Related papers (2023-02-05T03:01:28Z) - Investigating Tradeoffs in Real-World Video Super-Resolution [90.81396836308085]
Real-world video super-resolution (VSR) models are often trained with diverse degradations to improve generalizability.
To alleviate the first tradeoff, we propose a degradation scheme that reduces up to 40% of training time without sacrificing performance.
To facilitate fair comparisons, we propose the new VideoLQ dataset, which contains a large variety of real-world low-quality video sequences.
arXiv Detail & Related papers (2021-11-24T18:58:21Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.