Weak-to-Strong Diffusion with Reflection
- URL: http://arxiv.org/abs/2502.00473v3
- Date: Thu, 24 Apr 2025 16:09:26 GMT
- Title: Weak-to-Strong Diffusion with Reflection
- Authors: Lichen Bai, Masashi Sugiyama, Zeke Xie
- Abstract summary: We propose Weak-to-Strong Diffusion (W2SD) to bridge the gap between an ideal model and a strong model. W2SD steers latent variables along sampling trajectories toward regions of the real data distribution. Extensive experiments demonstrate that W2SD significantly improves human preference, aesthetic quality, and prompt adherence.
- Score: 56.39451539396458
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: The goal of diffusion generative models is to align the learned distribution with the real data distribution through gradient score matching. However, inherent limitations in training data quality, modeling strategies, and architectural design lead to an inevitable gap between generated outputs and real data. To reduce this gap, we propose Weak-to-Strong Diffusion (W2SD), a novel framework that utilizes the estimated difference between existing weak and strong models (i.e., the weak-to-strong difference) to bridge the gap between an ideal model and a strong model. By employing a reflective operation that alternates between denoising and inversion with the weak-to-strong difference, we show theoretically that W2SD steers latent variables along sampling trajectories toward regions of the real data distribution. W2SD is highly flexible and broadly applicable, enabling diverse improvements through the strategic selection of weak-to-strong model pairs (e.g., DreamShaper vs. SD1.5, good experts vs. bad experts in MoE). Extensive experiments demonstrate that W2SD significantly improves human preference, aesthetic quality, and prompt adherence, achieving SOTA performance across various modalities (e.g., image, video), architectures (e.g., UNet-based, DiT-based, MoE), and benchmarks. For example, Juggernaut-XL with W2SD raises the HPSv2 winning rate to as high as 90% over the original results. Moreover, the performance gains achieved by W2SD markedly outweigh its additional computational overhead, and the cumulative improvements from different weak-to-strong differences further solidify its practical utility and deployability.
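To make the reflective operation concrete, the following is a minimal sketch of one W2SD reflection step, assuming a DDIM-style latent sampler. The function names (`strong_denoise`, `weak_invert`), tensor shapes, and the 50-step schedule are illustrative assumptions rather than the authors' implementation; the point is only the alternation "denoise with the strong model, invert with the weak model", whose net effect nudges the latent by the weak-to-strong difference.

```python
import torch

def strong_denoise(x_t: torch.Tensor, t: int) -> torch.Tensor:
    """Placeholder for one denoising step x_t -> x_{t-1} under the strong model."""
    return x_t  # identity stand-in; a real sampler would call the strong UNet/DiT here

def weak_invert(x_prev: torch.Tensor, t: int) -> torch.Tensor:
    """Placeholder for one inversion step x_{t-1} -> x_t under the weak model."""
    return x_prev  # identity stand-in; a real sampler would call the weak model here

def w2sd_reflection_step(x_t: torch.Tensor, t: int) -> torch.Tensor:
    # Denoise with the strong model, then re-noise (invert) with the weak model.
    # Because the two models disagree, the latent is displaced by (approximately)
    # the weak-to-strong difference, steering it toward higher-density regions
    # of the real data distribution.
    x_prev_strong = strong_denoise(x_t, t)
    return weak_invert(x_prev_strong, t)

# Illustrative usage: interleave reflection with an ordinary sampling loop.
x = torch.randn(1, 4, 64, 64)       # latent with an illustrative SD-style shape
for t in reversed(range(1, 50)):    # illustrative 50-step schedule
    x = w2sd_reflection_step(x, t)  # reflect: strong denoise + weak inversion
    x = strong_denoise(x, t)        # then take the regular denoising step
```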
Related papers
- Learning from Heterogeneity: Generalizing Dynamic Facial Expression Recognition via Distributionally Robust Optimization [23.328511708942045]
A Heterogeneity-aware Distributional Framework (HDF) is designed to enhance time-frequency modeling and mitigate the imbalance caused by hard samples. A Time-Frequency Distributional Attention Module (DAM) captures both temporal consistency and frequency robustness. An adaptive optimization module, the Distribution-aware Scaling Module (DSM), is introduced to dynamically balance classification and contrastive losses.
arXiv Detail & Related papers (2025-07-21T16:21:47Z)
- Efficient Federated Learning with Heterogeneous Data and Adaptive Dropout [62.73150122809138]
Federated Learning (FL) is a promising distributed machine learning approach that enables collaborative training of a global model using multiple edge devices. We propose the FedDHAD FL framework, which comes with two novel methods: Dynamic Heterogeneous model aggregation (FedDH) and Adaptive Dropout (FedAD). The combination of these two methods makes FedDHAD significantly outperform state-of-the-art solutions in terms of accuracy (up to 6.7% higher), efficiency (up to 2.02 times faster), and cost (up to 15.0% smaller).
arXiv Detail & Related papers (2025-07-14T16:19:00Z)
- D2R: dual regularization loss with collaborative adversarial generation for model robustness [23.712462151414726]
The robustness of Deep Neural Network models is crucial for defending against adversarial attacks. We propose a dual regularization loss (D2R Loss) method and a collaborative adversarial generation (CAG) strategy for adversarial training. Our results show that the D2R Loss with CAG produces highly robust models.
arXiv Detail & Related papers (2025-06-08T09:39:54Z)
- CoRe^2: Collect, Reflect and Refine to Generate Better and Faster [11.230943283470522]
We introduce a novel plug-and-play inference paradigm, CoRe^2, which comprises three subprocesses: Collect, Reflect, and Refine.
CoRe^2 employs weak-to-strong guidance to refine the conditional output (sketched below), thereby improving the model's capacity to generate high-frequency and realistic content.
It has exhibited significant performance improvements on HPD v2, Pick-a-Pic, DrawBench, GenEval, and T2I-CompBench.
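As a rough illustration of the weak-to-strong guidance idea mentioned above, the sketch below extrapolates from a weak model's prediction toward a strong model's prediction, in the spirit of classifier-free guidance. The function name, guidance scale, and tensor shapes are hypothetical and not CoRe^2's exact formulation.

```python
import torch

def weak_to_strong_guidance(weak_out: torch.Tensor,
                            strong_out: torch.Tensor,
                            scale: float = 1.5) -> torch.Tensor:
    # Move from the weak prediction toward (and beyond) the strong one: the
    # (strong - weak) difference supplies the detail the weak model misses.
    return weak_out + scale * (strong_out - weak_out)

# Illustrative usage with dummy noise predictions of matching shape.
weak_eps = torch.randn(1, 4, 64, 64)
strong_eps = torch.randn(1, 4, 64, 64)
guided_eps = weak_to_strong_guidance(weak_eps, strong_eps, scale=1.5)
```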
arXiv Detail & Related papers (2025-03-12T15:15:25Z)
- Towards Robust Universal Information Extraction: Benchmark, Evaluation, and Solution [66.11004226578771]
Existing robust benchmark datasets have two key limitations.
They generate only a limited range of perturbations for a single Information Extraction (IE) task.
Considering the powerful generation capabilities of Large Language Models (LLMs), we introduce a new benchmark dataset for Robust UIE, called RUIE-Bench.
We show that training with only 15% of the data leads to an average 7.5% relative performance improvement across three IE tasks.
arXiv Detail & Related papers (2025-03-05T05:39:29Z)
- Improved Training Technique for Latent Consistency Models [18.617862678160243]
Consistency models are capable of producing high-quality samples in either a single step or multiple steps. We analyze the statistical differences between pixel and latent spaces, discovering that latent data often contains highly impulsive outliers. We introduce a diffusion loss at early timesteps and employ optimal transport (OT) coupling to further enhance performance.
arXiv Detail & Related papers (2025-02-03T15:25:58Z)
- MARS: Unleashing the Power of Variance Reduction for Training Large Models [56.47014540413659]
Adaptive gradient algorithms like Adam, AdamW, and their variants have been central to this type of training.
We propose a framework that reconciles preconditioned gradient optimization methods with variance reduction via a scaled momentum technique.
arXiv Detail & Related papers (2024-11-15T18:57:39Z)
- Towards Robust Federated Learning via Logits Calibration on Non-IID Data [49.286558007937856]
Federated learning (FL) is a privacy-preserving distributed machine learning framework based on collaborative model training across distributed devices in edge networks.
Recent studies have shown that FL is vulnerable to adversarial examples, leading to a significant drop in its performance.
In this work, we adopt the adversarial training (AT) framework to improve the robustness of FL models against adversarial example (AE) attacks.
arXiv Detail & Related papers (2024-03-05T09:18:29Z)
- Discrete Diffusion Modeling by Estimating the Ratios of the Data Distribution [67.9215891673174]
We propose score entropy as a novel loss that naturally extends score matching to discrete spaces.
We test our Score Entropy Discrete Diffusion models on standard language modeling tasks.
arXiv Detail & Related papers (2023-10-25T17:59:12Z)
- Distributionally Robust Cross Subject EEG Decoding [15.211091130230589]
We propose a principled approach that performs dynamic evolution on the data to improve decoding robustness.
We derive a general data evolution framework based on Wasserstein gradient flow (WGF) and provide two different forms of evolution within the framework.
The proposed approach can be readily integrated with other data augmentation approaches for further improvements.
arXiv Detail & Related papers (2023-08-19T11:31:33Z)
- Revisiting Consistency Regularization for Semi-Supervised Learning [80.28461584135967]
We propose an improved consistency regularization framework based on a simple yet effective technique, FeatDistLoss.
Experimental results show that our model defines a new state of the art for various datasets and settings.
arXiv Detail & Related papers (2021-12-10T20:46:13Z)
This list is automatically generated from the titles and abstracts of the papers on this site.