Guiding a diffusion model using sliding windows
- URL: http://arxiv.org/abs/2411.10257v3
- Date: Fri, 29 Aug 2025 13:10:29 GMT
- Title: Guiding a diffusion model using sliding windows
- Authors: Nikolas Adaloglou, Tim Kaiser, Damir Iagudin, Markus Kollmann,
- Abstract summary: We introduce masked sliding window guidance (M-SWG), a novel, training-free method. M-SWG upweights long-range spatial dependencies by guiding the primary model with a version of itself whose receptive field is selectively restricted. M-SWG achieves a superior Inception score (IS) compared to previous state-of-the-art training-free approaches.
- Score: 0.9402985123717579
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Guidance is a widely used technique for diffusion models to enhance sample quality. Technically, guidance is realised by using an auxiliary model that generalises more broadly than the primary model. Using a 2D toy example, we first show that it is highly beneficial when the auxiliary model exhibits similar but stronger generalisation errors than the primary model. Based on this insight, we introduce \emph{masked sliding window guidance (M-SWG)}, a novel, training-free method. M-SWG upweights long-range spatial dependencies by guiding the primary model with a version of itself whose receptive field is selectively restricted. M-SWG requires neither access to model weights from previous iterations, additional training, nor class conditioning. M-SWG achieves a superior Inception score (IS) compared to previous state-of-the-art training-free approaches, without introducing sample oversaturation. In conjunction with existing guidance methods, M-SWG reaches a state-of-the-art Fréchet DINOv2 distance on ImageNet using EDM2-XXL and DiT-XL. The code is available at https://github.com/HHU-MMBS/swg_bmvc2025_official.
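Most guidance variants in this list share one algebraic core: the sampler's noise (or score) prediction from the strong model is extrapolated away from a weaker auxiliary prediction. A minimal sketch of that combination step, where `eps_primary` stands for the full model's prediction and `eps_restricted` for a weaker prediction (e.g. from a receptive-field-restricted copy, as in M-SWG); the function name and guidance weight are illustrative, not the paper's implementation:

```python
def guided_eps(eps_primary, eps_restricted, w=1.5):
    """Standard guidance extrapolation, applied elementwise:
    w = 0 recovers the weak model, w = 1 the primary model,
    w > 1 extrapolates away from the weak prediction."""
    return [r + w * (p - r) for p, r in zip(eps_primary, eps_restricted)]

# Toy usage on flattened noise predictions.
strong = [0.2, -1.0, 0.5]
weak = [0.1, -0.8, 0.7]
guided = guided_eps(strong, weak, w=1.5)
```

A quick sanity check for any implementation in this family: `w = 1` must return the primary prediction unchanged and `w = 0` the weak one.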
Related papers
- Steering Guidance for Personalized Text-to-Image Diffusion Models [19.550718192994353]
Existing sampling guidance methods fail to steer the output toward a well-balanced space. We propose personalization guidance, a simple yet effective method leveraging an unlearned weak model conditioned on a null text prompt. Our method explicitly steers the outputs toward a balanced latent space without additional computational overhead.
arXiv Detail & Related papers (2025-08-01T05:02:26Z) - Learning Diffusion Models with Flexible Representation Guidance [49.26046407886349]
We present a systematic framework for incorporating representation guidance into diffusion models. We introduce two new strategies for enhancing representation alignment in diffusion models. Experiments across image, protein sequence, and molecule generation tasks demonstrate superior performance as well as accelerated training.
arXiv Detail & Related papers (2025-07-11T19:29:02Z) - How Much To Guide: Revisiting Adaptive Guidance in Classifier-Free Guidance Text-to-Vision Diffusion Models [57.42800112251644]
We propose Step AG, a simple, universally applicable adaptive guidance strategy. Our evaluations focus on both image quality and image-text alignment.
arXiv Detail & Related papers (2025-06-10T02:09:48Z) - Normalized Attention Guidance: Universal Negative Guidance for Diffusion Models [57.20761595019967]
We present Normalized Attention Guidance (NAG), an efficient, training-free mechanism that applies extrapolation in attention space with L1-based normalization and refinement. NAG restores effective negative guidance where CFG collapses, while maintaining fidelity. NAG generalizes across architectures (UNet, DiT), sampling regimes (few-step, multi-step), and modalities (image, video).
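The NAG summary above is concrete enough to sketch: extrapolate in attention-output space away from the negative branch, then apply an L1-based rescaling so the result's magnitude stays close to the positive branch. Everything below (the function name, the exact normalization rule, the `tau` clamp) is an assumed illustrative reading of the abstract, not the paper's actual algorithm:

```python
def nag_style_guidance(z_pos, z_neg, scale=2.0, tau=2.5):
    """Extrapolate attention outputs away from the negative branch, then
    rescale by an L1-norm ratio so the magnitude stays near z_pos."""
    # Extrapolation in attention space (same shape as classifier-free guidance).
    z = [p + scale * (p - n) for p, n in zip(z_pos, z_neg)]
    l1_z = sum(abs(v) for v in z)
    l1_pos = sum(abs(v) for v in z_pos)
    # Clamp the blow-up: L1 norm never exceeds tau times the positive branch's.
    factor = min(1.0, tau * l1_pos / (l1_z + 1e-8))
    return [v * factor for v in z]
```

With identical positive and negative branches the extrapolation term vanishes and the input is returned unchanged, which is a useful invariant when testing any such mechanism.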
arXiv Detail & Related papers (2025-05-27T13:30:46Z) - Domain Guidance: A Simple Transfer Approach for a Pre-trained Diffusion Model [62.11981915549919]
Domain Guidance is a transfer approach that leverages pre-trained knowledge to guide the sampling process toward the target domain.
We demonstrate its substantial effectiveness across various transfer benchmarks, achieving over a 19.6% improvement in FID and a 23.4% improvement in FD$_\text{DINOv2}$ compared to standard fine-tuning.
arXiv Detail & Related papers (2025-04-02T09:07:55Z) - Next-Scale Autoregressive Models are Zero-Shot Single-Image Object View Synthesizers [4.015569252776372]
ArchonView significantly exceeds state-of-the-art methods despite being trained from scratch on 3D rendering data only, with no 2D pretraining. Our model also exhibits robust performance even for difficult camera poses where previous methods fail, and is several times faster in inference than diffusion.
arXiv Detail & Related papers (2025-03-17T17:59:59Z) - PLADIS: Pushing the Limits of Attention in Diffusion Models at Inference Time by Leveraging Sparsity [9.092404060771306]
Diffusion models have shown impressive results in generating high-quality conditional samples.
However, existing methods often require additional training or neural function evaluations (NFEs).
We propose a novel and efficient method, termed PLADIS, which boosts pre-trained models by leveraging sparse attention.
arXiv Detail & Related papers (2025-03-10T07:23:19Z) - TESS 2: A Large-Scale Generalist Diffusion Language Model [24.91689676432666]
TESS 2 is an instruction-following diffusion language model that outperforms contemporary instruction-tuned diffusion models. We find that adaptation training, as well as the choice of base model, is crucial for training good instruction-following diffusion models. We propose reward guidance, a novel and modular inference-time guidance procedure to align model outputs without needing to train the underlying model.
arXiv Detail & Related papers (2025-02-19T17:50:31Z) - Diffusion Models without Classifier-free Guidance [41.59396565229466]
Model-guidance (MG) is a novel objective for training diffusion models that removes the need for the commonly used classifier-free guidance (CFG). Our approach goes beyond standard modeling by incorporating the posterior probability of conditions. Our method significantly accelerates training, doubles inference speed, and achieves quality that parallels and even surpasses concurrent diffusion models with CFG.
arXiv Detail & Related papers (2025-02-17T18:59:50Z) - SNOOPI: Supercharged One-step Diffusion Distillation with Proper Guidance [12.973835034100428]
This paper presents SNOOPI, a novel framework designed to enhance the guidance in one-step diffusion models during both training and inference.
By varying the guidance scale of both teacher models, we broaden their output distributions, resulting in a more robust VSD loss that enables SB to perform effectively across diverse backbones while maintaining competitive performance.
Second, we propose a training-free method called Negative-Away Steer Attention (NASA), which integrates negative prompts into one-step diffusion models via cross-attention to suppress undesired elements in generated images.
arXiv Detail & Related papers (2024-12-03T18:56:32Z) - David and Goliath: Small One-step Model Beats Large Diffusion with Score Post-training [8.352666876052616]
We propose Diff-Instruct* (DI*), a data-efficient post-training approach for one-step text-to-image generative models. Our method frames alignment as online reinforcement learning from human feedback. Our 2.6B DI*-SDXL-1step model outperforms the 50-step 12B FLUX-dev model.
arXiv Detail & Related papers (2024-10-28T10:26:19Z) - SurgeryV2: Bridging the Gap Between Model Merging and Multi-Task Learning with Deep Representation Surgery [54.866490321241905]
Model merging-based multitask learning (MTL) offers a promising approach for performing MTL by merging multiple expert models.
In this paper, we examine the merged model's representation distribution and uncover a critical issue of "representation bias".
This bias arises from a significant distribution gap between the representations of the merged and expert models, leading to the suboptimal performance of the merged MTL model.
arXiv Detail & Related papers (2024-10-18T11:49:40Z) - Plug-and-Play Diffusion Distillation [14.359953671470242]
We propose a new distillation approach for guided diffusion models.
An external lightweight guide model is trained while the original text-to-image model remains frozen.
We show that our method reduces the inference cost of classifier-free guided latent-space diffusion models by almost half.
arXiv Detail & Related papers (2024-06-04T04:22:47Z) - EMR-Merging: Tuning-Free High-Performance Model Merging [55.03509900949149]
We show that Elect, Mask & Rescale-Merging (EMR-Merging) achieves outstanding performance compared to existing merging methods.
EMR-Merging is tuning-free, thus requiring no data availability or any additional training while showing impressive performance.
arXiv Detail & Related papers (2024-05-23T05:25:45Z) - Foundational GPT Model for MEG [3.524869467682149]
We propose two classes of deep learning foundational models that can be trained using forecasting of unlabelled brain signals.
First, we consider a modified Wavenet; and second, we consider a modified Transformer-based (GPT2) model.
We compare the performance of these deep learning models with standard linear autoregressive (AR) modelling on MEG data.
arXiv Detail & Related papers (2024-04-14T13:48:24Z) - FreeSeg-Diff: Training-Free Open-Vocabulary Segmentation with Diffusion Models [49.80911683739506]
We focus on the task of image segmentation, which is traditionally solved by training models on closed-vocabulary datasets. We leverage different, relatively small-sized, open-source foundation models for zero-shot open-vocabulary segmentation. Our approach (dubbed FreeSeg-Diff), which does not rely on any training, outperforms many training-based approaches on both the Pascal VOC and COCO datasets.
arXiv Detail & Related papers (2024-03-29T10:38:25Z) - Self-Play Fine-Tuning of Diffusion Models for Text-to-Image Generation [59.184980778643464]
Fine-tuning diffusion models remains an underexplored frontier in generative artificial intelligence (GenAI).
In this paper, we introduce an innovative technique called self-play fine-tuning for diffusion models (SPIN-Diffusion).
Our approach offers an alternative to conventional supervised fine-tuning and RL strategies, significantly improving both model performance and alignment.
arXiv Detail & Related papers (2024-02-15T18:59:18Z) - Guided Diffusion from Self-Supervised Diffusion Features [49.78673164423208]
Guidance serves as a key concept in diffusion models, yet its effectiveness is often limited by the need for extra data annotation or pretraining.
We propose a framework to extract guidance from, and specifically for, diffusion models.
arXiv Detail & Related papers (2023-12-14T11:19:11Z) - FD-Align: Feature Discrimination Alignment for Fine-tuning Pre-Trained Models in Few-Shot Learning [21.693779973263172]
In this paper, we introduce a fine-tuning approach termed Feature Discrimination Alignment (FD-Align).
Our method aims to bolster the model's generalizability by preserving the consistency of spurious features.
Once fine-tuned, the model can seamlessly integrate with existing methods, leading to performance improvements.
arXiv Detail & Related papers (2023-10-23T17:12:01Z) - Generalized Logit Adjustment: Calibrating Fine-tuned Models by Removing Label Bias in Foundation Models [75.9543301303586]
Foundation models like CLIP allow zero-shot transfer on various tasks without additional training data.
Fine-tuning and ensembling are also commonly adopted to better fit the downstream tasks.
However, we argue that prior work has overlooked the inherent biases in foundation models.
arXiv Detail & Related papers (2023-10-12T08:01:11Z) - Diff-Instruct: A Universal Approach for Transferring Knowledge From Pre-trained Diffusion Models [77.83923746319498]
We propose a framework called Diff-Instruct to instruct the training of arbitrary generative models.
We show that Diff-Instruct results in state-of-the-art single-step diffusion-based models.
Experiments on refining GAN models show that the Diff-Instruct can consistently improve the pre-trained generators of GAN models.
arXiv Detail & Related papers (2023-05-29T04:22:57Z) - Structural Pruning for Diffusion Models [65.02607075556742]
We present Diff-Pruning, an efficient compression method tailored for learning lightweight diffusion models from pre-existing ones.
Our empirical assessment, undertaken across several datasets, highlights two primary benefits of our proposed method.
arXiv Detail & Related papers (2023-05-18T12:38:21Z) - Revisiting Consistency Regularization for Semi-Supervised Learning [80.28461584135967]
We propose an improved consistency regularization framework built on a simple yet effective technique, FeatDistLoss.
Experimental results show that our model defines a new state of the art for various datasets and settings.
arXiv Detail & Related papers (2021-12-10T20:46:13Z) - Self-Feature Regularization: Self-Feature Distillation Without Teacher Models [0.0]
Self-Feature Regularization (SFR) is proposed, which uses features in the deep layers to supervise feature learning in the shallow layers.
We first use a generalization-l2 loss to match local features, and a many-to-one approach to distill more intensively along the channel dimension.
arXiv Detail & Related papers (2021-03-12T15:29:00Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of its content (including all information) and is not responsible for any consequences.