Self-Attention Decomposition For Training Free Diffusion Editing
- URL: http://arxiv.org/abs/2510.22650v1
- Date: Sun, 26 Oct 2025 12:22:56 GMT
- Title: Self-Attention Decomposition For Training Free Diffusion Editing
- Authors: Tharun Anand, Mohammad Hassan Vali, Arno Solin
- Abstract summary: A key step toward controllability is to identify interpretable directions in the model's latent representations. We propose an analytical method that derives semantic editing directions directly from the pretrained parameters of diffusion models.
- Score: 18.8152476816527
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Diffusion models achieve remarkable fidelity in image synthesis, yet precise control over their outputs for targeted editing remains challenging. A key step toward controllability is to identify interpretable directions in the model's latent representations that correspond to semantic attributes. Existing approaches for finding interpretable directions typically rely on sampling large sets of images or training auxiliary networks, which limits efficiency. We propose an analytical method that derives semantic editing directions directly from the pretrained parameters of diffusion models, requiring neither additional data nor fine-tuning. Our insight is that self-attention weight matrices encode rich structural information about the data distribution learned during training. By computing the eigenvectors of these weight matrices, we obtain robust and interpretable editing directions. Experiments demonstrate that our method produces high-quality edits across multiple datasets while reducing editing time by 60% compared with current benchmarks.
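The abstract's central recipe, treating eigenvectors of a pretrained self-attention weight matrix as semantic editing directions, can be sketched roughly as follows. This is a minimal illustration under assumptions, not the paper's implementation: the symmetrization step, the top-k selection by eigenvalue magnitude, and all function names here are hypothetical choices.

```python
import numpy as np

def editing_directions(W, k=3):
    """Derive candidate editing directions from a (d x d) self-attention
    weight matrix W via eigendecomposition.

    Symmetrizing W first is an assumption made here so that the
    eigenvectors are real and orthonormal; the paper may operate on the
    raw matrices differently."""
    S = (W + W.T) / 2.0
    eigvals, eigvecs = np.linalg.eigh(S)          # orthonormal columns
    idx = np.argsort(np.abs(eigvals))[::-1][:k]   # top-k by |eigenvalue|
    return eigvecs[:, idx]                        # columns = directions

def apply_edit(latent, direction, strength=1.0):
    # Shift a latent representation along one editing direction.
    return latent + strength * direction

# Toy demonstration with a random stand-in for a pretrained weight matrix.
rng = np.random.default_rng(0)
W = rng.normal(size=(64, 64))
dirs = editing_directions(W, k=3)
z = rng.normal(size=64)
z_edited = apply_edit(z, dirs[:, 0], strength=2.0)
```

In a real diffusion model the edit would be applied to the intermediate latents (or attention features) at selected denoising steps, with `strength` controlling the attribute intensity; no images or fine-tuning are needed, which is the source of the claimed efficiency gain.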
Related papers
- PropFly: Learning to Propagate via On-the-Fly Supervision from Pre-trained Video Diffusion Models [35.59605874012795]
PropFly is a training pipeline for propagation-based video editing. PropFly relies on pre-trained video diffusion models (VDMs) instead of requiring off-the-shelf or precomputed paired video editing datasets. The pipeline enables an adapter attached to the pre-trained VDM to learn to propagate edits via a Guidance-Modulated Flow Matching (GMFM) loss.
arXiv Detail & Related papers (2026-02-24T06:11:08Z)
- Disentangled representations via score-based variational autoencoders [21.955536401578616]
We present the Score-based Autoencoder for Multiscale Inference (SAMI). SAMI formulates a principled objective that learns representations through score-based guidance of the underlying diffusion process. It can extract useful representations from pre-trained diffusion models with minimal additional training.
arXiv Detail & Related papers (2025-12-18T23:42:10Z)
- Learning an Image Editing Model without Image Editing Pairs [83.03646586929638]
Recent image editing models have achieved impressive results while following natural language editing instructions. However, they rely on supervised fine-tuning with large datasets of input-target pairs. Current workarounds use synthetic training pairs that leverage the zero-shot capabilities of existing models. We present a new training paradigm that eliminates the need for paired data entirely.
arXiv Detail & Related papers (2025-10-16T17:59:57Z)
- Cross-Subject Mind Decoding from Inaccurate Representations [42.19569985029642]
We propose a Bi Autoencoder Intertwining framework for accurate decoded representation prediction. Our method outperforms state-of-the-art approaches on benchmark datasets in both qualitative and quantitative evaluations.
arXiv Detail & Related papers (2025-07-25T08:45:02Z)
- Active Learning Inspired ControlNet Guidance for Augmenting Semantic Segmentation Datasets [15.786823017952122]
ControlNet enables precise alignment between ground truth segmentation masks and the generated image content. We propose the first approach to integrate active learning-based selection metrics into the backward diffusion process. We show that segmentation models trained with guided synthetic data outperform those trained on non-guided synthetic data.
arXiv Detail & Related papers (2025-03-12T10:09:27Z)
- InvDiff: Invariant Guidance for Bias Mitigation in Diffusion Models [28.51460282167433]
Diffusion models are highly data-driven and prone to inheriting the imbalances and biases present in real-world data. We propose a framework, InvDiff, which aims to learn invariant semantic information for diffusion guidance. InvDiff effectively reduces biases while maintaining the quality of image generation.
arXiv Detail & Related papers (2024-12-11T15:47:11Z)
- Improving Network Interpretability via Explanation Consistency Evaluation [56.14036428778861]
We propose a framework that acquires more explainable activation heatmaps and simultaneously increases model performance.
Specifically, our framework introduces a new metric, i.e., explanation consistency, to reweight the training samples adaptively during model learning.
Our framework then promotes model learning by paying closer attention to those training samples whose explanations are highly inconsistent.
arXiv Detail & Related papers (2024-08-08T17:20:08Z)
- Informed Correctors for Discrete Diffusion Models [27.295990499157814]
We propose a predictor-corrector sampling scheme for discrete diffusion models. We show that our informed corrector consistently produces superior samples with fewer errors or improved FID scores. Our results underscore the potential of informed correctors for fast and high-fidelity generation using discrete diffusion.
arXiv Detail & Related papers (2024-07-30T23:29:29Z)
- Disentanglement via Latent Quantization [60.37109712033694]
In this work, we construct an inductive bias towards encoding to and decoding from an organized latent space.
We demonstrate the broad applicability of this approach by adding it to both basic data-reconstructing (vanilla autoencoder) and latent-reconstructing (InfoGAN) generative models.
arXiv Detail & Related papers (2023-05-28T06:30:29Z)
- Self-Conditioned Generative Adversarial Networks for Image Editing [61.50205580051405]
Generative Adversarial Networks (GANs) are susceptible to bias, learned from either the unbalanced data, or through mode collapse.
We argue that this bias is responsible not only for fairness concerns, but that it plays a key role in the collapse of latent-traversal editing methods when deviating away from the distribution's core.
arXiv Detail & Related papers (2022-02-08T18:08:24Z)
- Diffusion-Based Representation Learning [65.55681678004038]
We augment the denoising score matching framework to enable representation learning without any supervised signal.
In contrast, our diffusion-based representation learning relies on a new formulation of the denoising score matching objective.
Using the same approach, we propose to learn an infinite-dimensional latent code that improves on state-of-the-art models in semi-supervised image classification.
arXiv Detail & Related papers (2021-05-29T09:26:02Z)
- Autoencoding Variational Autoencoder [56.05008520271406]
We study the implications of this behaviour on the learned representations and also the consequences of fixing it by introducing a notion of self consistency.
We show that encoders trained with our self-consistency approach lead to representations that are robust (insensitive) to perturbations in the input introduced by adversarial attacks.
arXiv Detail & Related papers (2020-12-07T14:16:14Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information presented and is not responsible for any consequences of its use.