Free-ATM: Exploring Unsupervised Learning on Diffusion-Generated Images
with Free Attention Masks
- URL: http://arxiv.org/abs/2308.06739v1
- Date: Sun, 13 Aug 2023 10:07:46 GMT
- Authors: David Junhao Zhang, Mutian Xu, Chuhui Xue, Wenqing Zhang, Xiaoguang
Han, Song Bai, Mike Zheng Shou
- Abstract summary: Text-to-image diffusion models have shown great potential for benefiting image recognition.
Although promising, there has been inadequate exploration dedicated to unsupervised learning on diffusion-generated images.
We introduce customized solutions that fully exploit the free attention masks provided by diffusion models' cross-attention layers.
- Score: 64.67735676127208
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Despite the rapid advancement of unsupervised learning in visual
representation, it requires training on large-scale datasets that demand costly
data collection, and pose additional challenges due to concerns regarding data
privacy. Recently, synthetic images generated by text-to-image diffusion
models have shown great potential for benefiting image recognition. Although
promising, there has been inadequate exploration dedicated to unsupervised
learning on diffusion-generated images. To address this, we start by uncovering
that diffusion models' cross-attention layers inherently provide
annotation-free attention masks aligned with corresponding text inputs on
generated images. We then investigate the problems of three prevalent
unsupervised learning techniques (i.e., contrastive learning, masked modeling,
and vision-language pretraining) and introduce customized solutions by fully
exploiting the aforementioned free attention masks. Our approach is validated
through extensive experiments that show consistent improvements in baseline
models across various downstream tasks, including image classification,
detection, segmentation, and image-text retrieval. By utilizing our method, it
is possible to close the performance gap between unsupervised pretraining on
synthetic data and real-world scenarios.
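The core observation above, that cross-attention weights between spatial locations and text tokens can serve as annotation-free masks, can be illustrated with a minimal sketch. This is not the paper's implementation: the capture of cross-attention weights from a real diffusion model is assumed, and `token_attention_mask` with its tensor shapes is a hypothetical helper operating on synthetic data.

```python
import numpy as np

def token_attention_mask(attn, token_idx, size, threshold=0.5):
    """Aggregate cross-attention weights into a binary mask for one text token.

    attn: array of shape (heads, H*W, num_tokens), a hypothetical capture of
          cross-attention weights from one denoising step.
    token_idx: index of the text token whose mask we want.
    size: (H, W) spatial resolution of the attention map.
    """
    # Average over attention heads, then select the column for this token
    # and restore the 2-D spatial layout.
    token_map = attn.mean(axis=0)[:, token_idx].reshape(size)
    # Min-max normalize to [0, 1] so the threshold is scale-independent.
    lo, hi = token_map.min(), token_map.max()
    token_map = (token_map - lo) / (hi - lo + 1e-8)
    # Binarize: pixels attending strongly to the token form the mask.
    return (token_map >= threshold).astype(np.uint8)

# Toy example: 8 heads, a 16x16 latent grid, 4 text tokens.
rng = np.random.default_rng(0)
attn = rng.random((8, 16 * 16, 4))
mask = token_attention_mask(attn, token_idx=1, size=(16, 16))
```

In practice such masks would be harvested from the generator's cross-attention layers during sampling, then reused as free pseudo-annotations for contrastive, masked-modeling, or vision-language pretraining objectives on the generated images.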
Related papers
- Knowledge-Guided Prompt Learning for Deepfake Facial Image Detection [54.26588902144298]
We propose a knowledge-guided prompt learning method for deepfake facial image detection.
Specifically, we retrieve forgery-related prompts from large language models as expert knowledge to guide the optimization of learnable prompts.
Our proposed approach notably outperforms state-of-the-art methods.
arXiv Detail & Related papers (2025-01-01T02:18:18Z)
- Understanding and Improving Training-Free AI-Generated Image Detections with Vision Foundation Models [68.90917438865078]
Deepfake techniques for facial synthesis and editing, enabled by generative models, pose serious risks.
In this paper, we investigate how detection performance varies across model backbones, types, and datasets.
We introduce Contrastive Blur, which enhances performance on facial images, and MINDER, which addresses noise type bias, balancing performance across domains.
arXiv Detail & Related papers (2024-11-28T13:04:45Z)
- Towards Reliable Verification of Unauthorized Data Usage in Personalized Text-to-Image Diffusion Models [23.09033991200197]
New personalization techniques have been proposed to customize the pre-trained base models for crafting images with specific themes or styles.
Such a lightweight solution raises a new concern regarding whether the personalized models are trained on unauthorized data.
We introduce SIREN, a novel methodology to proactively trace unauthorized data usage in black-box personalized text-to-image diffusion models.
arXiv Detail & Related papers (2024-10-14T12:29:23Z)
- Be Yourself: Bounded Attention for Multi-Subject Text-to-Image Generation [60.943159830780154]
We introduce Bounded Attention, a training-free method for bounding the information flow in the sampling process.
We demonstrate that our method empowers the generation of multiple subjects that better align with given prompts and layouts.
arXiv Detail & Related papers (2024-03-25T17:52:07Z)
- Masking Improves Contrastive Self-Supervised Learning for ConvNets, and Saliency Tells You Where [63.61248884015162]
We aim to alleviate the burden of incorporating the masking operation into the contrastive-learning framework for convolutional neural networks.
We propose to explicitly take a saliency constraint into consideration so that the masked regions are more evenly distributed between foreground and background.
arXiv Detail & Related papers (2023-09-22T09:58:38Z)
- The Devil is in the Frequency: Geminated Gestalt Autoencoder for Self-Supervised Visual Pre-Training [13.087987450384036]
We present a new Masked Image Modeling (MIM) approach, termed Geminated Autoencoder (Ge$2$-AE), for visual pre-training.
Specifically, we equip our model with geminated decoders in charge of reconstructing image contents from both pixel and frequency space.
arXiv Detail & Related papers (2022-04-18T09:22:55Z)
- Intelligent Masking: Deep Q-Learning for Context Encoding in Medical Image Analysis [48.02011627390706]
We develop a novel self-supervised approach that occludes targeted regions to improve the pre-training procedure.
We show that training the agent against the prediction model can significantly improve the semantic features extracted for downstream classification tasks.
arXiv Detail & Related papers (2022-03-25T19:05:06Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information it contains and is not responsible for any consequences of its use.