Can Generative Models Improve Self-Supervised Representation Learning?
- URL: http://arxiv.org/abs/2403.05966v3
- Date: Tue, 17 Dec 2024 15:31:49 GMT
- Title: Can Generative Models Improve Self-Supervised Representation Learning?
- Authors: Sana Ayromlou, Vahid Reza Khazaie, Fereshteh Forghani, Arash Afkanpour,
- Abstract summary: We introduce a framework that enriches the self-supervised learning (SSL) paradigm by utilizing generative models to produce semantically consistent image augmentations.
Our results show that our framework significantly enhances the quality of learned visual representations by up to 10% Top-1 accuracy in downstream tasks.
- Score: 0.7999703756441756
- License:
- Abstract: The rapid advancement in self-supervised representation learning has highlighted its potential to leverage unlabeled data for learning rich visual representations. However, the existing techniques, particularly those employing different augmentations of the same image, often rely on a limited set of simple transformations that cannot fully capture variations in the real world. This constrains the diversity and quality of samples, which leads to sub-optimal representations. In this paper, we introduce a framework that enriches the self-supervised learning (SSL) paradigm by utilizing generative models to produce semantically consistent image augmentations. By directly conditioning generative models on a source image, our method enables the generation of diverse augmentations while maintaining the semantics of the source image, thus offering a richer set of data for SSL. Our extensive experimental results on various joint-embedding SSL techniques demonstrate that our framework significantly enhances the quality of learned visual representations by up to 10\% Top-1 accuracy in downstream tasks. This research demonstrates that incorporating generative models into the joint-embedding SSL workflow opens new avenues for exploring the potential of synthetic data. This development paves the way for more robust and versatile representation learning techniques.
Related papers
- Direct Ascent Synthesis: Revealing Hidden Generative Capabilities in Discriminative Models [6.501811946908292]
We show that discriminative models inherently contain powerful generative capabilities.
Our method, Direct Ascent Synthesis, reveals these latent capabilities.
DAS achieves high-quality image synthesis by decomposing optimization across multiple spatial scales.
arXiv Detail & Related papers (2025-02-11T18:27:27Z) - Gen-SIS: Generative Self-augmentation Improves Self-supervised Learning [52.170253590364545]
Gen-SIS is a diffusion-based augmentation technique trained exclusively on unlabeled image data.
We show that these self-augmentations', i.e. generative augmentations based on the vanilla SSL encoder embeddings, facilitate the training of a stronger SSL encoder.
arXiv Detail & Related papers (2024-12-02T16:20:59Z) - Is Synthetic Image Useful for Transfer Learning? An Investigation into Data Generation, Volume, and Utilization [62.157627519792946]
We introduce a novel framework called bridged transfer, which initially employs synthetic images for fine-tuning a pre-trained model to improve its transferability.
We propose dataset style inversion strategy to improve the stylistic alignment between synthetic and real images.
Our proposed methods are evaluated across 10 different datasets and 5 distinct models, demonstrating consistent improvements.
arXiv Detail & Related papers (2024-03-28T22:25:05Z) - Learned representation-guided diffusion models for large-image generation [58.192263311786824]
We introduce a novel approach that trains diffusion models conditioned on embeddings from self-supervised learning (SSL)
Our diffusion models successfully project these features back to high-quality histopathology and remote sensing images.
Augmenting real data by generating variations of real images improves downstream accuracy for patch-level and larger, image-scale classification tasks.
arXiv Detail & Related papers (2023-12-12T14:45:45Z) - Diversify, Don't Fine-Tune: Scaling Up Visual Recognition Training with Synthetic Images [35.195284384050325]
We present a new framework leveraging off-the-shelf generative models to generate synthetic training images.
We address class name ambiguity, lack of diversity in naive prompts, and domain shifts.
Our framework consistently enhances recognition model performance with more synthetic data.
arXiv Detail & Related papers (2023-12-04T18:35:27Z) - CLAP: Isolating Content from Style through Contrastive Learning with Augmented Prompts [11.752632557524969]
We propose contrastive learning with data augmentation to disentangle content features from the original representations.
Our experiments across diverse datasets demonstrate significant improvements in zero-shot and few-shot classification tasks.
arXiv Detail & Related papers (2023-11-28T03:00:59Z) - StableLLaVA: Enhanced Visual Instruction Tuning with Synthesized
Image-Dialogue Data [129.92449761766025]
We propose a novel data collection methodology that synchronously synthesizes images and dialogues for visual instruction tuning.
This approach harnesses the power of generative models, marrying the abilities of ChatGPT and text-to-image generative models.
Our research includes comprehensive experiments conducted on various datasets.
arXiv Detail & Related papers (2023-08-20T12:43:52Z) - UniDiff: Advancing Vision-Language Models with Generative and
Discriminative Learning [86.91893533388628]
This paper presents UniDiff, a unified multi-modal model that integrates image-text contrastive learning (ITC), text-conditioned image synthesis learning (IS), and reciprocal semantic consistency modeling (RSC)
UniDiff demonstrates versatility in both multi-modal understanding and generative tasks.
arXiv Detail & Related papers (2023-06-01T15:39:38Z) - Style-Hallucinated Dual Consistency Learning: A Unified Framework for
Visual Domain Generalization [113.03189252044773]
We propose a unified framework, Style-HAllucinated Dual consistEncy learning (SHADE), to handle domain shift in various visual tasks.
Our versatile SHADE can significantly enhance the generalization in various visual recognition tasks, including image classification, semantic segmentation and object detection.
arXiv Detail & Related papers (2022-12-18T11:42:51Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.