Related papers: Can Generative Models Improve Self-Supervised Representation Learning?

Can Generative Models Improve Self-Supervised Representation Learning?

URL: http://arxiv.org/abs/2403.05966v3
Date: Tue, 17 Dec 2024 15:31:49 GMT
Title: Can Generative Models Improve Self-Supervised Representation Learning?
Authors: Sana Ayromlou, Vahid Reza Khazaie, Fereshteh Forghani, Arash Afkanpour,
Abstract summary: We introduce a framework that enriches the self-supervised learning (SSL) paradigm by utilizing generative models to produce semantically consistent image augmentations.<n>Our results show that our framework significantly enhances the quality of learned visual representations by up to 10% Top-1 accuracy in downstream tasks.
Score: 0.7999703756441756
License: http://creativecommons.org/licenses/by/4.0/
Abstract: The rapid advancement in self-supervised representation learning has highlighted its potential to leverage unlabeled data for learning rich visual representations. However, the existing techniques, particularly those employing different augmentations of the same image, often rely on a limited set of simple transformations that cannot fully capture variations in the real world. This constrains the diversity and quality of samples, which leads to sub-optimal representations. In this paper, we introduce a framework that enriches the self-supervised learning (SSL) paradigm by utilizing generative models to produce semantically consistent image augmentations. By directly conditioning generative models on a source image, our method enables the generation of diverse augmentations while maintaining the semantics of the source image, thus offering a richer set of data for SSL. Our extensive experimental results on various joint-embedding SSL techniques demonstrate that our framework significantly enhances the quality of learned visual representations by up to 10\% Top-1 accuracy in downstream tasks. This research demonstrates that incorporating generative models into the joint-embedding SSL workflow opens new avenues for exploring the potential of synthetic data. This development paves the way for more robust and versatile representation learning techniques.

Related papers

Direct Ascent Synthesis: Revealing Hidden Generative Capabilities in Discriminative Models [6.501811946908292]
We show that discriminative models inherently contain powerful generative capabilities. Our method, Direct Ascent Synthesis, reveals these latent capabilities. DAS achieves high-quality image synthesis by decomposing optimization across multiple spatial scales.
arXiv Detail & Related papers (2025-02-11T18:27:27Z)
Gen-SIS: Generative Self-augmentation Improves Self-supervised Learning [52.170253590364545]
Gen-SIS is a diffusion-based augmentation technique trained exclusively on unlabeled image data. We show that these self-augmentations', i.e. generative augmentations based on the vanilla SSL encoder embeddings, facilitate the training of a stronger SSL encoder.
arXiv Detail & Related papers (2024-12-02T16:20:59Z)
Is Synthetic Image Useful for Transfer Learning? An Investigation into Data Generation, Volume, and Utilization [62.157627519792946]
We introduce a novel framework called bridged transfer, which initially employs synthetic images for fine-tuning a pre-trained model to improve its transferability. We propose dataset style inversion strategy to improve the stylistic alignment between synthetic and real images. Our proposed methods are evaluated across 10 different datasets and 5 distinct models, demonstrating consistent improvements.
arXiv Detail & Related papers (2024-03-28T22:25:05Z)
Learned representation-guided diffusion models for large-image generation [58.192263311786824]
We introduce a novel approach that trains diffusion models conditioned on embeddings from self-supervised learning (SSL) Our diffusion models successfully project these features back to high-quality histopathology and remote sensing images. Augmenting real data by generating variations of real images improves downstream accuracy for patch-level and larger, image-scale classification tasks.
arXiv Detail & Related papers (2023-12-12T14:45:45Z)
Diversify, Don't Fine-Tune: Scaling Up Visual Recognition Training with Synthetic Images [37.29348016920314]
We present a new framework leveraging off-the-shelf generative models to generate synthetic training images. We address class name ambiguity, lack of diversity in naive prompts, and domain shifts. Our framework consistently enhances recognition model performance with more synthetic data.
arXiv Detail & Related papers (2023-12-04T18:35:27Z)
CLAP: Isolating Content from Style through Contrastive Learning with Augmented Prompts [11.752632557524969]
We propose contrastive learning with data augmentation to disentangle content features from the original representations. Our experiments across diverse datasets demonstrate significant improvements in zero-shot and few-shot classification tasks.
arXiv Detail & Related papers (2023-11-28T03:00:59Z)
StableLLaVA: Enhanced Visual Instruction Tuning with Synthesized Image-Dialogue Data [129.92449761766025]
We propose a novel data collection methodology that synchronously synthesizes images and dialogues for visual instruction tuning. This approach harnesses the power of generative models, marrying the abilities of ChatGPT and text-to-image generative models. Our research includes comprehensive experiments conducted on various datasets.
arXiv Detail & Related papers (2023-08-20T12:43:52Z)
UniDiff: Advancing Vision-Language Models with Generative and Discriminative Learning [86.91893533388628]
This paper presents UniDiff, a unified multi-modal model that integrates image-text contrastive learning (ITC), text-conditioned image synthesis learning (IS), and reciprocal semantic consistency modeling (RSC) UniDiff demonstrates versatility in both multi-modal understanding and generative tasks.
arXiv Detail & Related papers (2023-06-01T15:39:38Z)
Effective Data Augmentation With Diffusion Models [65.09758931804478]
We address the lack of diversity in data augmentation with image-to-image transformations parameterized by pre-trained text-to-image diffusion models. Our method edits images to change their semantics using an off-the-shelf diffusion model, and generalizes to novel visual concepts from a few labelled examples. We evaluate our approach on few-shot image classification tasks, and on a real-world weed recognition task, and observe an improvement in accuracy in tested domains.
arXiv Detail & Related papers (2023-02-07T20:42:28Z)
Style-Hallucinated Dual Consistency Learning: A Unified Framework for Visual Domain Generalization [113.03189252044773]
We propose a unified framework, Style-HAllucinated Dual consistEncy learning (SHADE), to handle domain shift in various visual tasks. Our versatile SHADE can significantly enhance the generalization in various visual recognition tasks, including image classification, semantic segmentation and object detection.
arXiv Detail & Related papers (2022-12-18T11:42:51Z)
Virtual embeddings and self-consistency for self-supervised learning [43.086696088061416]
TriMix is a novel concept for self-supervised learning that generates virtual embeddings through linear data. We validate TriMix on eight benchmark datasets with an improvement of 2.71% and 0.41% better than the second-best models for both data types.
arXiv Detail & Related papers (2022-06-13T10:20:28Z)
High Fidelity Visualization of What Your Self-Supervised Representation Knows About [22.982471878833362]
In this work, we showcase the use of a conditional diffusion based generative model (RCDM) to visualize representations learned with self-supervised models. We demonstrate how this model's generation quality is on par with state-of-the-art generative models while being faithful to the representation used as conditioning.
arXiv Detail & Related papers (2021-12-16T19:23:33Z)
High-Fidelity Synthesis with Disentangled Representation [60.19657080953252]
We propose an Information-Distillation Generative Adrial Network (ID-GAN) for disentanglement learning and high-fidelity synthesis. Our method learns disentangled representation using VAE-based models, and distills the learned representation with an additional nuisance variable to the separate GAN-based generator for high-fidelity synthesis. Despite the simplicity, we show that the proposed method is highly effective, achieving comparable image generation quality to the state-of-the-art methods using the disentangled representation.
arXiv Detail & Related papers (2020-01-13T14:39:40Z)

This list is automatically generated from the titles and abstracts of the papers in this site.