Long Tail Image Generation Through Feature Space Augmentation and Iterated Learning
- URL: http://arxiv.org/abs/2405.01705v1
- Date: Thu, 2 May 2024 20:03:19 GMT
- Title: Long Tail Image Generation Through Feature Space Augmentation and Iterated Learning
- Authors: Rafael Elberg, Denis Parra, Mircea Petrache
- Abstract summary: We propose a new method for image augmentation in long-tailed data based on leveraging the rich latent space of pre-trained Stable Diffusion Models.
We build this space via Iterated Learning of underlying sparsified embeddings, which we apply to task-specific saliency maps via a K-NN approach.
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Image and multimodal machine learning tasks are very challenging to solve in the case of poorly distributed data. In particular, data availability and privacy restrictions exacerbate these hurdles in the medical domain. The state of the art in image generation quality is held by Latent Diffusion models, making them prime candidates for tackling this problem. However, a few key issues still need to be solved, such as the difficulty in generating data from under-represented classes and a slow inference process. To mitigate these issues, we propose a new method for image augmentation in long-tailed data based on leveraging the rich latent space of pre-trained Stable Diffusion Models. We create a modified separable latent space to mix head and tail class examples. We build this space via Iterated Learning of underlying sparsified embeddings, which we apply to task-specific saliency maps via a K-NN approach. Code is available at https://github.com/SugarFreeManatee/Feature-Space-Augmentation-and-Iterated-Learning
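The augmentation idea in the abstract (find nearby head-class examples in a latent space, then mix them with tail-class examples) can be sketched roughly as follows. This is a toy illustration, not the paper's implementation: the function name, plain NumPy embeddings, brute-force K-NN, and the single interpolation weight `alpha` are all assumptions.

```python
import numpy as np

def augment_tail_latents(head_z, tail_z, k=5, alpha=0.3, rng=None):
    """Toy sketch: synthesize new tail-class latents by interpolating each
    tail embedding toward one of its k nearest head-class neighbors.

    head_z: (N_head, D) latent codes of head-class examples
    tail_z: (N_tail, D) latent codes of tail-class examples
    alpha:  mixing weight pulled toward the head neighbor
    """
    rng = rng or np.random.default_rng(0)
    # Pairwise squared distances between every tail and head latent
    d2 = ((tail_z[:, None, :] - head_z[None, :, :]) ** 2).sum(-1)
    knn = np.argsort(d2, axis=1)[:, :k]  # (N_tail, k) nearest head indices
    # Pick one of the k neighbors at random for each tail latent and mix
    picks = knn[np.arange(len(tail_z)), rng.integers(0, k, len(tail_z))]
    return (1 - alpha) * tail_z + alpha * head_z[picks]

head = np.random.default_rng(1).normal(size=(100, 16))
tail = np.random.default_rng(2).normal(size=(10, 16))
aug = augment_tail_latents(head, tail)
print(aug.shape)  # (10, 16): one synthetic latent per tail example
```

In the paper the space being mixed is a sparsified embedding built via iterated learning over Stable Diffusion latents and task-specific saliency maps; the sketch above only shows the K-NN mixing step on generic vectors.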
Related papers
- Latent-based Diffusion Model for Long-tailed Recognition [10.410057703866899]
Long-tailed imbalance distribution is a common issue in practical computer vision applications.
We propose a new approach, the Latent-based Diffusion Model for Long-tailed Recognition (LDMLR) as a feature augmentation method to tackle the issue.
Using the proposed method, the model's accuracy improves on the CIFAR-LT and ImageNet-LT datasets.
arXiv Detail & Related papers (2024-04-06T06:15:07Z)
- FreeSeg-Diff: Training-Free Open-Vocabulary Segmentation with Diffusion Models [56.71672127740099]
We focus on the task of image segmentation, which is traditionally solved by training models on closed-vocabulary datasets.
We leverage different and relatively small-sized, open-source foundation models for zero-shot open-vocabulary segmentation.
Our approach (dubbed FreeSeg-Diff), which does not rely on any training, outperforms many training-based approaches on both Pascal VOC and COCO datasets.
arXiv Detail & Related papers (2024-03-29T10:38:25Z)
- Learned representation-guided diffusion models for large-image generation [58.192263311786824]
We introduce a novel approach that trains diffusion models conditioned on embeddings from self-supervised learning (SSL).
Our diffusion models successfully project these features back to high-quality histopathology and remote sensing images.
Augmenting real data by generating variations of real images improves downstream accuracy for patch-level and larger, image-scale classification tasks.
arXiv Detail & Related papers (2023-12-12T14:45:45Z)
- GenSelfDiff-HIS: Generative Self-Supervision Using Diffusion for Histopathological Image Segmentation [5.049466204159458]
Self-supervised learning (SSL) is an alternative paradigm that provides some respite by constructing models utilizing only the unannotated data.
In this paper, we propose an SSL approach for segmenting histopathological images via generative diffusion models.
Our method is based on the observation that diffusion models effectively solve an image-to-image translation task akin to a segmentation task.
arXiv Detail & Related papers (2023-09-04T09:49:24Z)
- SDM: Spatial Diffusion Model for Large Hole Image Inpainting [106.90795513361498]
We present a novel spatial diffusion model (SDM) that uses a few iterations to gradually deliver informative pixels to the entire image.
Also, thanks to the proposed decoupled probabilistic modeling and spatial diffusion scheme, our method achieves high-quality large-hole completion.
arXiv Detail & Related papers (2022-12-06T13:30:18Z)
- Smoothing the Generative Latent Space with Mixup-based Distance Learning [32.838539968751924]
We consider the situation where neither large scale dataset of our interest nor transferable source dataset is available.
We propose latent mixup-based distance regularization on the feature space of both a generator and the counterpart discriminator.
arXiv Detail & Related papers (2021-11-23T06:39:50Z)
- Multi-Agent Semi-Siamese Training for Long-tail and Shallow Face Learning [54.13876727413492]
In many real-world scenarios of face recognition, the depth of training dataset is shallow, which means only two face images are available for each ID.
With the non-uniform increase of samples, this issue becomes a more general case, known as long-tail face learning.
Based on Semi-Siamese Training (SST), we introduce an advanced solution named Multi-Agent Semi-Siamese Training (MASST).
MASST includes a probe network and multiple gallery agents: the former encodes the probe features, and the latter constitutes a stack of
arXiv Detail & Related papers (2021-05-10T04:57:32Z)
- ResLT: Residual Learning for Long-tailed Recognition [64.19728932445523]
We propose a more fundamental perspective for long-tailed recognition, i.e., from the aspect of parameter space.
We design an effective residual fusion mechanism: one main branch is optimized to recognize images from all classes, while two residual branches are gradually fused and optimized to enhance recognition of medium+tail and tail classes, respectively.
We test our method on several benchmarks, i.e., long-tailed version of CIFAR-10, CIFAR-100, Places, ImageNet, and iNaturalist 2018.
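The residual fusion above can be sketched as additive branch logits; this toy version with linear branches and hard class-group masks is an assumption for illustration, not ResLT's actual architecture.

```python
import numpy as np

def reslt_logits(x, W_main, W_med, W_tail, med_mask, tail_mask):
    """Toy residual fusion: a main branch scores all classes, while two
    residual branches add corrections only on the under-represented ones."""
    return (x @ W_main
            + (x @ W_med) * med_mask     # correction on medium+tail classes
            + (x @ W_tail) * tail_mask)  # correction on tail classes only

rng = np.random.default_rng(0)
x = rng.normal(size=(4, 32))                 # batch of 4 feature vectors
W_main = rng.normal(size=(32, 10))
W_med = rng.normal(size=(32, 10))
W_tail = rng.normal(size=(32, 10))
med_mask = np.zeros(10); med_mask[4:] = 1    # classes 4-9 = medium+tail
tail_mask = np.zeros(10); tail_mask[7:] = 1  # classes 7-9 = tail
logits = reslt_logits(x, W_main, W_med, W_tail, med_mask, tail_mask)
print(logits.shape)  # (4, 10)
```

Note that the head-class logits (columns 0-3 here) are untouched by the residual branches, which is the point of framing long-tail learning in parameter space: extra capacity is spent only where data is scarce.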
arXiv Detail & Related papers (2021-01-26T08:43:50Z)
- IntroVAC: Introspective Variational Classifiers for Learning Interpretable Latent Subspaces [6.574517227976925]
IntroVAC learns interpretable latent subspaces by exploiting information from an additional label.
We show that IntroVAC is able to learn meaningful directions in the latent space enabling fine manipulation of image attributes.
arXiv Detail & Related papers (2020-08-03T10:21:41Z)
- Neural Networks Are More Productive Teachers Than Human Raters: Active Mixup for Data-Efficient Knowledge Distillation from a Blackbox Model [57.41841346459995]
We study how to train a student deep neural network for visual recognition by distilling knowledge from a blackbox teacher model in a data-efficient manner.
We propose an approach that blends mixup and active learning.
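The active half of that blend can be sketched as follows: generate many candidate mixup images, then spend the limited query budget on the blackbox teacher only where the student is least confident. The selection rule, function name, and toy probabilities below are assumptions for illustration.

```python
import numpy as np

def select_queries(student_probs, budget):
    """Active mixup sketch: among candidate mixed images, pick the ones
    where the student's top-class probability is lowest, i.e. where a
    teacher label would be most informative."""
    confidence = student_probs.max(axis=1)  # student's top-class probability
    return np.argsort(confidence)[:budget]  # least-confident candidates first

# 6 candidate mixup images, 3 classes; query the teacher on the 2 most uncertain
probs = np.array([[0.90, 0.05, 0.05],
                  [0.40, 0.30, 0.30],
                  [0.34, 0.33, 0.33],
                  [0.80, 0.10, 0.10],
                  [0.50, 0.25, 0.25],
                  [0.60, 0.20, 0.20]])
print(select_queries(probs, 2))  # [2 1]
```

Selected candidates would then be labeled by the blackbox teacher and added to the student's distillation set, keeping the number of teacher queries small.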
arXiv Detail & Related papers (2020-03-31T05:44:55Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information presented and is not responsible for any consequences arising from its use.