Multi Positive Contrastive Learning with Pose-Consistent Generated Images
- URL: http://arxiv.org/abs/2404.03256v1
- Date: Thu, 4 Apr 2024 07:26:26 GMT
- Title: Multi Positive Contrastive Learning with Pose-Consistent Generated Images
- Authors: Sho Inayoshi, Aji Resindra Widya, Satoshi Ozaki, Junji Otsuka, Takeshi Ohashi
- Abstract summary: We propose the generation of visually distinct images with identical human poses.
We then propose a novel multi-positive contrastive learning method that optimally utilizes the previously generated images.
Despite using less than 1% of the data required by the current state-of-the-art method, GenPoCCL captures structural features of the human body more effectively.
- Score: 0.873811641236639
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Model pre-training has become essential in various recognition tasks. Meanwhile, with the remarkable advancements in image generation models, pre-training methods utilizing generated images have also emerged given their ability to produce unlimited training data. However, while existing methods utilizing generated images excel in classification, they fall short in more practical tasks, such as human pose estimation. In this paper, we experimentally demonstrate this shortcoming and propose the generation of visually distinct images with identical human poses. We then propose a novel multi-positive contrastive learning method that optimally utilizes the previously generated images to learn structural features of the human body. We term the entire learning pipeline GenPoCCL. Despite using less than 1% of the data required by the current state-of-the-art method, GenPoCCL captures structural features of the human body more effectively, surpassing existing methods in a variety of human-centric perception tasks.
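The abstract does not give the exact loss, but "multi-positive contrastive learning" over pose-consistent images is commonly formulated in the style of supervised contrastive learning: all images sharing the same underlying pose are treated as mutual positives for one another. A minimal sketch of such a loss, assuming embeddings are grouped by an integer pose id (the function name and signature are illustrative, not from the paper):

```python
import math

def multi_positive_contrastive_loss(embeddings, pose_ids, temperature=0.1):
    """SupCon-style multi-positive contrastive loss (illustrative sketch).

    embeddings: list of feature vectors (lists of floats).
    pose_ids:   list of integers; samples with the same id (e.g. the same
                human pose rendered with different appearances) are positives.
    """
    def normalize(v):
        norm = math.sqrt(sum(x * x for x in v)) or 1.0
        return [x / norm for x in v]

    def dot(a, b):
        return sum(x * y for x, y in zip(a, b))

    z = [normalize(v) for v in embeddings]
    n = len(z)
    total, anchors = 0.0, 0
    for i in range(n):
        others = [j for j in range(n) if j != i]          # exclude self-pair
        sims = [dot(z[i], z[j]) / temperature for j in others]
        # numerically stable log-sum-exp over all non-anchor samples
        m = max(sims)
        log_denom = m + math.log(sum(math.exp(s - m) for s in sims))
        # average log-likelihood over this anchor's positives
        pos = [s for s, j in zip(sims, others) if pose_ids[j] == pose_ids[i]]
        if not pos:
            continue
        total += -sum(s - log_denom for s in pos) / len(pos)
        anchors += 1
    return total / anchors
```

With pose-consistent generated images, each pose id would index one pose rendered under several visually distinct appearances, pushing the encoder to represent pose structure rather than texture.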
Related papers
- MoLE: Enhancing Human-centric Text-to-image Diffusion via Mixture of Low-rank Experts [61.274246025372044]
We study human-centric text-to-image generation in the context of faces and hands.
We propose a method called Mixture of Low-rank Experts (MoLE) by considering low-rank modules trained on close-up hand and face images respectively as experts.
This concept draws inspiration from our observation of low-rank refinement, where a low-rank module trained by a customized close-up dataset has the potential to enhance the corresponding image part when applied at an appropriate scale.
arXiv Detail & Related papers (2024-10-30T17:59:57Z)
- Are They the Same Picture? Adapting Concept Bottleneck Models for Human-AI Collaboration in Image Retrieval [3.2495565849970016]
CHAIR enables humans to correct intermediate concepts, which helps improve the generated embeddings.
We show that our method performs better than similar models on image retrieval metrics without any external intervention.
arXiv Detail & Related papers (2024-07-12T00:59:32Z)
- Boost Your Own Human Image Generation Model via Direct Preference Optimization with AI Feedback [5.9726297901501475]
We introduce a novel approach tailored specifically to human image generation, utilizing Direct Preference Optimization (DPO).
Specifically, we introduce an efficient method for constructing a specialized DPO dataset for training human image generation models without the need for costly human feedback.
Our method demonstrates its versatility and effectiveness in generating human images, including personalized text-to-image generation.
arXiv Detail & Related papers (2024-05-30T16:18:05Z)
- Active Generation for Image Classification [45.93535669217115]
We propose to address the efficiency of image generation by focusing on the specific needs and characteristics of the model.
With a central tenet of active learning, our method, named ActGen, takes a training-aware approach to image generation.
arXiv Detail & Related papers (2024-03-11T08:45:31Z)
- Detecting Generated Images by Real Images Only [64.12501227493765]
Existing generated image detection methods detect visual artifacts in generated images or learn discriminative features from both real and generated images by massive training.
This paper approaches the generated image detection problem from a new perspective: Start from real images.
By finding the commonality of real images and mapping them to a dense subspace in feature space, the goal is for generated images, regardless of which model produced them, to be projected outside that subspace.
arXiv Detail & Related papers (2023-11-02T03:09:37Z)
- StyleAvatar3D: Leveraging Image-Text Diffusion Models for High-Fidelity 3D Avatar Generation [103.88928334431786]
We present a novel method for generating high-quality, stylized 3D avatars.
We use pre-trained image-text diffusion models for data generation and a Generative Adversarial Network (GAN)-based 3D generation network for training.
Our approach demonstrates superior performance over current state-of-the-art methods in terms of visual quality and diversity of the produced avatars.
arXiv Detail & Related papers (2023-05-30T13:09:21Z)
- Domain Generalization for Mammographic Image Analysis with Contrastive Learning [62.25104935889111]
Training an effective deep learning model requires large amounts of data with diverse styles and qualities.
A novel contrastive learning scheme is developed to equip deep learning models with better style generalization capability.
The proposed method has been evaluated extensively and rigorously with mammograms from various vendor style domains and several public datasets.
arXiv Detail & Related papers (2023-04-20T11:40:21Z)
- Continual Learning with Bayesian Model based on a Fixed Pre-trained Feature Extractor [55.9023096444383]
Current deep learning models are characterised by catastrophic forgetting of old knowledge when learning new classes.
Inspired by the process of learning new knowledge in human brains, we propose a Bayesian generative model for continual learning.
arXiv Detail & Related papers (2022-04-28T08:41:51Z)
- Pre-Trained Image Processing Transformer [95.93031793337613]
We develop a new pre-trained model, namely the image processing transformer (IPT).
We propose to utilize the well-known ImageNet benchmark to generate a large number of corrupted image pairs.
The IPT model is trained on these images with multi-heads and multi-tails.
arXiv Detail & Related papers (2020-12-01T09:42:46Z)
- MUST-GAN: Multi-level Statistics Transfer for Self-driven Person Image Generation [13.06676286691587]
Pose-guided person image generation usually involves using paired source-target images to supervise the training.
We propose a novel multi-level statistics transfer model, which disentangles and transfers multi-level appearance features from person images.
Our approach allows for flexible manipulation of person appearance and pose properties to perform pose transfer and clothes style transfer tasks.
arXiv Detail & Related papers (2020-11-18T04:38:48Z)
- Improved Techniques for Training Single-Image GANs [44.251222212306764]
Generative models can be learned from a single image, as opposed to from a large dataset.
We propose some best practices to train a model capable of generating realistic images from only a single sample.
Our model is up to six times faster to train, has fewer parameters, and can better capture the global structure of images.
arXiv Detail & Related papers (2020-03-25T17:33:25Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information presented and is not responsible for any consequences arising from its use.