CAT: Contrastive Adapter Training for Personalized Image Generation
- URL: http://arxiv.org/abs/2404.07554v2
- Date: Wed, 23 Oct 2024 07:16:42 GMT
- Title: CAT: Contrastive Adapter Training for Personalized Image Generation
- Authors: Jae Wan Park, Sang Hyun Park, Jun Young Koh, Junha Lee, Min Song
- Abstract summary: We present Contrastive Adapter Training (CAT) to enhance adapter training through the application of CAT loss.
Our approach preserves the base model's original knowledge when adapters are introduced.
- Score: 4.093428697109545
- Abstract: The emergence of various adapters, including Low-Rank Adaptation (LoRA) adopted from natural language processing, has allowed diffusion models to personalize image generation at low cost. However, due to challenges such as limited datasets and a shortage of regularization and computational resources, adapter training often yields unsatisfactory outcomes, corrupting the backbone model's prior knowledge. One well-known symptom is the loss of diversity in object generation, especially within the same class, which leads to generating almost identical objects with only minor variations and thus limits generation capability. To solve this issue, we present Contrastive Adapter Training (CAT), a simple yet effective strategy that enhances adapter training through the application of a CAT loss. Our approach preserves the base model's original knowledge when adapters are introduced. Furthermore, we introduce the Knowledge Preservation Score (KPS) to evaluate CAT's ability to retain prior information. We compare CAT's improvements both qualitatively and quantitatively. Finally, we discuss CAT's potential for multi-concept adapters and further optimization.
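The core idea in the abstract, pairing the usual adapter training objective with a term that anchors the adapted model to the frozen base so prior knowledge survives, can be sketched as a toy loss. This is an illustrative sketch only: the paper does not specify its loss in this listing, so the `lam` weight and the exact form of the preservation term here are assumptions.

```python
import numpy as np

def cat_loss(adapted_pred, target_noise, base_pred, lam=1.0):
    """Toy contrastive-adapter-style training loss (illustrative, not the paper's formulation).

    adapted_pred: noise predicted by the model with the adapter active
    target_noise: ground-truth noise for the usual diffusion objective
    base_pred:    noise predicted by the frozen base model (adapter off)
    lam:          assumed weight on the knowledge-preservation term
    """
    # standard denoising objective on the personalization data
    denoise = np.mean((adapted_pred - target_noise) ** 2)
    # preservation term: keep the adapted model close to the frozen base,
    # so the backbone's prior knowledge is not overwritten
    preserve = np.mean((adapted_pred - base_pred) ** 2)
    return denoise + lam * preserve
```

When the adapted model matches the frozen base exactly, the preservation term vanishes and only the standard denoising loss remains; increasing `lam` trades personalization strength for knowledge retention.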
Related papers
- Auto-selected Knowledge Adapters for Lifelong Person Re-identification [54.42307214981537]
Lifelong Person Re-Identification requires systems to continually learn from non-overlapping datasets across different times and locations.
Existing approaches, either rehearsal-free or rehearsal-based, still suffer from the problem of catastrophic forgetting.
We introduce AdalReID, a novel framework that adopts knowledge adapters and a parameter-free auto-selection mechanism for lifelong learning.
arXiv Detail & Related papers (2024-05-29T11:42:02Z)
- Class Incremental Learning with Pre-trained Vision-Language Models [59.15538370859431]
We propose an approach to exploiting pre-trained vision-language models (e.g. CLIP) that enables further adaptation.
Experiments on several conventional benchmarks consistently show a significant margin of improvement over the current state-of-the-art.
arXiv Detail & Related papers (2023-10-31T10:45:03Z)
- Domain Generalization Using Large Pretrained Models with Mixture-of-Adapters [35.834509022013435]
Domain generalization (DG) algorithms aim to maintain a trained model's performance across different distributions.
We propose a mixture-of-experts based adapter fine-tuning method, dubbed mixture-of-adapters (MoA).
arXiv Detail & Related papers (2023-10-17T07:01:24Z)
- Efficient Adaptation of Large Vision Transformer via Adapter Re-Composing [8.88477151877883]
High-capacity pre-trained models have revolutionized problem-solving in computer vision.
We propose a novel Adapter Re-Composing (ARC) strategy that addresses efficient pre-trained model adaptation.
Our approach considers the reusability of adaptation parameters and introduces a parameter-sharing scheme.
arXiv Detail & Related papers (2023-10-10T01:04:15Z)
- Category Adaptation Meets Projected Distillation in Generalized Continual Category Discovery [0.9349784561232036]
Generalized Continual Category Discovery (GCCD) tackles learning from sequentially arriving, partially labeled datasets.
We introduce a novel technique integrating a learnable projector with feature distillation, thus enhancing model adaptability without sacrificing past knowledge.
We demonstrate that while each component offers modest benefits individually, their combination, dubbed CAMP, significantly improves the balance between learning new information and retaining old knowledge.
arXiv Detail & Related papers (2023-08-23T13:02:52Z)
- A Memory Transformer Network for Incremental Learning [64.0410375349852]
We study class-incremental learning, a training setup in which new classes of data are observed over time for the model to learn from.
Despite the straightforward problem formulation, the naive application of classification models to class-incremental learning results in the "catastrophic forgetting" of previously seen classes.
One of the most successful existing methods has been the use of a memory of exemplars, which overcomes the issue of catastrophic forgetting by saving a subset of past data into a memory bank and utilizing it to prevent forgetting when training future tasks.
arXiv Detail & Related papers (2022-10-10T08:27:28Z)
- Tip-Adapter: Training-free Adaption of CLIP for Few-shot Classification [58.06983806317233]
Contrastive Vision-Language Pre-training, known as CLIP, has provided a new paradigm for learning visual representations using large-scale image-text pairs.
To enhance CLIP's adaptation capability, existing methods propose fine-tuning additional learnable modules.
We propose a training-free adaptation method for CLIP to conduct few-shot classification, termed Tip-Adapter.
arXiv Detail & Related papers (2022-07-19T19:12:11Z)
- CATs++: Boosting Cost Aggregation with Convolutions and Transformers [31.22435282922934]
We introduce Cost Aggregation with Transformers (CATs) to tackle this by exploring global consensus among initial correlation maps.
To alleviate some of the limitations CATs may face, namely the high computational cost induced by the use of a standard transformer, we also propose CATs++.
Our proposed methods outperform the previous state-of-the-art methods by large margins, setting a new state of the art on all benchmarks.
arXiv Detail & Related papers (2022-02-14T15:54:58Z)
- Towards Fine-grained Image Classification with Generative Adversarial Networks and Facial Landmark Detection [0.0]
We use GAN-based data augmentation to generate extra dataset instances.
We validated our work by evaluating the accuracy of fine-grained image classification on the recent Vision Transformer (ViT) model.
arXiv Detail & Related papers (2021-08-28T06:32:42Z)
- Zoo-Tuning: Adaptive Transfer from a Zoo of Models [82.9120546160422]
Zoo-Tuning learns to adaptively transfer the parameters of pretrained models to the target task.
We evaluate our approach on a variety of tasks, including reinforcement learning, image classification, and facial landmark detection.
arXiv Detail & Related papers (2021-06-29T14:09:45Z)
- Semantic Correspondence with Transformers [68.37049687360705]
We propose Cost Aggregation with Transformers (CATs) to find dense correspondences between semantically similar images.
We include appearance affinity modelling and multi-level aggregation to disambiguate the initial correlation maps.
We conduct experiments to demonstrate the effectiveness of the proposed model over the latest methods and provide extensive ablation studies.
arXiv Detail & Related papers (2021-06-04T14:39:03Z)
This list is automatically generated from the titles and abstracts of the papers in this site.