Related papers: OSTAF: A One-Shot Tuning Method for Improved Attribute-Focused T2I Personalization

URL: http://arxiv.org/abs/2403.11053v1
Date: Sun, 17 Mar 2024 01:42:48 GMT
Title: OSTAF: A One-Shot Tuning Method for Improved Attribute-Focused T2I Personalization
Authors: Ye Wang, Zili Yi, Rui Ma
Abstract summary: We introduce a novel parameter-efficient one-shot fine-tuning method for text-to-image (T2I) personalization. A novel hypernetwork-powered, attribute-focused fine-tuning mechanism is employed to precisely learn various attribute features. Our method shows significant superiority in attribute identification and application, and achieves a good balance between efficiency and output quality.
Score: 9.552325786494334
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: Personalized text-to-image (T2I) models not only produce lifelike and varied visuals but also allow users to tailor the images to fit their personal taste. These personalization techniques can grasp the essence of a concept from a collection of images, or adjust a pre-trained text-to-image model with a specific image input for subject-driven or attribute-aware guidance. Yet accurately capturing the distinct visual attributes of an individual image remains a challenge for these methods. To address this issue, we introduce OSTAF, a novel parameter-efficient one-shot fine-tuning method that uses only a single reference image for T2I personalization. A novel hypernetwork-powered, attribute-focused fine-tuning mechanism is employed to precisely learn various attribute features (e.g., appearance, shape, or drawing style) from the reference image. Compared to existing image customization methods, our method shows significant superiority in attribute identification and application, and achieves a good balance between efficiency and output quality.

Related papers

Omni-Attribute: Open-vocabulary Attribute Encoder for Visual Concept Personalization [82.31106470150844]
We introduce Omni-Attribute, the first open-vocabulary image attribute encoder to learn attribute-specific representations. We use a dual-objective training paradigm that balances generative fidelity with contrastive disentanglement. The resulting embeddings prove effective for open-vocabulary attribute retrieval, personalization, and compositional generation.
arXiv Detail & Related papers (2025-12-11T18:59:56Z)
Per-Query Visual Concept Learning [32.045160884721646]
We show that many existing methods can be substantially augmented by adding a personalization step. Specifically, we leverage PDM features (previously designed to capture identity) and show how they can be used to improve semantic similarity.
arXiv Detail & Related papers (2025-08-12T16:07:27Z)
FiVA: Fine-grained Visual Attribute Dataset for Text-to-Image Diffusion Models [112.94440113631897]
Current methods attempt to distill identity and style from source images. However, "style" is a broad concept that includes texture, color, and artistic elements, but does not cover other important attributes such as lighting and dynamics. We formulate a more effective approach that decomposes the aesthetics of a picture into specific visual attributes, allowing users to apply characteristics such as lighting, texture, and dynamics from different images.
arXiv Detail & Related papers (2024-12-10T17:02:58Z)
DisEnvisioner: Disentangled and Enriched Visual Prompt for Customized Image Generation [22.599542105037443]
DisEnvisioner is a novel approach for effectively extracting and enriching subject-essential features while filtering out irrelevant information. Specifically, the features of the subject and other, irrelevant components are separated into distinct visual tokens, enabling much more accurate customization. Experiments demonstrate the superiority of our approach over existing methods in instruction response (editability), ID consistency, inference speed, and overall image quality.
arXiv Detail & Related papers (2024-10-02T22:29:14Z)
ZePo: Zero-Shot Portrait Stylization with Faster Sampling [61.14140480095604]
This paper presents an inversion-free portrait stylization framework based on diffusion models that accomplishes content and style feature fusion in only four sampling steps. We propose a feature merging strategy that merges redundant features in Consistency Features, thereby reducing the computational load of attention control.
arXiv Detail & Related papers (2024-08-10T08:53:41Z)
JeDi: Joint-Image Diffusion Models for Finetuning-Free Personalized Text-to-Image Generation [49.997839600988875]
Existing personalization methods rely on finetuning a text-to-image foundation model on a user's custom dataset. We propose Joint-Image Diffusion (JeDi), an effective technique for learning a finetuning-free personalization model. Our model achieves state-of-the-art generation quality, both quantitatively and qualitatively, significantly outperforming both prior finetuning-based and finetuning-free personalization baselines.
arXiv Detail & Related papers (2024-07-08T17:59:02Z)
Customizing Text-to-Image Models with a Single Image Pair [47.49970731632113]
Art reinterpretation is the practice of creating a variation of a reference work, making a paired artwork that exhibits a distinct artistic style. We propose Pair Customization, a new customization method that learns stylistic difference from a single image pair and then applies the acquired style to the generation process.
arXiv Detail & Related papers (2024-05-02T17:59:52Z)
U-VAP: User-specified Visual Appearance Personalization via Decoupled Self Augmentation [18.841473623776153]
State-of-the-art personalization models tend to overfit the whole subject and cannot disentangle visual characteristics in pixel space. A novel decoupled self-augmentation strategy is proposed to generate target-related and non-target samples to learn user-specified visual attributes. Experiments on various kinds of visual attributes with SOTA personalization methods show the ability of the proposed method to mimic target visual appearance in novel contexts.
arXiv Detail & Related papers (2024-03-29T15:20:34Z)
Continuous, Subject-Specific Attribute Control in T2I Models by Identifying Semantic Directions [20.351245266660378]
Recent advances in text-to-image (T2I) diffusion models have significantly improved the quality of generated images. Providing efficient control over individual subjects, particularly the attributes characterizing them, remains a key challenge. No current approach offers both continuous and subject-specific attribute control simultaneously, leaving a gap when trying to achieve precise, continuous, subject-specific attribute modulation.
arXiv Detail & Related papers (2024-03-25T18:00:42Z)
Tuning-Free Image Customization with Image and Text Guidance [65.9504243633169]
We introduce a tuning-free framework for simultaneous text-image-guided image customization. Our approach preserves the semantic features of the reference image subject while allowing modification of detailed attributes based on text descriptions. Our approach outperforms previous methods in both human and quantitative evaluations.
arXiv Detail & Related papers (2024-03-19T11:48:35Z)
Direct Consistency Optimization for Compositional Text-to-Image Personalization [73.94505688626651]
Text-to-image (T2I) diffusion models, when fine-tuned on a few personal images, are able to generate visuals with a high degree of consistency. We propose to fine-tune the T2I model by maximizing consistency to reference images, while penalizing the deviation from the pretrained model.
arXiv Detail & Related papers (2024-02-19T09:52:41Z)
Pick-and-Draw: Training-free Semantic Guidance for Text-to-Image Personalization [56.12990759116612]
Pick-and-Draw is a training-free semantic guidance approach to boost identity consistency and generative diversity for personalization methods. The proposed approach can be applied to any personalized diffusion models and requires as few as a single reference image.
arXiv Detail & Related papers (2024-01-30T05:56:12Z)
PIA: Your Personalized Image Animator via Plug-and-Play Modules in Text-to-Image Models [14.657472801570284]
PIA excels in aligning with condition images, achieving motion controllability through text, and maintaining compatibility with various personalized T2I models without specific tuning. A key component of PIA is the condition module, which takes the condition frame and inter-frame affinity as input to transfer appearance information.
arXiv Detail & Related papers (2023-12-21T15:51:12Z)
Towards Unsupervised Deep Image Enhancement with Generative Adversarial Network [92.01145655155374]
We present an unsupervised image enhancement generative network (UEGAN). It learns the corresponding image-to-image mapping from a set of images with desired characteristics in an unsupervised manner. Results show that the proposed model effectively improves the aesthetic quality of images.
arXiv Detail & Related papers (2020-12-30T03:22:46Z)

This list is automatically generated from the titles and abstracts of the papers on this site.