DynASyn: Multi-Subject Personalization Enabling Dynamic Action Synthesis
- URL: http://arxiv.org/abs/2503.17728v1
- Date: Sat, 22 Mar 2025 10:56:35 GMT
- Title: DynASyn: Multi-Subject Personalization Enabling Dynamic Action Synthesis
- Authors: Yongjin Choi, Chanhun Park, Seung Jun Baek
- Abstract summary: We propose DynASyn, an effective method for multi-subject personalization from a single reference image. DynASyn preserves subject identity during personalization by aligning concept-based priors with subject appearances and actions. In addition, we propose concept-based prompt-and-image augmentation for an enhanced trade-off between identity preservation and action diversity.
- Score: 3.6294581578004332
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Recent advances in text-to-image diffusion models have spurred research on personalization, i.e., customized image synthesis of subjects within reference images. Although existing personalization methods can alter subjects' positions or personalize multiple subjects simultaneously, they often struggle to modify the behaviors of subjects or their dynamic interactions. The difficulty is attributable to overfitting to reference images, which worsens when only a single reference image is available. We propose DynASyn, an effective method for multi-subject personalization from a single reference image that addresses these challenges. DynASyn preserves subject identity during personalization by aligning concept-based priors with subject appearances and actions. This is achieved by regularizing the attention maps between the subject token and images through concept-based priors. In addition, we propose concept-based prompt-and-image augmentation for an enhanced trade-off between identity preservation and action diversity. We adopt SDE-based editing guided by augmented prompts to generate diverse appearances and actions while maintaining identity consistency in the augmented images. Experiments show that DynASyn synthesizes highly realistic images of subjects in novel contexts with dynamic interactions with their surroundings, and outperforms baseline methods both quantitatively and qualitatively.
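To make the abstract's core mechanism concrete, below is a minimal sketch of what "regularizing the attention maps between the subject token and images through concept-based priors" could look like. This is not the authors' implementation; the function names, tensor shapes, and the choice of a KL penalty are assumptions made for illustration only.

```python
import torch


def attention_prior_loss(attn_subject: torch.Tensor,
                         attn_concept: torch.Tensor,
                         eps: float = 1e-8) -> torch.Tensor:
    """KL divergence pulling the subject token's cross-attention map toward the
    map of a generic concept token (e.g. "dog"), which acts as the prior.

    Both tensors: (batch, heads, H*W) attention weights of one text token over
    the image latents.
    """
    # Renormalise each map over spatial positions so it is a proper distribution.
    p = attn_concept / (attn_concept.sum(dim=-1, keepdim=True) + eps)
    q = attn_subject / (attn_subject.sum(dim=-1, keepdim=True) + eps)
    return (p * ((p + eps).log() - (q + eps).log())).sum(dim=-1).mean()


def personalization_loss(denoise_loss: torch.Tensor,
                         attn_subject: torch.Tensor,
                         attn_concept: torch.Tensor,
                         lambda_reg: float = 0.1) -> torch.Tensor:
    """Total loss: standard diffusion reconstruction loss plus the attention prior."""
    return denoise_loss + lambda_reg * attention_prior_loss(attn_subject, attn_concept)
```

Under this reading, `lambda_reg` governs the trade-off the abstract mentions: a stronger pull toward the concept prior favors identity preservation from the single reference image, while a weaker one leaves more room for novel actions. The SDE-based editing step would sit upstream, producing the prompt-and-image augmentations on which this loss is trained.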
Related papers
- Nested Attention: Semantic-aware Attention Values for Concept Personalization [78.90196530697897]
We introduce Nested Attention, a novel mechanism that injects a rich and expressive image representation into the model's existing cross-attention layers.
Our key idea is to generate query-dependent subject values, derived from nested attention layers that learn to select relevant subject features for each region in the generated image.
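A rough sketch of how "query-dependent subject values" could be realised is shown below; this is a guess at the mechanism from the summary, not the paper's code, and the class and argument names are hypothetical.

```python
import torch
import torch.nn as nn


class NestedSubjectValues(nn.Module):
    """Inner (nested) attention: each image query selects relevant features from
    the encoded reference subject, yielding a per-query subject value."""

    def __init__(self, dim: int, subject_dim: int):
        super().__init__()
        self.to_k = nn.Linear(subject_dim, dim, bias=False)  # keys from subject features
        self.to_v = nn.Linear(subject_dim, dim, bias=False)  # values from subject features

    def forward(self, image_queries: torch.Tensor, subject_feats: torch.Tensor) -> torch.Tensor:
        # image_queries: (B, N_pixels, dim)  -- queries of the host cross-attention layer
        # subject_feats: (B, N_subj, subject_dim) -- encoded reference-image tokens
        k = self.to_k(subject_feats)
        v = self.to_v(subject_feats)
        attn = torch.softmax(image_queries @ k.transpose(1, 2) / k.shape[-1] ** 0.5, dim=-1)
        return attn @ v  # (B, N_pixels, dim): query-dependent subject values
```

The returned per-query values would then be injected into the model's existing cross-attention layers in place of a single static subject embedding.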
arXiv Detail & Related papers (2025-01-02T18:52:11Z)
- Imagine yourself: Tuning-Free Personalized Image Generation [39.63411174712078]
We introduce Imagine yourself, a state-of-the-art model designed for personalized image generation.
It operates as a tuning-free model, enabling all users to leverage a shared framework without individualized adjustments.
Our study demonstrates that Imagine yourself surpasses the state-of-the-art personalization model, exhibiting superior capabilities in identity preservation, visual quality, and text alignment.
arXiv Detail & Related papers (2024-09-20T09:21:49Z)
- Hallo: Hierarchical Audio-Driven Visual Synthesis for Portrait Image Animation [29.87407471246318]
This research delves into the complexities of synchronizing facial movements and creating visually appealing, temporally consistent animations.
Our innovative approach embraces the end-to-end diffusion paradigm and introduces a hierarchical audio-driven visual synthesis module.
The proposed hierarchical audio-driven visual synthesis offers adaptive control over expression and pose diversity, enabling more effective personalization tailored to different identities.
arXiv Detail & Related papers (2024-06-13T04:33:20Z)
- Training-Free Consistent Text-to-Image Generation [80.4814768762066]
Text-to-image models struggle to consistently portray the same subject across diverse prompts.
Existing approaches fine-tune the model to teach it new words that describe specific user-provided subjects.
We present ConsiStory, a training-free approach that enables consistent subject generation by sharing the internal activations of the pretrained model.
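One common way "sharing the internal activations" is realised is by extending self-attention across the batch so every image can attend to the subject as it appears in the others. The sketch below illustrates that idea under this assumption; it is not the ConsiStory implementation, and the unmasked batch-wide sharing is a simplification.

```python
import torch


def shared_self_attention(q: torch.Tensor, k: torch.Tensor, v: torch.Tensor) -> torch.Tensor:
    # q, k, v: (B, N, d) -- per-image self-attention projections at one layer.
    B, N, d = k.shape
    # Concatenate keys/values from every image in the batch so each generated
    # image also attends to the other images, keeping the subject consistent
    # without any training.
    k_shared = k.reshape(1, B * N, d).expand(B, -1, -1)
    v_shared = v.reshape(1, B * N, d).expand(B, -1, -1)
    attn = torch.softmax(q @ k_shared.transpose(1, 2) / d ** 0.5, dim=-1)
    return attn @ v_shared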
arXiv Detail & Related papers (2024-02-05T18:42:34Z)
- FaceStudio: Put Your Face Everywhere in Seconds [23.381791316305332]
Identity-preserving image synthesis seeks to maintain a subject's identity while adding a personalized, stylistic touch.
Traditional methods, such as Textual Inversion and DreamBooth, have made strides in custom image creation.
Our research introduces a novel approach to identity-preserving synthesis, with a particular focus on human images.
arXiv Detail & Related papers (2023-12-05T11:02:45Z)
- Collaborative Score Distillation for Consistent Visual Synthesis [70.29294250371312]
Collaborative Score Distillation (CSD) is based on Stein Variational Gradient Descent (SVGD).
We show the effectiveness of CSD in a variety of tasks, encompassing the visual editing of panorama images, videos, and 3D scenes.
Our results underline the competency of CSD as a versatile method for enhancing inter-sample consistency, thereby broadening the applicability of text-to-image diffusion models.
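For context, the standard SVGD update that CSD builds on moves a set of samples jointly, with a kernel coupling their updates; reading the summary, the score term would be supplied by the text-to-image diffusion model, which is what promotes inter-sample consistency. The formula below is the generic SVGD update (step size $\epsilon$, kernel $k$), not the CSD-specific objective.

```latex
x_i \leftarrow x_i + \epsilon\,\hat{\phi}^{*}(x_i),\qquad
\hat{\phi}^{*}(x_i) = \frac{1}{n}\sum_{j=1}^{n}
  \Big[\, k(x_j, x_i)\,\nabla_{x_j}\log p(x_j) \;+\; \nabla_{x_j} k(x_j, x_i) \,\Big]
```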
arXiv Detail & Related papers (2023-07-04T17:31:50Z)
- Cones 2: Customizable Image Synthesis with Multiple Subjects [50.54010141032032]
We study how to efficiently represent a particular subject as well as how to appropriately compose different subjects.
By rectifying the activations in the cross-attention map, a layout can designate and separate the locations of different subjects in the image.
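A guess at what such rectification might look like is sketched below; the additive boost/suppression and all names are assumptions for illustration, not the Cones 2 code.

```python
import torch


def rectify_cross_attention(attn: torch.Tensor,
                            subject_masks: torch.Tensor,
                            strength: float = 1.0) -> torch.Tensor:
    # attn:          (B, heads, H*W, n_text_tokens) cross-attention weights
    # subject_masks: (n_text_tokens, H*W) binary layout masks, 1 inside the
    #                region assigned to that token's subject, 0 elsewhere
    # Boost each subject token's activations inside its region and damp them
    # outside, so subjects end up in separate locations.
    bias = strength * (2.0 * subject_masks.T - 1.0)  # (H*W, n_text_tokens)
    return (attn + bias).clamp(min=0.0)
```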
arXiv Detail & Related papers (2023-05-30T18:00:06Z)
- Taming Encoder for Zero Fine-tuning Image Customization with Text-to-Image Diffusion Models [55.04969603431266]
This paper proposes a method for generating images of customized objects specified by users.
The method is based on a general framework that bypasses the lengthy optimization required by previous approaches.
We demonstrate through experiments that our proposed method is able to synthesize images with compelling output quality, appearance diversity, and object fidelity.
arXiv Detail & Related papers (2023-04-05T17:59:32Z)
- DreamBooth: Fine Tuning Text-to-Image Diffusion Models for Subject-Driven Generation [26.748667878221568]
We present a new approach for "personalization" of text-to-image models.
Given a few images of a subject, we fine-tune a pretrained text-to-image model to bind a unique identifier with that subject.
The unique identifier can then be used to synthesize fully novel photorealistic images of the subject contextualized in different scenes.
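A minimal sketch of the recipe: a rare identifier token is bound to the subject during fine-tuning and then reused in novel prompts. The prompt strings below are illustrative, and the class prompt reflects DreamBooth's prior-preservation loss.

```python
# Illustrative DreamBooth-style prompt setup (not the authors' code).
instance_prompt = "a photo of sks dog"   # "sks" is the rare unique-identifier token
class_prompt = "a photo of a dog"        # generic class prompt for prior preservation

# Conceptually, fine-tuning minimises:
#   denoise_loss(subject_images, instance_prompt)
#   + prior_weight * denoise_loss(generated_class_images, class_prompt)

# After fine-tuning, the identifier recontextualises the subject in novel scenes:
novel_prompts = [
    "a photo of sks dog swimming in a pool",
    "a photo of sks dog on the moon, cinematic lighting",
]
```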
arXiv Detail & Related papers (2022-08-25T17:45:49Z)
- Semantic Photo Manipulation with a Generative Image Prior [86.01714863596347]
GANs are able to synthesize images conditioned on inputs such as user sketch, text, or semantic labels.
However, it is hard for GANs to precisely reproduce an input image.
In this paper, we address these issues by adapting the image prior learned by GANs to image statistics of an individual image.
Our method can accurately reconstruct the input image and synthesize new content, consistent with the appearance of the input image.
arXiv Detail & Related papers (2020-05-15T18:22:05Z)
This list is automatically generated from the titles and abstracts of the papers in this site.