TARA: Token-Aware LoRA for Composable Personalization in Diffusion Models
- URL: http://arxiv.org/abs/2508.08812v1
- Date: Tue, 12 Aug 2025 10:14:15 GMT
- Title: TARA: Token-Aware LoRA for Composable Personalization in Diffusion Models
- Authors: Yuqi Peng, Lingtao Zheng, Yufeng Yang, Yi Huang, Mingfu Yan, Jianzhuang Liu, Shifeng Chen
- Abstract summary: We propose Token-Aware LoRA (TARA) for personalized text-to-image generation. TARA constrains each module to focus on its associated rare token to avoid interference, and a training objective encourages the spatial attention of a rare token to align with its concept region. Our method enables training-free multi-concept composition by directly injecting multiple independently trained TARA modules at inference time.
- Score: 34.116172209476254
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Personalized text-to-image generation aims to synthesize novel images of a specific subject or style using only a few reference images. Recent methods based on Low-Rank Adaptation (LoRA) enable efficient single-concept customization by injecting lightweight, concept-specific adapters into pre-trained diffusion models. However, combining multiple LoRA modules for multi-concept generation often leads to identity loss and visual feature leakage. In this work, we identify two key issues behind these failures: (1) token-wise interference among different LoRA modules, and (2) spatial misalignment between the attention map of a rare token and its corresponding concept-specific region. To address these issues, we propose Token-Aware LoRA (TARA), which introduces a token mask to explicitly constrain each module to focus on its associated rare token to avoid interference, and a training objective that encourages the spatial attention of a rare token to align with its concept region. Our method enables training-free multi-concept composition by directly injecting multiple independently trained TARA modules at inference time. Experimental results demonstrate that TARA enables efficient multi-concept inference and effectively preserves the visual identity of each concept by avoiding mutual interference between LoRA modules. The code and models are available at https://github.com/YuqiPeng77/TARA.
Related papers
- ConceptSplit: Decoupled Multi-Concept Personalization of Diffusion Models via Token-wise Adaptation and Attention Disentanglement [15.939409734710198]
We present ConceptSplit, a novel framework that keeps individual concepts disentangled throughout training and inference. Our framework comprises two key components. First, we introduce Token-wise Value Adaptation (ToVA), a merging-free training method. Second, we propose Latent Optimization for Disentangled Attention (LODA), which alleviates attention entanglement during inference.
arXiv Detail & Related papers (2025-10-06T10:22:46Z) - Modular Customization of Diffusion Models via Blockwise-Parameterized Low-Rank Adaptation [73.16975077770765]
Modular customization is essential for applications like concept stylization and multi-concept customization. Instant merging methods often cause identity loss and interference between individual merged concepts. We propose BlockLoRA, an instant merging method designed to efficiently combine multiple concepts while accurately preserving each concept's identity.
arXiv Detail & Related papers (2025-03-11T16:10:36Z) - AttenCraft: Attention-guided Disentanglement of Multiple Concepts for Text-to-Image Customization [3.5066393042242123]
We propose AttenCraft, an attention-based method for multiple-concept disentanglement. We introduce an adaptive algorithm based on attention scores to estimate sampling ratios for different concepts. Our model effectively mitigates these issues, achieving state-of-the-art image fidelity and prompt fidelity comparable to that of baseline models.
arXiv Detail & Related papers (2024-05-28T08:50:14Z) - LoRA-Composer: Leveraging Low-Rank Adaptation for Multi-Concept Customization in Training-Free Diffusion Models [33.379758040084894]
Multi-concept customization emerges as a challenging task within this domain.
Existing approaches often rely on training a fusion matrix of multiple Low-Rank Adaptations (LoRAs) to merge various concepts into a single image.
LoRA-Composer is a training-free framework designed for seamlessly integrating multiple LoRAs.
arXiv Detail & Related papers (2024-03-18T09:58:52Z) - OMG: Occlusion-friendly Personalized Multi-concept Generation in Diffusion Models [47.63060402915307]
OMG is a framework designed to seamlessly integrate multiple concepts within a single image.
OMG exhibits superior performance in multi-concept personalization.
LoRA models from civitai.com can be used directly.
arXiv Detail & Related papers (2024-03-16T17:30:15Z) - Magic Tokens: Select Diverse Tokens for Multi-modal Object Re-Identification [64.36210786350568]
We propose a novel learning framework named EDITOR to select diverse tokens from vision Transformers for multi-modal object ReID.
Our framework can generate more discriminative features for multi-modal object ReID.
arXiv Detail & Related papers (2024-03-15T12:44:35Z) - Mix-of-Show: Decentralized Low-Rank Adaptation for Multi-Concept
Customization of Diffusion Models [72.67967883658957]
Public large-scale text-to-image diffusion models can be easily customized for new concepts using low-rank adaptations (LoRAs).
Jointly supporting multiple customized concepts with multiple concept LoRAs, however, remains a challenge.
We propose a new framework called Mix-of-Show that addresses the challenges of decentralized multi-concept customization.
arXiv Detail & Related papers (2023-05-29T17:58:16Z) - Break-A-Scene: Extracting Multiple Concepts from a Single Image [80.47666266017207]
We introduce the task of textual scene decomposition.
We propose augmenting the input image with masks that indicate the presence of target concepts.
We then present a novel two-phase customization process.
arXiv Detail & Related papers (2023-05-25T17:59:04Z) - Attentive WaveBlock: Complementarity-enhanced Mutual Networks for
Unsupervised Domain Adaptation in Person Re-identification and Beyond [97.25179345878443]
This paper proposes a novel light-weight module, the Attentive WaveBlock (AWB).
AWB can be integrated into the dual networks of mutual learning to enhance the complementarity and further depress noise in the pseudo-labels.
Experiments demonstrate that the proposed method achieves state-of-the-art performance with significant improvements on multiple UDA person re-identification tasks.
arXiv Detail & Related papers (2020-06-11T15:40:40Z)