Related papers: Few-shot multi-token DreamBooth with LoRa for style-consistent character generation

Few-shot multi-token DreamBooth with LoRa for style-consistent character generation

URL: http://arxiv.org/abs/2510.09475v1
Date: Fri, 10 Oct 2025 15:28:55 GMT
Title: Few-shot multi-token DreamBooth with LoRa for style-consistent character generation
Authors: Ruben Pascual, Mikel Sesma-Sara, Aranzazu Jurio, Daniel Paternain, Mikel Galar,
Abstract summary: The audiovisual industry is undergoing a profound transformation as it is integrating AI developments to inspire new forms of art.<n>This paper addresses the problem of producing a virtually unlimited number of novel characters that preserve the artistic style and shared visual traits of a small set of human-designed reference characters.<n>We propose a multi-token strategy, using clustering to assign separate tokens to individual characters and their collective style, combined with LoRA-based parameter-efficient fine-tuning.
Score: 3.4405653742416145
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: The audiovisual industry is undergoing a profound transformation as it is integrating AI developments not only to automate routine tasks but also to inspire new forms of art. This paper addresses the problem of producing a virtually unlimited number of novel characters that preserve the artistic style and shared visual traits of a small set of human-designed reference characters, thus broadening creative possibilities in animation, gaming, and related domains. Our solution builds upon DreamBooth, a well-established fine-tuning technique for text-to-image diffusion models, and adapts it to tackle two core challenges: capturing intricate character details beyond textual prompts and the few-shot nature of the training data. To achieve this, we propose a multi-token strategy, using clustering to assign separate tokens to individual characters and their collective style, combined with LoRA-based parameter-efficient fine-tuning. By removing the class-specific regularization set and introducing random tokens and embeddings during generation, our approach allows for unlimited character creation while preserving the learned style. We evaluate our method on five small specialized datasets, comparing it to relevant baselines using both quantitative metrics and a human evaluation study. Our results demonstrate that our approach produces high-quality, diverse characters while preserving the distinctive aesthetic features of the reference characters, with human evaluation further reinforcing its effectiveness and highlighting the potential of our method.

Related papers

Nested Attention: Semantic-aware Attention Values for Concept Personalization [78.90196530697897]
We introduce Nested Attention, a novel mechanism that injects a rich and expressive image representation into the model's existing cross-attention layers.<n>Our key idea is to generate query-dependent subject values, derived from nested attention layers that learn to select relevant subject features for each region in the generated image.
arXiv Detail & Related papers (2025-01-02T18:52:11Z)
ORACLE: Leveraging Mutual Information for Consistent Character Generation with LoRAs in Diffusion Models [3.7599363231894185]
We introduce a novel framework designed to produce consistent character representations from a single text prompt. Our framework outperforms existing methods in generating characters with consistent visual identities.
arXiv Detail & Related papers (2024-06-04T23:39:08Z)
The Chosen One: Consistent Characters in Text-to-Image Diffusion Models [71.15152184631951]
We propose a fully automated solution for consistent character generation with the sole input being a text prompt. Our method strikes a better balance between prompt alignment and identity consistency compared to the baseline methods.
arXiv Detail & Related papers (2023-11-16T18:59:51Z)
The Creative Frontier of Generative AI: Managing the Novelty-Usefulness Tradeoff [0.4873362301533825]
We explore the optimal balance between novelty and usefulness in generative Artificial Intelligence (AI) systems. Overemphasizing either aspect can lead to limitations such as hallucinations and memorization.
arXiv Detail & Related papers (2023-06-06T11:44:57Z)
Unsupervised Neural Stylistic Text Generation using Transfer learning and Adapters [66.17039929803933]
We propose a novel transfer learning framework which updates only $0.3%$ of model parameters to learn style specific attributes for response generation. We learn style specific attributes from the PERSONALITY-CAPTIONS dataset.
arXiv Detail & Related papers (2022-10-07T00:09:22Z)
Multi-Modal Experience Inspired AI Creation [33.34566822058209]
We study how to generate texts based on sequential multi-modal information. We firstly design a multi-channel sequence-to-sequence architecture equipped with a multi-modal attention network. We then propose a curriculum negative sampling strategy tailored for the sequential inputs.
arXiv Detail & Related papers (2022-09-02T11:50:41Z)
Generating More Pertinent Captions by Leveraging Semantics and Style on Multi-Source Datasets [56.018551958004814]
This paper addresses the task of generating fluent descriptions by training on a non-uniform combination of data sources. Large-scale datasets with noisy image-text pairs provide a sub-optimal source of supervision. We propose to leverage and separate semantics and descriptive style through the incorporation of a style token and keywords extracted through a retrieval component.
arXiv Detail & Related papers (2021-11-24T19:00:05Z)
Scalable Font Reconstruction with Dual Latent Manifolds [55.29525824849242]
We propose a deep generative model that performs typography analysis and font reconstruction. Our approach enables us to massively scale up the number of character types we can effectively model. We evaluate on the task of font reconstruction over various datasets representing character types of many languages.
arXiv Detail & Related papers (2021-09-10T20:37:43Z)
Separating Content from Style Using Adversarial Learning for Recognizing Text in the Wild [103.51604161298512]
We propose an adversarial learning framework for the generation and recognition of multiple characters in an image. Our framework can be integrated into recent recognition methods to achieve new state-of-the-art recognition accuracy.
arXiv Detail & Related papers (2020-01-13T12:41:42Z)

This list is automatically generated from the titles and abstracts of the papers in this site.

This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.