Related papers: ORACLE: Leveraging Mutual Information for Consistent Character Generation with LoRAs in Diffusion Models

ORACLE: Leveraging Mutual Information for Consistent Character Generation with LoRAs in Diffusion Models

URL: http://arxiv.org/abs/2406.02820v1
Date: Tue, 4 Jun 2024 23:39:08 GMT
Title: ORACLE: Leveraging Mutual Information for Consistent Character Generation with LoRAs in Diffusion Models
Authors: Kiymet Akdemir, Pinar Yanardag,
Abstract summary: We introduce a novel framework designed to produce consistent character representations from a single text prompt. Our framework outperforms existing methods in generating characters with consistent visual identities.
Score: 3.7599363231894185
License: http://creativecommons.org/licenses/by/4.0/
Abstract: Text-to-image diffusion models have recently taken center stage as pivotal tools in promoting visual creativity across an array of domains such as comic book artistry, children's literature, game development, and web design. These models harness the power of artificial intelligence to convert textual descriptions into vivid images, thereby enabling artists and creators to bring their imaginative concepts to life with unprecedented ease. However, one of the significant hurdles that persist is the challenge of maintaining consistency in character generation across diverse contexts. Variations in textual prompts, even if minor, can yield vastly different visual outputs, posing a considerable problem in projects that require a uniform representation of characters throughout. In this paper, we introduce a novel framework designed to produce consistent character representations from a single text prompt across diverse settings. Through both quantitative and qualitative analyses, we demonstrate that our framework outperforms existing methods in generating characters with consistent visual identities, underscoring its potential to transform creative industries. By addressing the critical challenge of character consistency, we not only enhance the practical utility of these models but also broaden the horizons for artistic and creative expression.

Related papers

VLM-Guided Adaptive Negative Prompting for Creative Generation [21.534474554320823]
Creative generation is the synthesis of new, surprising, and valuable samples that reflect user intent yet cannot be envisioned in advance.<n>We propose VLM-Guided Adaptive Negative-Prompting, a training-free, inference-time method that promotes creative image generation.<n>We show consistent gains in creative novelty with negligible computational overhead.
arXiv Detail & Related papers (2025-10-12T17:34:59Z)
Few-shot multi-token DreamBooth with LoRa for style-consistent character generation [3.4405653742416145]
The audiovisual industry is undergoing a profound transformation as it is integrating AI developments to inspire new forms of art.<n>This paper addresses the problem of producing a virtually unlimited number of novel characters that preserve the artistic style and shared visual traits of a small set of human-designed reference characters.<n>We propose a multi-token strategy, using clustering to assign separate tokens to individual characters and their collective style, combined with LoRA-based parameter-efficient fine-tuning.
arXiv Detail & Related papers (2025-10-10T15:28:55Z)
Plot'n Polish: Zero-shot Story Visualization and Disentangled Editing with Text-to-Image Diffusion Models [17.541927454984318]
Plot'n Polish is a zero-shot framework that enables consistent story generation and provides fine-grained control over story visualizations at various levels of detail.
arXiv Detail & Related papers (2025-09-04T17:59:34Z)
WordCraft: Interactive Artistic Typography with Attention Awareness and Noise Blending [12.655120187133779]
Artistic typography aims to stylize input characters with visual effects that are both creative and legible.<n>Traditional approaches rely heavily on manual design, while recent generative models, particularly diffusion-based methods, have enabled automated character stylization.<n>We introduce WordCraft, an interactive artistic typography system that integrates diffusion models to address these limitations.
arXiv Detail & Related papers (2025-07-13T10:49:09Z)
TextInVision: Text and Prompt Complexity Driven Visual Text Generation Benchmark [61.412934963260724]
Existing diffusion-based text-to-image models often struggle to accurately embed text within images. We introduce TextInVision, a large-scale, text and prompt complexity driven benchmark to evaluate the ability of diffusion models to integrate visual text into images.
arXiv Detail & Related papers (2025-03-17T21:36:31Z)
DesignDiffusion: High-Quality Text-to-Design Image Generation with Diffusion Models [115.62816053600085]
We present DesignDiffusion, a framework for synthesizing design images from textual descriptions.<n>The proposed framework directly synthesizes textual and visual design elements from user prompts.<n>It utilizes a distinctive character embedding derived from the visual text to enhance the input prompt.
arXiv Detail & Related papers (2025-03-03T15:22:57Z)
A Critical Assessment of Modern Generative Models' Ability to Replicate Artistic Styles [0.0]
This paper presents a critical assessment of the style replication capabilities of contemporary generative models. We examine how effectively these models reproduce traditional artistic styles while maintaining structural integrity and compositional balance. The analysis is based on a new large dataset of AI-generated works imitating artistic styles of the past.
arXiv Detail & Related papers (2025-02-21T07:00:06Z)
Bringing Characters to New Stories: Training-Free Theme-Specific Image Generation via Dynamic Visual Prompting [71.29100512700064]
We present T-Prompter, a training-free method for theme-specific image generation. T-Prompter integrates reference images into generative models, allowing users to seamlessly specify the target theme. Our approach enables consistent story generation, character design, realistic character generation, and style-guided image generation.
arXiv Detail & Related papers (2025-01-26T19:01:19Z)
ArtCrafter: Text-Image Aligning Style Transfer via Embedding Reframing [25.610375901522886]
ArtCrafter is a novel framework for text-to-image style transfer. We introduce an attention-based style extraction module. We also present a novel text-image aligning augmentation component.
arXiv Detail & Related papers (2025-01-03T19:17:27Z)
KITTEN: A Knowledge-Intensive Evaluation of Image Generation on Visual Entities [93.74881034001312]
We conduct a systematic study on the fidelity of entities in text-to-image generation models. We focus on their ability to generate a wide range of real-world visual entities, such as landmark buildings, aircraft, plants, and animals. Our findings reveal that even the most advanced text-to-image models often fail to generate entities with accurate visual details.
arXiv Detail & Related papers (2024-10-15T17:50:37Z)
Illustrating Classic Brazilian Books using a Text-To-Image Diffusion Model [0.4374837991804086]
Latent Diffusion Models (LDMs) signifies a paradigm shift in the domain of AI capabilities. This article delves into the feasibility of employing the Stable Diffusion LDM to illustrate literary works.
arXiv Detail & Related papers (2024-08-01T13:28:15Z)
Training-Free Consistent Text-to-Image Generation [80.4814768762066]
Text-to-image models can portray the same subject across diverse prompts. Existing approaches fine-tune the model to teach it new words that describe specific user-provided subjects. We present ConsiStory, a training-free approach that enables consistent subject generation by sharing the internal activations of the pretrained model.
arXiv Detail & Related papers (2024-02-05T18:42:34Z)
CreativeSynth: Creative Blending and Synthesis of Visual Arts based on Multimodal Diffusion [74.44273919041912]
Large-scale text-to-image generative models have made impressive strides, showcasing their ability to synthesize a vast array of high-quality images. However, adapting these models for artistic image editing presents two significant challenges. We build the innovative unified framework Creative Synth, which is based on a diffusion model with the ability to coordinate multimodal inputs.
arXiv Detail & Related papers (2024-01-25T10:42:09Z)
The Chosen One: Consistent Characters in Text-to-Image Diffusion Models [71.15152184631951]
We propose a fully automated solution for consistent character generation with the sole input being a text prompt. Our method strikes a better balance between prompt alignment and identity consistency compared to the baseline methods.
arXiv Detail & Related papers (2023-11-16T18:59:51Z)
Attend-and-Excite: Attention-Based Semantic Guidance for Text-to-Image Diffusion Models [103.61066310897928]
Recent text-to-image generative models have demonstrated an unparalleled ability to generate diverse and creative imagery guided by a target text prompt. While revolutionary, current state-of-the-art diffusion models may still fail in generating images that fully convey the semantics in the given text prompt. We analyze the publicly available Stable Diffusion model and assess the existence of catastrophic neglect, where the model fails to generate one or more of the subjects from the input prompt. We introduce the concept of Generative Semantic Nursing (GSN), where we seek to intervene in the generative process on the fly during inference time to improve the faithfulness
arXiv Detail & Related papers (2023-01-31T18:10:38Z)
Language Does More Than Describe: On The Lack Of Figurative Speech in Text-To-Image Models [63.545146807810305]
Text-to-image diffusion models can generate high-quality pictures from textual input prompts. These models have been trained using text data collected from content-based labelling protocols. We characterise the sentimentality, objectiveness and degree of abstraction of publicly available text data used to train current text-to-image diffusion models.
arXiv Detail & Related papers (2022-10-19T14:20:05Z)

This list is automatically generated from the titles and abstracts of the papers in this site.