DreamCreature: Crafting Photorealistic Virtual Creatures from
Imagination
- URL: http://arxiv.org/abs/2311.15477v1
- Date: Mon, 27 Nov 2023 01:24:31 GMT
- Title: DreamCreature: Crafting Photorealistic Virtual Creatures from
Imagination
- Authors: Kam Woh Ng, Xiatian Zhu, Yi-Zhe Song, Tao Xiang
- Abstract summary: We introduce a novel task, Virtual Creatures Generation: Given a set of unlabeled images of the target concepts, we aim to train a T2I model capable of creating new, hybrid concepts.
We propose a new method called DreamCreature, which identifies and extracts the underlying sub-concepts.
The T2I model thus adapts to generate novel concepts with faithful structures and photorealistic appearance.
- Score: 140.1641573781066
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Recent text-to-image (T2I) generative models allow for high-quality synthesis
following either text instructions or visual examples. Despite their
capabilities, these models face limitations in creating new, detailed creatures
within specific categories (e.g., virtual dog or bird species), which are
valuable in digital asset creation and biodiversity analysis. To bridge this
gap, we introduce a novel task, Virtual Creatures Generation: Given a set of
unlabeled images of the target concepts (e.g., 200 bird species), we aim to
train a T2I model capable of creating new, hybrid concepts within diverse
backgrounds and contexts. We propose a new method called DreamCreature, which
identifies and extracts the underlying sub-concepts (e.g., body parts of a
specific species) in an unsupervised manner. The T2I model thus adapts to generate
novel concepts (e.g., new bird species) with faithful structures and
photorealistic appearance by seamlessly and flexibly composing learned
sub-concepts. To enhance sub-concept fidelity and disentanglement, we extend
the textual inversion technique by incorporating an additional projector and
tailored attention loss regularization. Extensive experiments on two
fine-grained image benchmarks demonstrate the superiority of DreamCreature over
prior methods in both qualitative and quantitative evaluation. Ultimately, the
learned sub-concepts facilitate diverse creative applications, including
innovative consumer product designs and nuanced property modifications.
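The abstract describes extending textual inversion with an additional projector and an attention-loss regularization. Below is a minimal, self-contained PyTorch sketch of how such a setup could look; it is not the authors' implementation, and every name in it (SubConceptProjector, attention_loss, part_masks) is an illustrative assumption rather than the paper's API. In this sketch, the projector maps a learnable code per body part to a pseudo-word embedding, and the attention loss penalizes cross-attention mass that leaks outside that part's region, which is one plausible way to encourage disentangled sub-concepts.

```python
# Minimal sketch (not the authors' code): per-part sub-concept embeddings from a
# shared projector, plus an attention loss that discourages each sub-concept
# token from attending outside its part region.
import torch
import torch.nn as nn


class SubConceptProjector(nn.Module):
    """Maps a learnable code per sub-concept (e.g. a body part) to a token embedding."""

    def __init__(self, num_parts: int, code_dim: int = 64, token_dim: int = 768):
        super().__init__()
        self.codes = nn.Parameter(torch.randn(num_parts, code_dim) * 0.02)
        self.proj = nn.Sequential(
            nn.Linear(code_dim, token_dim),
            nn.GELU(),
            nn.Linear(token_dim, token_dim),
        )

    def forward(self) -> torch.Tensor:
        # (num_parts, token_dim): pseudo-word embeddings injected into the prompt.
        return self.proj(self.codes)


def attention_loss(attn_maps: torch.Tensor, part_masks: torch.Tensor) -> torch.Tensor:
    """Penalize cross-attention mass falling outside each sub-concept's part mask.

    attn_maps:  (num_parts, H, W) cross-attention maps, one per sub-concept token.
    part_masks: (num_parts, H, W) binary masks from unsupervised part discovery.
    """
    attn = attn_maps / (attn_maps.sum(dim=(1, 2), keepdim=True) + 1e-8)
    leaked = attn * (1.0 - part_masks)
    return leaked.sum(dim=(1, 2)).mean()


if __name__ == "__main__":
    projector = SubConceptProjector(num_parts=4)
    tokens = projector()                       # one embedding per learned sub-concept
    attn = torch.rand(4, 16, 16)               # stand-in for diffusion cross-attention
    masks = (torch.rand(4, 16, 16) > 0.5).float()
    reg = attention_loss(attn, masks)          # would be added to the denoising loss
    print(tokens.shape, reg.item())
```

In practice the regularizer would be added to the usual diffusion denoising loss while the projector (and only the projector) is optimized, mirroring how textual inversion keeps the generator frozen.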
Related papers
- MagicTailor: Component-Controllable Personalization in Text-to-Image Diffusion Models [51.1034358143232]
We introduce component-controllable personalization, a novel task that pushes the boundaries of text-to-image (T2I) models.
To address the challenges of this task, we design MagicTailor, an innovative framework that leverages Dynamic Masked Degradation (DM-Deg) to dynamically perturb undesired visual semantics.
arXiv Detail & Related papers (2024-10-17T09:22:53Z)
- KITTEN: A Knowledge-Intensive Evaluation of Image Generation on Visual Entities [93.74881034001312]
We conduct a systematic study on the fidelity of entities in text-to-image generation models.
We focus on their ability to generate a wide range of real-world visual entities, such as landmark buildings, aircraft, plants, and animals.
Our findings reveal that even the most advanced text-to-image models often fail to generate entities with accurate visual details.
arXiv Detail & Related papers (2024-10-15T17:50:37Z)
- Attention Calibration for Disentangled Text-to-Image Personalization [12.339742346826403]
We propose an attention calibration mechanism to improve the concept-level understanding of the T2I model.
We demonstrate that our method outperforms the current state of the art in both qualitative and quantitative evaluations.
arXiv Detail & Related papers (2024-03-27T13:31:39Z)
- Create Your World: Lifelong Text-to-Image Diffusion [75.14353789007902]
We propose the Lifelong text-to-image Diffusion Model (L2DM) to overcome catastrophic forgetting of previously encountered concepts.
To address this forgetting, our L2DM framework devises a task-aware memory enhancement module and an elastic-concept distillation module.
Our model generates more faithful images across a range of continual text prompts, in terms of both qualitative and quantitative metrics.
arXiv Detail & Related papers (2023-09-08T16:45:56Z)
- ConceptLab: Creative Concept Generation using VLM-Guided Diffusion Prior Constraints [56.824187892204314]
We present the task of creative text-to-image generation, where we seek to generate new members of a broad category.
We show that the creative generation problem can be formulated as an optimization process over the output space of the diffusion prior.
We incorporate a question-answering Vision-Language Model (VLM) that adaptively adds new constraints to the optimization problem, encouraging the model to discover increasingly more unique creations.
arXiv Detail & Related papers (2023-08-03T17:04:41Z)
- ConceptBed: Evaluating Concept Learning Abilities of Text-to-Image Diffusion Models [79.10890337599166]
We introduce ConceptBed, a large-scale dataset that consists of 284 unique visual concepts and 33K composite text prompts.
We evaluate visual concepts that are either objects, attributes, or styles, and also evaluate four dimensions of compositionality: counting, attributes, relations, and actions.
Our results point to a trade-off between learning the concepts and preserving the compositionality which existing approaches struggle to overcome.
arXiv Detail & Related papers (2023-06-07T18:00:38Z)
- DreamBooth: Fine Tuning Text-to-Image Diffusion Models for Subject-Driven Generation [26.748667878221568]
We present a new approach for "personalization" of text-to-image models.
We fine-tune a pretrained text-to-image model to bind a unique identifier with a specific subject.
The unique identifier can then be used to synthesize fully novel, photorealistic images of the subject contextualized in different scenes (a rough sketch of this identifier-binding setup follows the list).
arXiv Detail & Related papers (2022-08-25T17:45:49Z)
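As a companion to the DreamBooth entry above, the sketch below illustrates the identifier-binding idea in self-contained PyTorch: fine-tune a denoiser on a few subject photos while the conditioning prompt contains a rare identifier token. The TinyTextEncoder/TinyDenoiser modules and the simplified noising step are stand-in assumptions, not the DreamBooth implementation (which fine-tunes a full pretrained diffusion model and also uses a class-prior preservation loss).

```python
# Minimal sketch (not the DreamBooth release): bind a rare identifier token to a
# subject by fine-tuning a denoiser on a few subject photos, conditioned on a
# prompt such as "a photo of sks <class>". All modules here are tiny stand-ins.
import torch
import torch.nn as nn
import torch.nn.functional as F


class TinyTextEncoder(nn.Module):
    """Stand-in for a pretrained text encoder (pools token embeddings)."""

    def __init__(self, vocab_size: int = 1000, dim: int = 64):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, dim)

    def forward(self, token_ids: torch.Tensor) -> torch.Tensor:
        return self.embed(token_ids).mean(dim=1)         # (B, dim) prompt embedding


class TinyDenoiser(nn.Module):
    """Stand-in for a pretrained diffusion U-Net conditioned on the prompt."""

    def __init__(self, img_dim: int = 3 * 32 * 32, cond_dim: int = 64):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(img_dim + cond_dim, 256), nn.SiLU(), nn.Linear(256, img_dim)
        )

    def forward(self, noisy: torch.Tensor, cond: torch.Tensor) -> torch.Tensor:
        return self.net(torch.cat([noisy.flatten(1), cond], dim=1)).view_as(noisy)


def identifier_binding_step(denoiser, text_enc, subject_imgs, prompt_ids, opt):
    """One fine-tuning step: predict the noise added to subject images while the
    prompt contains the unique identifier token."""
    noise = torch.randn_like(subject_imgs)
    noisy = subject_imgs + noise                         # simplified forward process
    cond = text_enc(prompt_ids)
    loss = F.mse_loss(denoiser(noisy, cond), noise)      # standard denoising objective
    opt.zero_grad()
    loss.backward()
    opt.step()
    return loss.item()


if __name__ == "__main__":
    text_enc, denoiser = TinyTextEncoder(), TinyDenoiser()
    opt = torch.optim.AdamW(denoiser.parameters(), lr=1e-4)
    imgs = torch.randn(4, 3, 32, 32)                     # a handful of subject photos
    prompts = torch.randint(0, 1000, (4, 8))             # token ids incl. the identifier
    print(identifier_binding_step(denoiser, text_enc, imgs, prompts, opt))
```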
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences of its use.