PartCraft: Crafting Creative Objects by Parts
- URL: http://arxiv.org/abs/2407.04604v2
- Date: Mon, 8 Jul 2024 13:38:49 GMT
- Title: PartCraft: Crafting Creative Objects by Parts
- Authors: Kam Woh Ng, Xiatian Zhu, Yi-Zhe Song, Tao Xiang
- Abstract summary: This paper propels creative control in generative visual AI by allowing users to "select".
We for the first time allow users to choose visual concepts by parts for their creative endeavors.
The outcome is fine-grained generation that precisely captures the selected visual concepts.
- Score: 128.30514851911218
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: This paper propels creative control in generative visual AI by allowing users to "select". Departing from traditional text- or sketch-based methods, we for the first time allow users to choose visual concepts by parts for their creative endeavors. The outcome is fine-grained generation that precisely captures selected visual concepts, ensuring a holistically faithful and plausible result. To achieve this, we first parse objects into parts through unsupervised feature clustering. Then, we encode parts into text tokens and introduce an entropy-based normalized attention loss that operates on them. This loss design enables our model to learn generic prior topology knowledge about an object's part composition, and further generalize to novel part compositions to ensure the generation looks holistically faithful. Lastly, we employ a bottleneck encoder to project the part tokens. This not only enhances fidelity but also accelerates learning by leveraging shared knowledge and facilitating information exchange among instances. Visual results in the paper and supplementary material showcase the compelling power of PartCraft in crafting highly customized, innovative creations, exemplified by the "charming" and creative birds. Code is released at https://github.com/kamwoh/partcraft.
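The abstract names three components: unsupervised part parsing, part tokens regularized by an entropy-based normalized attention loss, and a bottleneck encoder that projects the part tokens. The PyTorch sketch below illustrates one plausible reading of the last two; the names (PartTokenBottleneck, entropy_normalized_attention_loss), all dimensions, and the exact loss form are assumptions for illustration, not the paper's released implementation (see the linked repository for that).

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class PartTokenBottleneck(nn.Module):
    """Hypothetical bottleneck encoder: project learnable part embeddings
    through a narrow shared layer before using them as text tokens.
    Dimensions are illustrative; the paper's architecture may differ."""
    def __init__(self, num_parts=4, token_dim=768, bottleneck_dim=64):
        super().__init__()
        self.part_embeddings = nn.Parameter(torch.randn(num_parts, token_dim) * 0.02)
        self.down = nn.Linear(token_dim, bottleneck_dim)   # shared bottleneck
        self.up = nn.Linear(bottleneck_dim, token_dim)

    def forward(self):
        return self.up(F.relu(self.down(self.part_embeddings)))  # (num_parts, token_dim)

def entropy_normalized_attention_loss(attn, eps=1e-8):
    """One plausible reading of the entropy-based normalized attention loss:
    normalize each part token's cross-attention map over spatial locations and
    penalize its entropy, so each token binds to a compact image region.
    attn: (num_parts, H*W) non-negative cross-attention weights."""
    p = attn / (attn.sum(dim=-1, keepdim=True) + eps)      # per-token distribution
    entropy = -(p * torch.log(p + eps)).sum(dim=-1)        # per-token entropy
    return entropy.mean()

# Usage sketch: the projected tokens would feed the diffusion model's text
# encoder, and the loss would be applied to their cross-attention maps.
encoder = PartTokenBottleneck()
part_tokens = encoder()                                    # (4, 768)
fake_attn = torch.rand(4, 32 * 32)                         # stand-in attention maps
loss = entropy_normalized_attention_loss(fake_attn)
```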
Related papers
- Crafting Parts for Expressive Object Composition [37.791770942390485]
PartCraft enables image generation based on fine-grained part-level details specified for objects in the base text prompt.
PartCraft first localizes object parts by denoising the object region from a specific diffusion process.
After obtaining part masks, we run a localized diffusion process in each of the part regions based on fine-grained part descriptions.
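The steps above (obtain part masks, then run a localized diffusion process per part region) can be pictured with a generic region-blended denoising step, sketched below. The function name, tensor shapes, and blending rule are assumptions for illustration; this is a common region-blending scheme, not necessarily that paper's exact procedure.

```python
import torch

def masked_region_denoise_step(part_masks, part_noise_preds, global_noise_pred):
    """Hypothetical localized diffusion step: each part region follows a noise
    prediction conditioned on its own fine-grained part description, while
    uncovered pixels fall back to the global (base-prompt) prediction.

    part_masks:        (P, 1, H, W) binary masks, one per part (assumed disjoint)
    part_noise_preds:  (P, B, C, H, W) noise predicted with each part's prompt
    global_noise_pred: (B, C, H, W) noise predicted with the base prompt
    """
    covered = part_masks.sum(dim=0).clamp(max=1.0)                  # union of part regions
    blended = (part_masks.unsqueeze(1) * part_noise_preds).sum(0)   # (B, C, H, W)
    # The caller would pass the result to its scheduler update (e.g. DDIM) as usual.
    return blended + (1.0 - covered) * global_noise_pred

# Tiny demo with random tensors: two parts covering the top and bottom halves.
masks = torch.zeros(2, 1, 8, 8)
masks[0, :, :4], masks[1, :, 4:] = 1.0, 1.0
noise = masked_region_denoise_step(masks, torch.randn(2, 1, 4, 8, 8), torch.randn(1, 4, 8, 8))
```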
arXiv Detail & Related papers (2024-06-14T17:31:29Z) - Generated Contents Enrichment [11.196681396888536]
We propose a novel artificial intelligence task termed Generated Contents Enrichment (GCE)
Our proposed GCE strives to perform content enrichment explicitly in both the visual and textual domains.
To tackle GCE, we propose a deep end-to-end adversarial method that explicitly explores semantics and inter-semantic relationships.
arXiv Detail & Related papers (2024-05-06T17:14:09Z) - GiGaMAE: Generalizable Graph Masked Autoencoder via Collaborative Latent Space Reconstruction [76.35904458027694]
Masked autoencoder models lack good generalization ability on graph data.
We propose a novel graph masked autoencoder framework called GiGaMAE.
Our results will shed light on the design of foundation models on graph-structured data.
arXiv Detail & Related papers (2023-08-18T16:30:51Z) - Taming Encoder for Zero Fine-tuning Image Customization with Text-to-Image Diffusion Models [55.04969603431266]
This paper proposes a method for generating images of customized objects specified by users.
The method is based on a general framework that bypasses the lengthy optimization required by previous approaches.
We demonstrate through experiments that our proposed method is able to synthesize images with compelling output quality, appearance diversity, and object fidelity.
arXiv Detail & Related papers (2023-04-05T17:59:32Z) - Object Discovery from Motion-Guided Tokens [50.988525184497334]
We augment the auto-encoder representation learning framework with motion-guidance and mid-level feature tokenization.
Our approach enables the emergence of interpretable object-specific mid-level features.
arXiv Detail & Related papers (2023-03-27T19:14:00Z) - Distilled Reverse Attention Network for Open-world Compositional Zero-Shot Learning [42.138756191997295]
Open-World Compositional Zero-Shot Learning (OW-CZSL) aims to recognize new compositions of seen attributes and objects.
OW-CZSL methods built on the conventional closed-world setting degrade severely due to the unconstrained OW test space.
We propose a novel Distilled Reverse Attention Network to address the challenges.
arXiv Detail & Related papers (2023-03-01T10:52:20Z) - Self-Supervised Visual Representation Learning with Semantic Grouping [50.14703605659837]
We tackle the problem of learning visual representations from unlabeled scene-centric data.
We propose contrastive learning from data-driven semantic slots, namely SlotCon, for joint semantic grouping and representation learning.
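As an illustration of contrastive learning over semantic slots, the sketch below implements a generic slot-level InfoNCE objective. The function name and shapes are assumptions, and this is not SlotCon's exact joint grouping-and-contrastive objective.

```python
import torch
import torch.nn.functional as F

def slot_infonce_loss(slots_a, slots_b, temperature=0.1):
    """Generic slot-level InfoNCE: corresponding slots from two augmented views
    of the same image are positives; all other slots in the batch are negatives.
    slots_a, slots_b: (B, K, D) slot features from the two views, matched by index."""
    B, K, D = slots_a.shape
    a = F.normalize(slots_a.reshape(B * K, D), dim=-1)
    b = F.normalize(slots_b.reshape(B * K, D), dim=-1)
    logits = a @ b.t() / temperature                 # (B*K, B*K) cosine similarities
    targets = torch.arange(B * K, device=a.device)   # positives on the diagonal
    return F.cross_entropy(logits, targets)
```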
arXiv Detail & Related papers (2022-05-30T17:50:59Z) - One-shot Scene Graph Generation [130.57405850346836]
We propose Multiple Structured Knowledge (Relational Knowledge and Commonsense Knowledge) for the one-shot scene graph generation task.
Our method outperforms existing state-of-the-art methods by a large margin.
arXiv Detail & Related papers (2022-02-22T11:32:59Z) - Object-Centric Learning with Slot Attention [43.684193749891506]
We present the Slot Attention module, an architectural component that interfaces with perceptual representations.
Slot Attention produces task-dependent abstract representations which we call slots.
We empirically demonstrate that Slot Attention can extract object-centric representations that enable generalization to unseen compositions.
arXiv Detail & Related papers (2020-06-26T15:31:57Z)
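For reference, below is a compact PyTorch sketch of the Slot Attention update described in the last entry (iterative attention with a softmax over slots, followed by a GRU update and a residual MLP). Layer sizes, the number of slots, and the iteration count are illustrative defaults, not the paper's exact configuration.

```python
import torch
import torch.nn as nn

class SlotAttention(nn.Module):
    """Minimal Slot Attention sketch (after Locatello et al., 2020)."""
    def __init__(self, num_slots=4, dim=64, iters=3, eps=1e-8):
        super().__init__()
        self.num_slots, self.iters, self.eps, self.scale = num_slots, iters, eps, dim ** -0.5
        self.slots_mu = nn.Parameter(torch.randn(1, 1, dim))
        self.slots_logsigma = nn.Parameter(torch.zeros(1, 1, dim))
        self.to_q, self.to_k, self.to_v = nn.Linear(dim, dim), nn.Linear(dim, dim), nn.Linear(dim, dim)
        self.gru = nn.GRUCell(dim, dim)
        self.mlp = nn.Sequential(nn.Linear(dim, dim), nn.ReLU(), nn.Linear(dim, dim))
        self.norm_inputs, self.norm_slots, self.norm_mlp = nn.LayerNorm(dim), nn.LayerNorm(dim), nn.LayerNorm(dim)

    def forward(self, inputs):                        # inputs: (B, N, dim) perceptual features
        B, N, D = inputs.shape
        inputs = self.norm_inputs(inputs)
        k, v = self.to_k(inputs), self.to_v(inputs)
        slots = self.slots_mu + self.slots_logsigma.exp() * torch.randn(B, self.num_slots, D, device=inputs.device)
        for _ in range(self.iters):
            slots_prev = slots
            q = self.to_q(self.norm_slots(slots))
            attn = torch.softmax(torch.einsum('bkd,bnd->bkn', q, k) * self.scale, dim=1)  # slots compete per location
            attn = attn / (attn.sum(dim=-1, keepdim=True) + self.eps)                     # weighted mean over inputs
            updates = torch.einsum('bkn,bnd->bkd', attn, v)
            slots = self.gru(updates.reshape(-1, D), slots_prev.reshape(-1, D)).reshape(B, self.num_slots, D)
            slots = slots + self.mlp(self.norm_mlp(slots))
        return slots                                   # (B, num_slots, dim) task-dependent slots

# Usage: e.g. a 14x14 CNN feature map flattened to 196 tokens of width 64.
slots = SlotAttention()(torch.randn(2, 196, 64))
```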
This list is automatically generated from the titles and abstracts of the papers in this site.