Attribute-Centric Compositional Text-to-Image Generation
- URL: http://arxiv.org/abs/2301.01413v1
- Date: Wed, 4 Jan 2023 03:03:08 GMT
- Title: Attribute-Centric Compositional Text-to-Image Generation
- Authors: Yuren Cong, Martin Renqiang Min, Li Erran Li, Bodo Rosenhahn, Michael Ying Yang
- Abstract summary: ACTIG is an attribute-centric compositional text-to-image generation framework.
We present an attribute-centric feature augmentation and a novel image-free training scheme.
We validate our framework on the CelebA-HQ and CUB datasets.
- Score: 45.12516226662346
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Despite the recent impressive breakthroughs in text-to-image generation,
generative models have difficulty in capturing the data distribution of
underrepresented attribute compositions while over-memorizing overrepresented
attribute compositions, which raises public concerns about their robustness and
fairness. To tackle this challenge, we propose ACTIG, an attribute-centric
compositional text-to-image generation framework. We present an
attribute-centric feature augmentation and a novel image-free training scheme,
which greatly improve the model's ability to generate images with underrepresented
attributes. We further propose an attribute-centric contrastive loss to avoid
overfitting to overrepresented attribute compositions. We validate our
framework on the CelebA-HQ and CUB datasets. Extensive experiments show that
the compositional generalization of ACTIG is outstanding, and our framework
outperforms previous works in terms of image quality and text-image
consistency.
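The abstract names an attribute-centric contrastive loss but gives no code. Below is a minimal sketch of what such a loss could look like, assuming an InfoNCE-style objective in which each image embedding is pulled toward its own attribute-composition embedding and pushed away from the other compositions in the batch; the function name and temperature are illustrative, not from the paper.

```python
import torch
import torch.nn.functional as F

def attr_contrastive_loss(img_feats: torch.Tensor,
                          attr_feats: torch.Tensor,
                          temperature: float = 0.07) -> torch.Tensor:
    """InfoNCE-style loss: align each image with its own attribute-composition
    embedding, repel the other compositions in the batch (an illustrative
    reading, not the paper's exact formulation)."""
    img_feats = F.normalize(img_feats, dim=-1)
    attr_feats = F.normalize(attr_feats, dim=-1)
    logits = img_feats @ attr_feats.t() / temperature   # (B, B) similarities
    targets = torch.arange(img_feats.size(0), device=img_feats.device)
    # symmetric: image->attributes and attributes->image
    return 0.5 * (F.cross_entropy(logits, targets) +
                  F.cross_entropy(logits.t(), targets))

# toy usage with random features
loss = attr_contrastive_loss(torch.randn(8, 256), torch.randn(8, 256))
```

The symmetric form penalizes overfitting in both directions, which matches the stated goal of not memorizing overrepresented compositions.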
Related papers
- Z-Magic: Zero-shot Multiple Attributes Guided Image Creator [24.88532732093652]
We reformulate multi-attribute creation from a conditional probability theory perspective and tackle the challenging zero-shot setting.
By explicitly modeling the dependencies between attributes, we further enhance the coherence of generated images.
We identify connections between multi-attribute customization and multi-task learning, effectively addressing the high computing cost encountered in multi-attribute synthesis.
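The summary frames multi-attribute creation in conditional-probability terms. One standard way to realize such a factorization in diffusion models is composable classifier-free guidance, where per-attribute guidance deltas are summed; this is a generic technique we use for illustration, not necessarily Z-Magic's dependency-aware formulation.

```python
import torch

def composed_guidance(eps_uncond: torch.Tensor,
                      eps_conds: list[torch.Tensor],
                      weights: list[float]) -> torch.Tensor:
    """Compose several attribute conditions by summing each condition's
    guidance delta against the unconditional noise prediction (generic
    composable classifier-free guidance, shown here as a sketch)."""
    out = eps_uncond.clone()
    for eps_c, w in zip(eps_conds, weights):
        out = out + w * (eps_c - eps_uncond)
    return out

# toy usage: two attribute conditions on one noise prediction
eps_u = torch.randn(1, 4, 64, 64)
eps = composed_guidance(eps_u,
                        [torch.randn_like(eps_u), torch.randn_like(eps_u)],
                        weights=[3.0, 3.0])
```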
arXiv Detail & Related papers (2025-03-15T13:07:58Z)
- TAGE: Trustworthy Attribute Group Editing for Stable Few-shot Image Generation [10.569380190029317]
TAGE is an innovative image generation network comprising three integral modules.
The CPM module delves into the semantic dimensions of category-agnostic attributes, encapsulating them within a discrete codebook.
The PSM module generates semantic cues that are seamlessly integrated into the Transformer architecture of the CPM.
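Encapsulating attributes in a discrete codebook, as the CPM module is said to do, is commonly implemented with vector quantization. The following is a generic VQ layer sketch, not TAGE's actual module; all names are ours.

```python
import torch
import torch.nn as nn

class AttributeCodebook(nn.Module):
    """Nearest-neighbor vector quantization of attribute features into a
    discrete codebook (a generic VQ layer, not TAGE's exact CPM design)."""
    def __init__(self, num_codes: int = 512, dim: int = 256):
        super().__init__()
        self.codes = nn.Embedding(num_codes, dim)

    def forward(self, z: torch.Tensor) -> tuple[torch.Tensor, torch.Tensor]:
        # z: (B, dim) continuous attribute features
        dists = torch.cdist(z, self.codes.weight)   # (B, num_codes)
        idx = dists.argmin(dim=-1)                  # nearest code per sample
        z_q = self.codes(idx)
        # straight-through estimator so gradients reach the encoder
        z_q = z + (z_q - z).detach()
        return z_q, idx

codebook = AttributeCodebook()
z_q, idx = codebook(torch.randn(8, 256))
```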
arXiv Detail & Related papers (2024-10-23T13:26:19Z)
- ARMADA: Attribute-Based Multimodal Data Augmentation [93.05614922383822]
Attribute-based Multimodal Data Augmentation (ARMADA) is a multimodal data augmentation method based on knowledge-guided manipulation of visual attributes.
It extracts knowledge-grounded attributes from symbolic knowledge bases (KBs) to generate semantically consistent yet distinctive image-text pairs.
This highlights the value of leveraging external knowledge proxies for enhanced interpretability and real-world grounding. A toy sketch of the caption-side manipulation appears below.
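The caption side of this augmentation can be sketched as swapping an attribute value for a different KB-grounded value; the KB fragment and function names here are made up for illustration, and the matching image edit is not shown.

```python
import random

# Hypothetical knowledge-base fragment: entity type -> known attribute values.
ATTRIBUTE_KB = {
    "bird": {"wing_color": ["red", "blue", "yellow"],
             "beak_shape": ["hooked", "needle", "cone"]},
}

def augment_caption(caption: str, entity: str, attribute: str,
                    current_value: str) -> tuple[str, str]:
    """Replace one attribute value in a caption with a different,
    KB-grounded value; returns the new caption and the new value.
    The paired image would then be edited to match (not shown)."""
    candidates = [v for v in ATTRIBUTE_KB[entity][attribute]
                  if v != current_value]
    new_value = random.choice(candidates)
    return caption.replace(current_value, new_value), new_value

new_cap, new_val = augment_caption(
    "a bird with red wings perched on a branch", "bird", "wing_color", "red")
print(new_cap)  # e.g. "a bird with blue wings perched on a branch"
```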
arXiv Detail & Related papers (2024-08-19T15:27:25Z)
- ZePo: Zero-Shot Portrait Stylization with Faster Sampling [61.14140480095604]
This paper presents an inversion-free portrait stylization framework based on diffusion models that accomplishes content and style feature fusion in merely four sampling steps.
We propose a feature merging strategy to amalgamate redundant features in Consistency Features, thereby reducing the computational load of attention control.
arXiv Detail & Related papers (2024-08-10T08:53:41Z)
- Understanding and Mitigating Compositional Issues in Text-to-Image Generative Models [46.723653095494896]
We show that imperfect text conditioning with the CLIP text encoder is one of the primary reasons behind the inability of text-to-image models to generate high-fidelity compositional scenes.
Our main finding shows that the best compositional improvements can be achieved without harming the model's FID scores.
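One quick way to observe the imperfect-conditioning issue is to compare CLIP text embeddings of a prompt and its attribute-swapped counterpart: if they are nearly identical, the generator receives little signal to bind attributes correctly. A sketch using the Hugging Face transformers CLIP (the model checkpoint and prompts are our choices, not the paper's):

```python
import torch
from transformers import CLIPTokenizer, CLIPTextModel

tokenizer = CLIPTokenizer.from_pretrained("openai/clip-vit-base-patch32")
text_encoder = CLIPTextModel.from_pretrained("openai/clip-vit-base-patch32")

prompts = ["a red cube on a blue sphere",   # original composition
           "a blue cube on a red sphere"]   # attributes swapped

with torch.no_grad():
    tokens = tokenizer(prompts, padding=True, return_tensors="pt")
    feats = text_encoder(**tokens).pooler_output       # (2, hidden)
    feats = feats / feats.norm(dim=-1, keepdim=True)

# A very high similarity here suggests the encoder barely distinguishes
# the two compositions, one symptom of the conditioning problem.
print(torch.cosine_similarity(feats[0], feats[1], dim=0).item())
```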
arXiv Detail & Related papers (2024-06-12T03:21:34Z)
- Attribute-Aware Deep Hashing with Self-Consistency for Large-Scale Fine-Grained Image Retrieval [65.43522019468976]
We propose attribute-aware hashing networks with self-consistency for generating attribute-aware hash codes.
We develop an encoder-decoder network with a reconstruction task to distill high-level attribute-specific vectors without supervision.
Our models are equipped with a feature decorrelation constraint upon these attribute vectors to strengthen their representative abilities.
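A feature decorrelation constraint is typically implemented as a penalty on the off-diagonal entries of the Gram (or covariance) matrix of the attribute vectors. A generic sketch under that assumption, not necessarily the paper's exact constraint:

```python
import torch

def decorrelation_loss(attr_vecs: torch.Tensor) -> torch.Tensor:
    """Penalize off-diagonal correlations between attribute-specific
    vectors so each captures a distinct factor.
    attr_vecs: (B, K, D) with K attribute vectors per sample."""
    v = attr_vecs / attr_vecs.norm(dim=-1, keepdim=True).clamp_min(1e-8)
    gram = v @ v.transpose(1, 2)                    # (B, K, K)
    off_diag = gram - torch.diag_embed(torch.diagonal(gram, dim1=1, dim2=2))
    return off_diag.pow(2).mean()

loss = decorrelation_loss(torch.randn(4, 6, 128))
```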
arXiv Detail & Related papers (2023-11-21T08:20:38Z)
- Hierarchical Visual Primitive Experts for Compositional Zero-Shot Learning [52.506434446439776]
Compositional zero-shot learning (CZSL) aims to recognize compositions with prior knowledge of known primitives (attributes and objects).
We propose a simple and scalable framework called Composition Transformer (CoT) to address these issues.
Our method achieves SoTA performance on several benchmarks, including MIT-States, C-GQA, and VAW-CZSL.
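CZSL systems commonly score an image against every attribute-object pair by composing the two primitive embeddings. The baseline-style sketch below illustrates that setup; it does not reproduce CoT's hierarchical visual experts.

```python
import torch
import torch.nn as nn

class PairScorer(nn.Module):
    """Score image features against composed attribute+object embeddings
    (a generic CZSL baseline, not the CoT architecture)."""
    def __init__(self, n_attrs: int, n_objs: int, dim: int = 256):
        super().__init__()
        self.attr_emb = nn.Embedding(n_attrs, dim)
        self.obj_emb = nn.Embedding(n_objs, dim)
        self.compose = nn.Linear(2 * dim, dim)

    def forward(self, img_feat: torch.Tensor,
                pairs: torch.Tensor) -> torch.Tensor:
        # pairs: (P, 2) holding (attr_id, obj_id) per candidate composition
        a = self.attr_emb(pairs[:, 0])
        o = self.obj_emb(pairs[:, 1])
        comp = self.compose(torch.cat([a, o], dim=-1))  # (P, dim)
        return img_feat @ comp.t()                      # (B, P) scores

# MIT-States has 115 attributes and 245 objects
scorer = PairScorer(n_attrs=115, n_objs=245)
scores = scorer(torch.randn(2, 256), torch.tensor([[0, 3], [1, 7]]))
```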
arXiv Detail & Related papers (2023-08-08T03:24:21Z)
- Adma-GAN: Attribute-Driven Memory Augmented GANs for Text-to-Image Generation [18.36261166580862]
Text-to-image generation aims to generate photo-realistic and semantically consistent images according to the given text descriptions.
Existing methods mainly extract text information from a single sentence to represent an image.
We propose an effective text representation method that complements sentence features with attribute information, sketched below.
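One way to complement a sentence embedding with attribute information is cross-attention over a learned attribute memory bank; this is our illustrative reading of "memory augmented", not the exact Adma-GAN module.

```python
import torch
import torch.nn as nn

class AttributeMemory(nn.Module):
    """Enrich a sentence embedding by attending over a bank of learned
    attribute memories (illustrative sketch, not the published module)."""
    def __init__(self, num_attrs: int = 100, dim: int = 256):
        super().__init__()
        self.memory = nn.Parameter(torch.randn(num_attrs, dim))
        self.attn = nn.MultiheadAttention(dim, num_heads=4, batch_first=True)

    def forward(self, sent_emb: torch.Tensor) -> torch.Tensor:
        # sent_emb: (B, dim) used as a length-1 query over the memory
        q = sent_emb.unsqueeze(1)
        mem = self.memory.unsqueeze(0).expand(sent_emb.size(0), -1, -1)
        attended, _ = self.attn(q, mem, mem)
        return sent_emb + attended.squeeze(1)   # residual enrichment

out = AttributeMemory()(torch.randn(8, 256))
```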
arXiv Detail & Related papers (2022-09-28T12:28:54Z)
- StyleT2I: Toward Compositional and High-Fidelity Text-to-Image Synthesis [52.341186561026724]
A lack of compositionality can have severe implications for robustness and fairness.
We introduce a new framework, StyleT2I, to improve the compositionality of text-to-image synthesis.
Results show that StyleT2I outperforms previous approaches in terms of consistency between the input text and synthesized images.
arXiv Detail & Related papers (2022-03-29T17:59:50Z)