Bridging the Intent Gap: Knowledge-Enhanced Visual Generation
- URL: http://arxiv.org/abs/2405.12538v1
- Date: Tue, 21 May 2024 07:07:44 GMT
- Title: Bridging the Intent Gap: Knowledge-Enhanced Visual Generation
- Authors: Yi Cheng, Ziwei Xu, Dongyun Lin, Harry Cheng, Yongkang Wong, Ying Sun, Joo Hwee Lim, Mohan Kankanhalli
- Abstract summary: For visual content generation, discrepancies between user intentions and the generated content have been a longstanding problem.
We propose a knowledge-enhanced iterative refinement framework for visual content generation.
We demonstrate the efficacy of the proposed framework through preliminary results.
- Score: 27.568260631117365
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: For visual content generation, discrepancies between user intentions and the generated content have been a longstanding problem. This discrepancy arises from two main factors. First, user intentions are inherently complex, with subtle details not fully captured by input prompts. The absence of such details makes it challenging for generative models to accurately reflect the intended meaning, leading to a mismatch between the desired and generated output. Second, generative models trained on visual-label pairs lack the comprehensive knowledge to accurately represent all aspects of the input data in their generated outputs. To address these challenges, we propose a knowledge-enhanced iterative refinement framework for visual content generation. We begin by analyzing and identifying the key challenges faced by existing generative models. Then, we introduce various knowledge sources, including human insights, pre-trained models, logic rules, and world knowledge, which can be leveraged to address these challenges. Furthermore, we propose a novel visual generation framework that incorporates a knowledge-based feedback module to iteratively refine the generation process. This module gradually improves the alignment between the generated content and user intentions. We demonstrate the efficacy of the proposed framework through preliminary results, highlighting the potential of knowledge-enhanced generative models for intention-aligned content generation.
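The knowledge-based feedback loop described in the abstract can be sketched as follows. This is a minimal, hypothetical illustration of the iterative refinement idea, not the authors' implementation: the names `generate`, `knowledge_feedback`, `refine`, and the example `count_rule` knowledge source are all illustrative assumptions.

```python
# Hypothetical sketch of a knowledge-enhanced iterative refinement loop.
# A generator produces content, knowledge sources flag mismatches with the
# user's intent, and the feedback is folded back into the prompt.

def generate(prompt):
    # Stand-in for a text-to-image model call; returns a mock "image"
    # string so the loop is runnable end to end.
    return f"image({prompt})"

def knowledge_feedback(prompt, image, knowledge_sources):
    # Each knowledge source inspects the prompt/output pair and returns
    # suggested corrections; an empty list means no mismatch was detected.
    issues = []
    for source in knowledge_sources:
        issues.extend(source(prompt, image))
    return issues

def refine(prompt, issues):
    # Fold the feedback into the prompt for the next generation round.
    return prompt + "; " + "; ".join(issues)

def iterative_generation(prompt, knowledge_sources, max_rounds=3):
    for _ in range(max_rounds):
        image = generate(prompt)
        issues = knowledge_feedback(prompt, image, knowledge_sources)
        if not issues:  # generated content aligns with the stated intent
            return image, prompt
        prompt = refine(prompt, issues)
    return generate(prompt), prompt

# Example knowledge source (a toy logic rule): require an explicit numeral
# in the prompt so the model is less likely to miscount objects.
def count_rule(prompt, image):
    return [] if any(c.isdigit() for c in prompt) else ["specify object count: 2"]

image, final_prompt = iterative_generation("two cats on a sofa", [count_rule])
```

In this toy run, the first round flags the missing numeral, the refined prompt satisfies the rule, and the second round terminates; real knowledge sources (pre-trained models, world knowledge) would replace the string-matching rule.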
Related papers
- OmniThink: Expanding Knowledge Boundaries in Machine Writing through Thinking [57.06347681695629]
We propose OmniThink, a slow-thinking machine writing framework that emulates the human-like process of iterative expansion and reflection.
Experimental results demonstrate that OmniThink improves the knowledge density of generated articles without compromising metrics such as coherence and depth.
Human evaluations and expert feedback highlight the potential of OmniThink to address real-world challenges in the generation of long-form articles.
arXiv Detail & Related papers (2025-01-16T18:58:06Z)
- Foundations of GenIR [14.45971746205563]
The chapter discusses the foundational impact of modern generative AI models on information access systems.
In contrast to traditional AI, the large-scale training and superior data modeling of generative AI models enable them to produce high-quality, human-like responses.
arXiv Detail & Related papers (2025-01-06T08:38:29Z)
- Personalized Representation from Personalized Generation [36.848215621708235]
We formalize the challenge of using personalized synthetic data to learn personalized representations.
We show that our method improves personalized representation learning for diverse downstream tasks.
arXiv Detail & Related papers (2024-12-20T18:59:03Z)
- Interactive Visual Assessment for Text-to-Image Generation Models [28.526897072724662]
We propose DyEval, a dynamic interactive visual assessment framework for generative models.
DyEval features an intuitive visual interface that enables users to interactively explore and analyze model behaviors.
Our framework provides valuable insights for improving generative models and has broad implications for advancing the reliability and capabilities of visual generation systems.
arXiv Detail & Related papers (2024-11-23T10:06:18Z)
- On the Fairness, Diversity and Reliability of Text-to-Image Generative Models [49.60774626839712]
Multimodal generative models have sparked critical discussions on their fairness, reliability, and potential for misuse.
We propose an evaluation framework designed to assess model reliability through their responses to perturbations in the embedding space.
Our method lays the groundwork for detecting unreliable, bias-injected models and retrieval of bias provenance.
arXiv Detail & Related papers (2024-11-21T09:46:55Z)
- Recognizing Unseen Objects via Multimodal Intensive Knowledge Graph Propagation [68.13453771001522]
We propose a multimodal intensive ZSL framework that matches regions of images with corresponding semantic embeddings.
We conduct extensive experiments and evaluate our model on large-scale real-world data.
arXiv Detail & Related papers (2023-06-14T13:07:48Z)
- A Comprehensive Survey of AI-Generated Content (AIGC): A History of Generative AI from GAN to ChatGPT [63.58711128819828]
ChatGPT and other Generative AI (GAI) techniques belong to the category of Artificial Intelligence Generated Content (AIGC).
The goal of AIGC is to make the content creation process more efficient and accessible, allowing for the production of high-quality content at a faster pace.
arXiv Detail & Related papers (2023-03-07T20:36:13Z)
- Knowledge-enriched Attention Network with Group-wise Semantic for Visual Storytelling [39.59158974352266]
Visual storytelling aims at generating an imaginary and coherent story with narrative multi-sentences from a group of relevant images.
Existing methods often generate direct and rigid descriptions of the apparent image content, because they cannot exploit implicit information beyond the images.
To address these problems, a novel knowledge-enriched attention network with group-wise semantic model is proposed.
arXiv Detail & Related papers (2022-03-10T12:55:47Z)
- KAT: A Knowledge Augmented Transformer for Vision-and-Language [56.716531169609915]
We propose a novel model - Knowledge Augmented Transformer (KAT) - which achieves a strong state-of-the-art result on the open-domain multimodal task of OK-VQA.
Our approach integrates implicit and explicit knowledge in an end-to-end encoder-decoder architecture, while still jointly reasoning over both knowledge sources during answer generation.
An additional benefit of explicit knowledge integration is seen in improved interpretability of model predictions in our analysis.
arXiv Detail & Related papers (2021-12-16T04:37:10Z)
- ENT-DESC: Entity Description Generation by Exploring Knowledge Graph [53.03778194567752]
In practice, the input knowledge can exceed what is actually needed, since the output description may cover only the most significant facts.
We introduce a large-scale and challenging dataset to facilitate the study of such a practical scenario in KG-to-text.
We propose a multi-graph structure that is able to represent the original graph information more comprehensively.
arXiv Detail & Related papers (2020-04-30T14:16:19Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences arising from its use.