Bridging the Intent Gap: Knowledge-Enhanced Visual Generation
- URL: http://arxiv.org/abs/2405.12538v1
- Date: Tue, 21 May 2024 07:07:44 GMT
- Title: Bridging the Intent Gap: Knowledge-Enhanced Visual Generation
- Authors: Yi Cheng, Ziwei Xu, Dongyun Lin, Harry Cheng, Yongkang Wong, Ying Sun, Joo Hwee Lim, Mohan Kankanhalli
- Abstract summary: For visual content generation, discrepancies between user intentions and the generated content have been a longstanding problem.
We propose a knowledge-enhanced iterative refinement framework for visual content generation.
We demonstrate the efficacy of the proposed framework through preliminary results.
- Score: 27.568260631117365
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: For visual content generation, discrepancies between user intentions and the generated content have been a longstanding problem. This discrepancy arises from two main factors. First, user intentions are inherently complex, with subtle details not fully captured by input prompts. The absence of such details makes it challenging for generative models to accurately reflect the intended meaning, leading to a mismatch between the desired and generated output. Second, generative models trained on visual-label pairs lack the comprehensive knowledge to accurately represent all aspects of the input data in their generated outputs. To address these challenges, we propose a knowledge-enhanced iterative refinement framework for visual content generation. We begin by analyzing and identifying the key challenges faced by existing generative models. Then, we introduce various knowledge sources, including human insights, pre-trained models, logic rules, and world knowledge, which can be leveraged to address these challenges. Furthermore, we propose a novel visual generation framework that incorporates a knowledge-based feedback module to iteratively refine the generation process. This module gradually improves the alignment between the generated content and user intentions. We demonstrate the efficacy of the proposed framework through preliminary results, highlighting the potential of knowledge-enhanced generative models for intention-aligned content generation.
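Read as pseudocode, the proposed framework is a generate-critique-regenerate loop: a generator produces content, a knowledge-based feedback module scores its alignment with the user's intent, and the critique is folded back into the next generation round. The following minimal Python sketch illustrates that loop under assumptions of our own; the `Feedback` record, the `generate` and `evaluate` callables, and the prompt-level correction step are hypothetical stand-ins for illustration, not the paper's published interface.
```python
from dataclasses import dataclass
from typing import Any, Callable

@dataclass
class Feedback:
    score: float   # estimated alignment between output and user intent
    critique: str  # knowledge-grounded description of the mismatch

def refine(generate: Callable[[str], Any],
           evaluate: Callable[[str, Any], Feedback],
           prompt: str,
           max_rounds: int = 5,
           threshold: float = 0.9) -> Any:
    """Iteratively regenerate until the knowledge-based feedback
    module reports sufficient alignment with the user's intent."""
    output = generate(prompt)
    for _ in range(max_rounds):
        # The feedback module would aggregate the knowledge sources the
        # paper enumerates: human insights, pre-trained models (e.g. VQA
        # or captioning checks), logic rules, and world knowledge.
        fb = evaluate(prompt, output)
        if fb.score >= threshold:
            break
        # Fold the critique back into the conditioning signal and retry.
        prompt = f"{prompt}. Corrections: {fb.critique}"
        output = generate(prompt)
    return output
```
Rewriting the prompt is only one of several ways the feedback could re-enter the generator; the same loop applies if the critique instead conditions the model through embeddings or control signals.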
Related papers
- VisualCloze: A Universal Image Generation Framework via Visual In-Context Learning [68.98988753763666]
We propose VisualCloze, a universal image generation framework.
VisualCloze supports a wide range of in-domain tasks, generalization to unseen ones, unseen unification of multiple tasks, and reverse generation.
We introduce Graph200K, a graph-structured dataset that establishes various interrelated tasks, enhancing task density and transferable knowledge.
arXiv Detail & Related papers (2025-04-10T17:59:42Z)
- Knowledge Graph Enhanced Generative Multi-modal Models for Class-Incremental Learning [51.0864247376786]
We introduce a Knowledge Graph Enhanced Generative Multi-modal model (KG-GMM) that builds an evolving knowledge graph throughout the learning process.
During testing, we propose a Knowledge Graph Augmented Inference method that locates specific categories by analyzing relationships within the generated text.
arXiv Detail & Related papers (2025-03-24T07:20:43Z)
- WeGen: A Unified Model for Interactive Multimodal Generation as We Chat [51.78489661490396]
We introduce WeGen, a model that unifies multimodal generation and understanding.
It can generate diverse, highly creative results from less detailed instructions.
We show it achieves state-of-the-art performance across various visual generation benchmarks.
arXiv Detail & Related papers (2025-03-03T02:50:07Z)
- OmniThink: Expanding Knowledge Boundaries in Machine Writing through Thinking [57.06347681695629]
We propose OmniThink, a slow-thinking machine writing framework that emulates the human-like process of iterative expansion and reflection.
Experimental results demonstrate that OmniThink improves the knowledge density of generated articles without compromising metrics such as coherence and depth.
Human evaluations and expert feedback highlight the potential of OmniThink to address real-world challenges in the generation of long-form articles.
arXiv Detail & Related papers (2025-01-16T18:58:06Z)
- Foundations of GenIR [14.45971746205563]
The chapter discusses the foundational impact of modern generative AI models on information access systems.
In contrast to traditional AI, the large-scale training and superior data modeling of generative AI models enable them to produce high-quality, human-like responses.
arXiv Detail & Related papers (2025-01-06T08:38:29Z)
- Personalized Representation from Personalized Generation [36.848215621708235]
We formalize the challenge of using personalized synthetic data to learn personalized representations.
We show that our method improves personalized representation learning for diverse downstream tasks.
arXiv Detail & Related papers (2024-12-20T18:59:03Z)
- Interactive Visual Assessment for Text-to-Image Generation Models [28.526897072724662]
We propose DyEval, a dynamic interactive visual assessment framework for generative models.
DyEval features an intuitive visual interface that enables users to interactively explore and analyze model behaviors.
Our framework provides valuable insights for improving generative models and has broad implications for advancing the reliability and capabilities of visual generation systems.
arXiv Detail & Related papers (2024-11-23T10:06:18Z)
- On the Fairness, Diversity and Reliability of Text-to-Image Generative Models [49.60774626839712]
Multimodal generative models have sparked critical discussions on their fairness, reliability, and potential for misuse.
We propose an evaluation framework designed to assess model reliability through their responses to perturbations in the embedding space.
Our method lays the groundwork for detecting unreliable, bias-injected models and for retrieving bias provenance.
arXiv Detail & Related papers (2024-11-21T09:46:55Z)
- Prompt-Consistency Image Generation (PCIG): A Unified Framework Integrating LLMs, Knowledge Graphs, and Controllable Diffusion Models [20.19571676239579]
We introduce a novel diffusion-based framework to enhance the alignment of generated images with their corresponding descriptions.
Our framework is built upon a comprehensive analysis of inconsistency phenomena, categorizing them based on their manifestation in the image.
We then integrate a state-of-the-art controllable image generation model with a visual text generation module to generate an image that is consistent with the original prompt.
arXiv Detail & Related papers (2024-06-24T06:12:16Z)
- Recognizing Unseen Objects via Multimodal Intensive Knowledge Graph Propagation [68.13453771001522]
We propose a multimodal intensive ZSL framework that matches regions of images with corresponding semantic embeddings.
We conduct extensive experiments and evaluate our model on large-scale real-world data.
arXiv Detail & Related papers (2023-06-14T13:07:48Z)
- Enhancing Retrieval-Augmented Large Language Models with Iterative Retrieval-Generation Synergy [164.83371924650294]
We show that strong performance can be achieved by a method we call Iter-RetGen, which synergizes retrieval and generation in an iterative manner.
A model output shows what might be needed to finish a task, and thus provides an informative context for retrieving more relevant knowledge.
Iter-RetGen processes all retrieved knowledge as a whole and largely preserves the flexibility in generation without structural constraints; a sketch of the loop follows this entry.
arXiv Detail & Related papers (2023-05-24T16:17:36Z)
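The iterative synergy described above lends itself to a short sketch: each round retrieves with a query enriched by the previous answer, concatenates all retrieved knowledge, and regenerates. This is an illustrative Python sketch in the spirit of Iter-RetGen, not the authors' code; `retrieve` and `llm` are hypothetical stand-ins for a retriever and a language model.
```python
from typing import Callable

def iter_retgen(question: str,
                retrieve: Callable[[str], list[str]],
                llm: Callable[[str], str],
                iterations: int = 3) -> str:
    """Alternate retrieval and generation: each model output informs
    the next retrieval query."""
    query, answer = question, ""
    for _ in range(iterations):
        docs = retrieve(query)          # retrieval guided by the last output
        context = "\n".join(docs)       # use all retrieved knowledge as a whole
        answer = llm(f"Context:\n{context}\n\nQuestion: {question}\nAnswer:")
        query = f"{question} {answer}"  # output provides context for the next round
    return answer
```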
- A Comprehensive Survey of AI-Generated Content (AIGC): A History of Generative AI from GAN to ChatGPT [63.58711128819828]
ChatGPT and other Generative AI (GAI) techniques belong to the category of Artificial Intelligence Generated Content (AIGC).
The goal of AIGC is to make the content creation process more efficient and accessible, allowing for the production of high-quality content at a faster pace.
arXiv Detail & Related papers (2023-03-07T20:36:13Z)
- Building Knowledge-Grounded Dialogue Systems with Graph-Based Semantic Modeling [43.0554223015728]
The knowledge-grounded dialogue task aims to generate responses that convey information from given knowledge documents.
We propose a novel graph structure, Grounded Graph, that models the semantic structure of both dialogue and knowledge.
We also propose a Grounded Graph Aware Transformer to enhance knowledge-grounded response generation.
arXiv Detail & Related papers (2022-04-27T03:31:46Z)
- Knowledge-enriched Attention Network with Group-wise Semantic for Visual Storytelling [39.59158974352266]
Visual storytelling aims at generating an imaginary and coherent story with narrative multi-sentences from a group of relevant images.
Existing methods often generate direct and rigid descriptions of apparent image-based content, because they cannot explore implicit information beyond the images.
To address these problems, we propose a novel knowledge-enriched attention network with a group-wise semantic model.
arXiv Detail & Related papers (2022-03-10T12:55:47Z)
- KAT: A Knowledge Augmented Transformer for Vision-and-Language [56.716531169609915]
We propose a novel model - Knowledge Augmented Transformer (KAT) - which achieves a strong state-of-the-art result on the open-domain multimodal task of OK-VQA.
Our approach integrates implicit and explicit knowledge in an end-to-end encoder-decoder architecture, while still jointly reasoning over both knowledge sources during answer generation.
An additional benefit of explicit knowledge integration is seen in improved interpretability of model predictions in our analysis.
arXiv Detail & Related papers (2021-12-16T04:37:10Z)
- ENT-DESC: Entity Description Generation by Exploring Knowledge Graph [53.03778194567752]
In practice, the input knowledge can be more than is needed, since the output description may cover only the most significant knowledge.
We introduce a large-scale and challenging dataset to facilitate the study of such a practical scenario in KG-to-text.
We propose a multi-graph structure that is able to represent the original graph information more comprehensively.
arXiv Detail & Related papers (2020-04-30T14:16:19Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
The site does not guarantee the quality of the information presented and is not responsible for any consequences arising from its use.