Let the Chart Spark: Embedding Semantic Context into Chart with Text-to-Image Generative Model
- URL: http://arxiv.org/abs/2304.14630v2
- Date: Sun, 2 Jul 2023 11:12:56 GMT
- Title: Let the Chart Spark: Embedding Semantic Context into Chart with Text-to-Image Generative Model
- Authors: Shishi Xiao, Suizi Huang, Yue Lin, Yilin Ye, Wei Zeng
- Abstract summary: Pictorial visualization seamlessly integrates data and semantic context into visual representation.
We propose ChartSpark, a novel system that embeds semantic context into charts based on a text-to-image generative model.
We develop an interactive visual interface that integrates a text analyzer, editing module, and evaluation module to enable users to generate, modify, and assess pictorial visualizations.
- License: http://creativecommons.org/licenses/by-nc-nd/4.0/
- Abstract: Pictorial visualization seamlessly integrates data and semantic context into visual representation, conveying complex information in a manner that is both engaging and informative. Extensive studies have been devoted to developing authoring tools that simplify the creation of pictorial visualizations. However, mainstream works mostly follow a retrieving-and-editing pipeline that relies heavily on visual elements retrieved from a dedicated corpus, which often compromises data integrity. Text-guided generation methods are emerging, but they may have limited applicability due to their reliance on predefined recognized entities. In this work, we propose ChartSpark, a novel system that embeds semantic context into charts based on a text-to-image generative model. ChartSpark generates pictorial visualizations conditioned on both the semantic context conveyed in textual inputs and the data information embedded in plain charts. The method is generic for both foreground and background pictorial generation, satisfying design practices identified from an empirical study of existing pictorial visualizations. We further develop an interactive visual interface that integrates a text analyzer, an editing module, and an evaluation module to enable users to generate, modify, and assess pictorial visualizations. We experimentally demonstrate the usability of our tool, and conclude with a discussion of the potential of combining text-to-image generative models with interactive interfaces for visualization design.
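As a rough illustration of the conditioning idea described in the abstract, the sketch below renders data as a plain chart and passes it, together with a semantic text prompt, through an off-the-shelf image-to-image diffusion pipeline. This is a minimal approximation, not ChartSpark's actual method: the Hugging Face diffusers pipeline, the Stable Diffusion checkpoint name, the strength value, and the tree-themed prompt are all illustrative assumptions.

```python
# Hypothetical sketch: stylize a plain chart with a text-to-image model,
# in the spirit of conditioning generation on both text and chart data.
# Assumes: pip install diffusers transformers torch matplotlib pillow
import io

import matplotlib.pyplot as plt
from diffusers import StableDiffusionImg2ImgPipeline
from PIL import Image


def render_plain_chart(values, labels):
    """Render the data as a plain bar chart; this carries the data information."""
    fig, ax = plt.subplots(figsize=(4, 4))
    ax.bar(labels, values, color="gray")
    ax.set_axis_off()  # keep only the bar geometry for the generator to stylize
    buf = io.BytesIO()
    fig.savefig(buf, format="png", bbox_inches="tight")
    plt.close(fig)
    buf.seek(0)
    return Image.open(buf).convert("RGB").resize((512, 512))


chart = render_plain_chart([3, 7, 5, 9], ["A", "B", "C", "D"])

# Checkpoint name is a placeholder; call pipe.to("cuda") if a GPU is available.
pipe = StableDiffusionImg2ImgPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5"
)

# A low strength preserves the bar geometry (data integrity) while the
# prompt injects the semantic context, e.g. trees for deforestation data.
result = pipe(
    prompt="bars made of stacked green trees, flat illustration",
    image=chart,
    strength=0.5,
    guidance_scale=7.5,
).images[0]
result.save("pictorial_chart.png")
```

In this reading, the init image supplies the data channel and the prompt supplies the semantic channel; the strength parameter trades stylization against fidelity to the underlying chart.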
Related papers
- Prompt-Consistency Image Generation (PCIG): A Unified Framework Integrating LLMs, Knowledge Graphs, and Controllable Diffusion Models [20.19571676239579]
We introduce a novel diffusion-based framework to enhance the alignment of generated images with their corresponding descriptions.
Our framework is built upon a comprehensive analysis of inconsistency phenomena, categorizing them based on their manifestation in the image.
We then integrate a state-of-the-art controllable image generation model with a visual text generation module to generate an image that is consistent with the original prompt.
arXiv Detail & Related papers (2024-06-24T06:12:16Z)
- Generated Contents Enrichment [11.196681396888536]
We propose a novel artificial intelligence task termed Generated Contents Enrichment (GCE).
Our proposed GCE strives to perform content enrichment explicitly in both the visual and textual domains.
To tackle GCE, we propose a deep end-to-end adversarial method that explicitly explores semantics and inter-semantic relationships.
arXiv Detail & Related papers (2024-05-06T17:14:09Z)
- Visual Analytics for Efficient Image Exploration and User-Guided Image Captioning [35.47078178526536]
Recent advancements in pre-trained large-scale language-image models have ushered in a new era of visual comprehension.
This paper tackles two well-known issues within the realm of visual analytics: (1) the efficient exploration of large-scale image datasets and identification of potential data biases within them; (2) the evaluation of image captions and steering of their generation process.
arXiv Detail & Related papers (2023-11-02T06:21:35Z)
- Advancing Visual Grounding with Scene Knowledge: Benchmark and Method [74.72663425217522]
Visual grounding (VG) aims to establish fine-grained alignment between vision and language.
Most existing VG datasets are constructed using simple description texts.
We propose a novel benchmark of Scene Knowledge-guided Visual Grounding.
arXiv Detail & Related papers (2023-07-21T13:06:02Z)
- Focus! Relevant and Sufficient Context Selection for News Image Captioning [69.36678144800936]
News Image Captioning requires describing an image by leveraging additional context from a news article.
We propose to use the pre-trained vision and language retrieval model CLIP to localize the visually grounded entities in the news article.
Our experiments demonstrate that by simply selecting a better context from the article, we can significantly improve the performance of existing models (a rough sketch of this selection step appears after this list).
arXiv Detail & Related papers (2022-12-01T20:00:27Z)
- Boosting Entity-aware Image Captioning with Multi-modal Knowledge Graph [96.95815946327079]
It is difficult to learn the association between named entities and visual cues due to the long-tail distribution of named entities.
We propose a novel approach that constructs a multi-modal knowledge graph to associate the visual objects with named entities.
arXiv Detail & Related papers (2021-07-26T05:50:41Z)
- Matching Visual Features to Hierarchical Semantic Topics for Image Paragraph Captioning [50.08729005865331]
This paper develops a plug-and-play hierarchical-topic-guided image paragraph generation framework.
To capture the correlations between the image and text at multiple levels of abstraction, we design a variational inference network.
To guide the paragraph generation, the learned hierarchical topics and visual features are integrated into the language model.
arXiv Detail & Related papers (2021-05-10T06:55:39Z)
- Learning to Represent Image and Text with Denotation Graph [32.417311523031195]
We propose learning representations from a set of implied, visually grounded expressions between image and text.
We show that state-of-the-art multimodal learning models can be further improved by leveraging automatically harvested structural relations.
arXiv Detail & Related papers (2020-10-06T18:00:58Z)
- Improving Image Captioning with Better Use of Captions [65.39641077768488]
We present a novel image captioning architecture to better explore semantics available in captions and leverage that to enhance both image representation and caption generation.
Our models first construct caption-guided visual relationship graphs that introduce beneficial inductive bias using weakly supervised multi-instance learning.
During generation, the model further incorporates visual relationships using multi-task learning for jointly predicting word and object/predicate tag sequences.
arXiv Detail & Related papers (2020-06-21T14:10:47Z)
- Exploiting Structured Knowledge in Text via Graph-Guided Representation Learning [73.0598186896953]
We present two self-supervised tasks that learn over raw text with guidance from knowledge graphs.
Building upon entity-level masked language models, our first contribution is an entity masking scheme.
In contrast to existing paradigms, our approach uses knowledge graphs implicitly, only during pre-training.
arXiv Detail & Related papers (2020-04-29T14:22:42Z)
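The context-selection step summarized in the Focus! entry above can be approximated with an off-the-shelf CLIP model: score each article sentence against the image and keep the best-scoring ones as caption context. This is a minimal sketch under stated assumptions, not the paper's exact procedure; the Hugging Face transformers checkpoint, the top-k value, and the function shape are illustrative.

```python
# Hypothetical sketch of CLIP-based context selection for news image
# captioning: rank article sentences by image-text similarity.
# Assumes: pip install transformers torch pillow
import torch
from PIL import Image
from transformers import CLIPModel, CLIPProcessor

model = CLIPModel.from_pretrained("openai/clip-vit-base-patch32")
processor = CLIPProcessor.from_pretrained("openai/clip-vit-base-patch32")


def select_context(image_path: str, sentences: list[str], k: int = 3) -> list[str]:
    """Return the k article sentences most similar to the image under CLIP."""
    image = Image.open(image_path).convert("RGB")
    inputs = processor(
        text=sentences, images=image, return_tensors="pt",
        padding=True, truncation=True,
    )
    with torch.no_grad():
        # logits_per_image has shape (1, num_sentences): one score per sentence
        scores = model(**inputs).logits_per_image[0]
    top = torch.topk(scores, k=min(k, len(sentences))).indices.tolist()
    return [sentences[i] for i in sorted(top)]  # preserve article order
```

The selected sentences would then replace the full article as input to a downstream captioning model, which is the substitution the entry credits for the performance gain.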
This list is automatically generated from the titles and abstracts of the papers in this site.