ChartifyText: Automated Chart Generation from Data-Involved Texts via LLM
- URL: http://arxiv.org/abs/2410.14331v2
- Date: Thu, 07 Nov 2024 14:48:18 GMT
- Title: ChartifyText: Automated Chart Generation from Data-Involved Texts via LLM
- Authors: Songheng Zhang, Lei Wang, Toby Jia-Jun Li, Qiaomu Shen, Yixin Cao, Yong Wang
- Abstract summary: Text documents with numerical values involved are widely used in various applications such as scientific research, economy, public health and journalism.
To fill this research gap, this work aims to automatically generate charts to accurately convey the underlying data and ideas to readers.
We propose ChartifyText, a novel fully-automated approach that leverages Large Language Models (LLMs) to convert complex data-involved texts to expressive charts.
- Score: 16.87320295911898
- Abstract: Text documents with numerical values involved are widely used in various applications such as scientific research, economy, public health and journalism. However, it is difficult for readers to quickly interpret such data-involved texts and gain deep insights. To fill this research gap, this work aims to automatically generate charts to accurately convey the underlying data and ideas to readers, which is a challenging task. The challenges originate from text ambiguities, intrinsic sparsity and uncertainty of data in text documents, and subjective sentiment differences. Specifically, we propose ChartifyText, a novel fully-automated approach that leverages Large Language Models (LLMs) to convert complex data-involved texts to expressive charts. It consists of two major modules: tabular data inference and expressive chart generation. The tabular data inference module employs systematic prompt engineering to guide the LLM (e.g., GPT-4) to infer table data, where data ranges, uncertainties, missing data values and corresponding subjective sentiments are explicitly considered. The expressive chart generation module augments standard charts with intuitive visual encodings and concise texts to accurately convey the underlying data and insights. We extensively evaluate the effectiveness of ChartifyText on real-world data-involved text documents through case studies, in-depth interviews with three visualization experts, and a carefully designed user study with 15 participants. The results demonstrate the usefulness and effectiveness of ChartifyText in helping readers efficiently and effectively make sense of data-involved texts.
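The abstract says the inferred table explicitly records data ranges, uncertainties, missing values and sentiments, which the chart module then turns into visual encodings. A minimal sketch of what such a record and its mapping to a bar description could look like follows. The `InferredCell` schema and the `to_bar_spec` helper are hypothetical illustrations, not the paper's data format or code, and no LLM call is shown.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass
class InferredCell:
    """One table cell inferred from text (hypothetical schema, not from the paper)."""
    label: str
    value: Optional[float] = None                      # point estimate; None if not stated
    value_range: Optional[Tuple[float, float]] = None  # (low, high) when the text gives a range
    uncertain: bool = False                            # text hedges, e.g. "about", "roughly"
    sentiment: str = "neutral"                         # "positive", "negative" or "neutral"


def to_bar_spec(cell: InferredCell) -> dict:
    """Map one inferred cell to a plain dict describing a bar and its annotations."""
    # A cell with neither a value nor a range is flagged as missing, not drawn as zero.
    if cell.value is None and cell.value_range is None:
        return {"label": cell.label, "missing": True}

    # Prefer the stated value; otherwise fall back to the midpoint of the range.
    if cell.value is not None:
        height = cell.value
    else:
        low, high = cell.value_range
        height = (low + high) / 2

    spec = {
        "label": cell.label,
        "missing": False,
        "height": height,
        "uncertain": cell.uncertain,
        "sentiment": cell.sentiment,
    }
    # A range becomes an asymmetric error bar: (distance below, distance above).
    if cell.value_range is not None:
        low, high = cell.value_range
        spec["error_bar"] = (height - low, high - height)
    return spec


cells = [
    InferredCell("2021", value=12.0),
    InferredCell("2022", value_range=(10.0, 20.0), uncertain=True, sentiment="negative"),
    InferredCell("2023"),  # mentioned in the text but no number given
]
specs = [to_bar_spec(c) for c in cells]
```

Keeping missing cells as explicit records, instead of dropping them or plotting zero, is one way to preserve the distinction the abstract draws between sparse data and actual values.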
Related papers
- On Pre-training of Multimodal Language Models Customized for Chart Understanding [83.99377088129282]
This paper explores the training processes necessary to improve MLLMs' comprehension of charts.
We introduce CHOPINLLM, an MLLM tailored for in-depth chart comprehension.
arXiv Detail & Related papers (2024-07-19T17:58:36Z)
- Towards Unified Multi-granularity Text Detection with Interactive Attention [56.79437272168507]
"Detect Any Text" (DAT) is an advanced paradigm that unifies scene text detection, layout analysis, and document page detection into a cohesive, end-to-end model.
A pivotal innovation in DAT is the across-granularity interactive attention module, which significantly enhances the representation learning of text instances.
Tests demonstrate that DAT achieves state-of-the-art performance across a variety of text-related benchmarks.
arXiv Detail & Related papers (2024-05-30T07:25:23Z)
- Hierarchical Knowledge Distillation on Text Graph for Data-limited Attribute Inference [5.618638372635474]
We develop a text-graph-based few-shot learning model for attribute inferences on social media text data.
Our model first constructs and refines a text graph using manifold learning and message passing.
To further use cross-domain texts and unlabeled texts to improve few-shot performance, a hierarchical knowledge distillation is devised over the text graph.
arXiv Detail & Related papers (2024-01-10T05:50:34Z)
- A Benchmark for Text Expansion: Datasets, Metrics, and Baselines [87.47745669317894]
This work presents a new task of Text Expansion (TE), which aims to insert fine-grained modifiers into proper locations of the plain text.
We leverage four complementary approaches to construct a dataset with 12 million automatically generated instances and 2K human-annotated references.
On top of a pre-trained text-infilling model, we build both pipelined and joint Locate&Infill models, which demonstrate superiority over the Text2Text baselines.
arXiv Detail & Related papers (2023-09-17T07:54:38Z)
- Using Large Language Models for Zero-Shot Natural Language Generation from Knowledge Graphs [4.56877715768796]
We show that ChatGPT achieves near state-of-the-art performance on some measures of the WebNLG 2020 challenge.
We also show that there is a significant connection between what the LLM already knows about the data it is parsing and the quality of the output text.
arXiv Detail & Related papers (2023-07-14T12:45:03Z)
- TextFormer: A Query-based End-to-End Text Spotter with Mixed Supervision [61.186488081379]
We propose TextFormer, a query-based end-to-end text spotter with Transformer architecture.
TextFormer builds upon an image encoder and a text decoder to learn a joint semantic understanding for multi-task modeling.
It allows for mutual training and optimization of classification, segmentation, and recognition branches, resulting in deeper feature sharing.
arXiv Detail & Related papers (2023-06-06T03:37:41Z)
- ChartSumm: A Comprehensive Benchmark for Automatic Chart Summarization of Long and Short Summaries [0.26097841018267615]
Automatic chart-to-text summarization is an effective tool for visually impaired people.
In this paper, we propose ChartSumm: a large-scale benchmark dataset consisting of a total of 84,363 charts.
arXiv Detail & Related papers (2023-04-26T15:25:24Z)
- Chart-to-Text: A Large-Scale Benchmark for Chart Summarization [9.647079534077472]
We present Chart-to-text, a large-scale benchmark with two datasets and a total of 44,096 charts.
We explain the dataset construction process and analyze the datasets.
arXiv Detail & Related papers (2022-03-12T17:01:38Z)
- EventNarrative: A large-scale Event-centric Dataset for Knowledge Graph-to-Text Generation [8.216976747904726]
EventNarrative consists of approximately 230,000 graphs and their corresponding natural language text, 6 times larger than the current largest parallel dataset.
Our aim is two-fold: to help break new ground in event-centric research where data is lacking, and to give researchers a well-defined, large-scale dataset.
arXiv Detail & Related papers (2021-10-30T15:39:20Z)
- Be More with Less: Hypergraph Attention Networks for Inductive Text Classification [56.98218530073927]
Graph neural networks (GNNs) have received increasing attention in the research community and demonstrated promising results on text classification.
Despite this success, their performance could be largely jeopardized in practice since they are unable to capture high-order interactions between words.
We propose a principled model, hypergraph attention networks (HyperGAT), which can obtain more expressive power with less computational consumption for text representation learning.
arXiv Detail & Related papers (2020-11-01T00:21:59Z)
- Partially-Aligned Data-to-Text Generation with Distant Supervision [69.15410325679635]
We propose a new generation task called Partially-Aligned Data-to-Text Generation (PADTG).
It is more practical since it utilizes automatically annotated data for training and thus considerably expands the application domains.
Our framework outperforms all baseline models and verifies the feasibility of utilizing partially-aligned data.
arXiv Detail & Related papers (2020-10-03T03:18:52Z)
This list is automatically generated from the titles and abstracts of the papers in this site.