UniChart: A Universal Vision-language Pretrained Model for Chart
Comprehension and Reasoning
- URL: http://arxiv.org/abs/2305.14761v3
- Date: Tue, 10 Oct 2023 23:39:25 GMT
- Title: UniChart: A Universal Vision-language Pretrained Model for Chart
Comprehension and Reasoning
- Authors: Ahmed Masry, Parsa Kavehzadeh, Xuan Long Do, Enamul Hoque, Shafiq Joty
- Abstract summary: We present UniChart, a pretrained model for chart comprehension and reasoning.
UniChart encodes the relevant text, data, and visual elements of charts and then uses a chart-grounded text decoder to generate the expected output in natural language.
We propose several chart-specific pretraining tasks that include: (i) low-level tasks to extract the visual elements (e.g., bars, lines) and data from charts, and (ii) high-level tasks to acquire chart understanding and reasoning skills.
- Score: 29.947053208614246
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Charts are very popular for analyzing data, visualizing key insights and
answering complex reasoning questions about data. To facilitate chart-based
data analysis using natural language, several downstream tasks have been
introduced recently such as chart question answering and chart summarization.
However, most of the methods that solve these tasks use pretraining on language
or vision-language tasks that do not attempt to explicitly model the structure
of the charts (e.g., how data is visually encoded and how chart elements are
related to each other). To address this, we first build a large corpus of
charts covering a wide variety of topics and visual styles. We then present
UniChart, a pretrained model for chart comprehension and reasoning. UniChart
encodes the relevant text, data, and visual elements of charts and then uses a
chart-grounded text decoder to generate the expected output in natural
language. We propose several chart-specific pretraining tasks that include: (i)
low-level tasks to extract the visual elements (e.g., bars, lines) and data
from charts, and (ii) high-level tasks to acquire chart understanding and
reasoning skills. We find that pretraining the model on a large corpus with
chart-specific low- and high-level tasks followed by finetuning on three
down-streaming tasks results in state-of-the-art performance on three
downstream tasks.
Related papers
- Text2Chart31: Instruction Tuning for Chart Generation with Automatic Feedback [37.275533538711436]
We propose a hierarchical pipeline and a new dataset for chart generation.
Our dataset, Text2Chart31, includes 31 unique plot types referring to the Matplotlib library.
We introduce a reinforcement learning-based instruction tuning technique for chart generation tasks without requiring human feedback.
arXiv Detail & Related papers (2024-10-05T07:25:56Z) - On Pre-training of Multimodal Language Models Customized for Chart Understanding [83.99377088129282]
This paper explores the training processes necessary to improve MLLMs' comprehension of charts.
We introduce CHOPINLLM, an MLLM tailored for in-depth chart comprehension.
arXiv Detail & Related papers (2024-07-19T17:58:36Z) - AltChart: Enhancing VLM-based Chart Summarization Through Multi-Pretext Tasks [31.414783623207477]
We introduce the AltChart dataset, comprising 10,000 real chart images, each paired with a comprehensive summary.
We propose a new method for pretraining Vision-Language Models (VLMs) to learn fine-grained chart representations.
We conduct extensive evaluations of four leading chart summarization models, analyzing how accessible their descriptions are.
arXiv Detail & Related papers (2024-05-22T12:18:52Z) - TinyChart: Efficient Chart Understanding with Visual Token Merging and Program-of-Thoughts Learning [83.58521787193293]
We present TinyChart, an efficient MLLM for chart understanding with only 3B parameters.
TinyChart overcomes two key challenges in efficient chart understanding: (1) reduce the burden of learning numerical computations through a Program-of-Thoughts (PoT) learning strategy, and (2) reduce lengthy vision feature sequences produced by the vision transformer for high-resolution images through a Vision Token Merging module.
arXiv Detail & Related papers (2024-04-25T14:23:24Z) - ChartAssisstant: A Universal Chart Multimodal Language Model via
Chart-to-Table Pre-training and Multitask Instruction Tuning [54.89249749894061]
ChartAssistant is a vision-language model for universal chart comprehension and reasoning.
It undergoes a two-stage training process, starting with pre-training on chart-to-table parsing to align chart and text.
Experimental results demonstrate significant performance gains over the state-of-the-art UniChart and Chartllama method.
arXiv Detail & Related papers (2024-01-04T17:51:48Z) - ChartLlama: A Multimodal LLM for Chart Understanding and Generation [70.1393163657813]
We create a high-quality instruction-tuning dataset leveraging GPT-4.
Next, we introduce ChartLlama, a multi-modal large language model that we've trained using our created dataset.
arXiv Detail & Related papers (2023-11-27T15:20:23Z) - StructChart: Perception, Structuring, Reasoning for Visual Chart
Understanding [58.38480335579541]
Current chart-related tasks focus on either chart perception which refers to extracting information from the visual charts, or performing reasoning given the extracted data.
In this paper, we aim to establish a unified and label-efficient learning paradigm for joint perception and reasoning tasks.
Experiments are conducted on various chart-related tasks, demonstrating the effectiveness and promising potential for a unified chart perception-reasoning paradigm.
arXiv Detail & Related papers (2023-09-20T12:51:13Z) - ChartReader: A Unified Framework for Chart Derendering and Comprehension
without Heuristic Rules [89.75395046894809]
We present ChartReader, a unified framework that seamlessly integrates chart derendering and comprehension tasks.
Our approach includes a transformer-based chart component detection module and an extended pre-trained vision-language model for chart-to-X tasks.
Our proposed framework can significantly reduce the manual effort involved in chart analysis, providing a step towards a universal chart understanding model.
arXiv Detail & Related papers (2023-04-05T00:25:27Z) - Chart-to-Text: A Large-Scale Benchmark for Chart Summarization [9.647079534077472]
We present Chart-to-text, a large-scale benchmark with two datasets and a total of 44,096 charts.
We explain the dataset construction process and analyze the datasets.
arXiv Detail & Related papers (2022-03-12T17:01:38Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.