StructChart: Perception, Structuring, Reasoning for Visual Chart
Understanding
- URL: http://arxiv.org/abs/2309.11268v4
- Date: Mon, 19 Feb 2024 03:48:55 GMT
- Title: StructChart: Perception, Structuring, Reasoning for Visual Chart
Understanding
- Authors: Renqiu Xia, Bo Zhang, Haoyang Peng, Hancheng Ye, Xiangchao Yan, Peng
Ye, Botian Shi, Yu Qiao, Junchi Yan
- Abstract summary: Current chart-related tasks focus on either chart perception, which refers to extracting information from visual charts, or reasoning over the extracted data.
In this paper, we aim to establish a unified and label-efficient learning paradigm for joint perception and reasoning tasks.
Experiments are conducted on various chart-related tasks, demonstrating the effectiveness and promise of a unified chart perception-reasoning paradigm.
- Score: 58.38480335579541
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Charts are common in literature across different scientific fields, conveying
rich information easily accessible to readers. Current chart-related tasks
focus on either chart perception, which refers to extracting information from
the visual charts, or performing reasoning over the extracted data, e.g., in
tabular form. In this paper, we aim to establish a unified and label-efficient
learning paradigm for joint perception and reasoning tasks, which can be
generally applicable to different downstream tasks, beyond the
question-answering task specifically studied in peer works. StructChart first
reformulates the chart information from the popular tabular form (linearized
CSV) into the proposed Structured Triplet Representations (STR); because STR
makes the extracted structure explicit, it narrows the task gap between chart
perception and reasoning. We then propose a Structuring Chart-oriented
Representation Metric (SCRM) to quantitatively evaluate performance on the
chart perception task. To enrich the dataset for training, we further explore
leveraging a Large Language Model (LLM) to enhance chart diversity in terms of
both visual style and underlying statistical information. Extensive
experiments on various chart-related tasks demonstrate the effectiveness and
promise of a unified chart perception-reasoning paradigm for pushing the
frontier of chart understanding.
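
The abstract does not spell out the STR format. As a minimal sketch, assuming STR takes a common (row header, column header, value) form and that the linearized CSV marks row breaks with a literal "\n" token, the reformulation step might look like the following; the function name and separators are illustrative assumptions, not the paper's implementation:

```python
# Minimal sketch: turning a linearized CSV chart table into structured
# triplets. The exact STR schema in StructChart may differ; this only
# illustrates replacing a flat table string with explicit triplets.

def csv_to_triplets(linearized_csv: str,
                    row_sep: str = "\\n",  # literal backslash-n token (assumption)
                    col_sep: str = ",") -> list:
    """Parse 'header,...\\nrow,...' into (row_header, col_header, value) triplets."""
    rows = [r.strip() for r in linearized_csv.split(row_sep) if r.strip()]
    header = [c.strip() for c in rows[0].split(col_sep)]
    triplets = []
    for row in rows[1:]:
        cells = [c.strip() for c in row.split(col_sep)]
        for col_name, value in zip(header[1:], cells[1:]):
            triplets.append((cells[0], col_name, value))
    return triplets

if __name__ == "__main__":
    table = "Year,Sales,Profit\\n2019,10,3\\n2020,12,4"
    print(csv_to_triplets(table))
    # [('2019', 'Sales', '10'), ('2019', 'Profit', '3'),
    #  ('2020', 'Sales', '12'), ('2020', 'Profit', '4')]
```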
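Likewise, the abstract only names SCRM. Below is a minimal sketch of a tolerance-aware triplet-matching score in that spirit; the exact-match rule on headers, the 5% relative tolerance on numbers, and the F1 aggregation are assumptions for illustration, not the published metric:

```python
# Hypothetical triplet-level F1 with a numeric tolerance, in the spirit of
# a structured chart-perception metric. Not the paper's exact SCRM.

def triplet_f1(pred, gold, rel_tol=0.05):
    def value_match(p, g):
        try:
            pv, gv = float(p), float(g)
            return abs(pv - gv) <= rel_tol * max(abs(gv), 1e-9)
        except ValueError:
            return p == g  # non-numeric values fall back to exact match

    used, matched = set(), 0
    for r, c, v in pred:
        for i, (gr, gc, gv) in enumerate(gold):
            if i not in used and r == gr and c == gc and value_match(v, gv):
                used.add(i)
                matched += 1
                break
    p = matched / len(pred) if pred else 0.0
    r = matched / len(gold) if gold else 0.0
    return 2 * p * r / (p + r) if p + r else 0.0

# Example: the second value is off by 50%, which fails the 5% tolerance.
print(triplet_f1([("2019", "Sales", "10"), ("2020", "Sales", "18")],
                 [("2019", "Sales", "10"), ("2020", "Sales", "12")]))  # 0.5
```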
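The LLM-based augmentation is also only sketched at a high level. One plausible shape, assuming an LLM is prompted to perturb a seed table and suggest new plot styling, is below; the prompt wording and the `query_llm` stub are purely hypothetical, not StructChart's pipeline:

```python
# Hypothetical data-augmentation sketch: ask an LLM to produce a statistically
# varied table plus new plot styling for a seed chart.

PROMPT_TEMPLATE = (
    "Here is a chart's data table in CSV:\n{table}\n"
    "Produce a new CSV table with the same columns but plausibly different "
    "values, and suggest a chart type and color palette for plotting it."
)

def query_llm(prompt: str) -> str:
    # Stub: in practice this would call an LLM API of your choice.
    raise NotImplementedError

def augment_chart(seed_table_csv: str) -> str:
    """Return an LLM-generated variant (new values + styling hints) of the seed table."""
    return query_llm(PROMPT_TEMPLATE.format(table=seed_table_csv))
```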
Related papers
- On Pre-training of Multimodal Language Models Customized for Chart Understanding [83.99377088129282]
This paper explores the training processes necessary to improve MLLMs' comprehension of charts.
We introduce CHOPINLLM, an MLLM tailored for in-depth chart comprehension.
arXiv Detail & Related papers (2024-07-19T17:58:36Z)
- ChartThinker: A Contextual Chain-of-Thought Approach to Optimized Chart Summarization [32.19963543411396]
This study constructs a large-scale dataset of comprehensive chart-caption pairs and fine-tuning instructions on each chart.
We propose an innovative chart summarization method, ChartThinker, which synthesizes deep analysis based on chains of thought.
Built upon the curated datasets, our trained model consistently exhibits superior performance in chart summarization tasks.
arXiv Detail & Related papers (2024-03-17T14:49:09Z)
- ChartInstruct: Instruction Tuning for Chart Comprehension and Reasoning [28.204261069650897]
We introduce ChartInstruct: a novel chart-specific vision-language Instruction-following dataset comprising 191K instructions generated with 71K charts.
In experiments on four downstream tasks, we first show the effectiveness of our model, achieving a new set of state-of-the-art results.
arXiv Detail & Related papers (2024-03-14T01:40:23Z)
- ChartAssisstant: A Universal Chart Multimodal Language Model via Chart-to-Table Pre-training and Multitask Instruction Tuning [54.89249749894061]
ChartAssistant is a vision-language model for universal chart comprehension and reasoning.
It undergoes a two-stage training process, starting with pre-training on chart-to-table parsing to align chart and text.
Experimental results demonstrate significant performance gains over the state-of-the-art UniChart and ChartLlama methods.
arXiv Detail & Related papers (2024-01-04T17:51:48Z)
- ChartLlama: A Multimodal LLM for Chart Understanding and Generation [70.1393163657813]
We create a high-quality instruction-tuning dataset leveraging GPT-4.
Next, we introduce ChartLlama, a multimodal large language model trained on this dataset.
arXiv Detail & Related papers (2023-11-27T15:20:23Z)
- Enhanced Chart Understanding in Vision and Language Task via Cross-modal Pre-training on Plot Table Pairs [71.55796212450055]
We introduce ChartT5, a V+L model that learns how to interpret table information from chart images via cross-modal pre-training on plot table pairs.
Specifically, we propose two novel pre-training objectives: Masked Header Prediction (MHP) and Masked Value Prediction (MVP); a rough masking sketch appears after this list.
arXiv Detail & Related papers (2023-05-29T22:29:03Z)
- UniChart: A Universal Vision-language Pretrained Model for Chart Comprehension and Reasoning [29.947053208614246]
We present UniChart, a pretrained model for chart comprehension and reasoning.
UniChart encodes the relevant text, data, and visual elements of charts and then uses a chart-grounded text decoder to generate the expected output in natural language.
We propose several chart-specific pretraining tasks that include: (i) low-level tasks to extract the visual elements (e.g., bars, lines) and data from charts, and (ii) high-level tasks to acquire chart understanding and reasoning skills.
arXiv Detail & Related papers (2023-05-24T06:11:17Z)
- ChartReader: A Unified Framework for Chart Derendering and Comprehension without Heuristic Rules [89.75395046894809]
We present ChartReader, a unified framework that seamlessly integrates chart derendering and comprehension tasks.
Our approach includes a transformer-based chart component detection module and an extended pre-trained vision-language model for chart-to-X tasks.
Our proposed framework can significantly reduce the manual effort involved in chart analysis, providing a step towards a universal chart understanding model.
arXiv Detail & Related papers (2023-04-05T00:25:27Z)
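
As flagged in the ChartT5 entry above, the MHP and MVP objectives are only named there. Below is a rough sketch of how masked-table pre-training examples might be built; the `<mask>` token and the flat table encoding are assumptions, not ChartT5's actual preprocessing:

```python
# Rough sketch: building Masked Header Prediction (MHP) and Masked Value
# Prediction (MVP) style examples from a table. The mask token and table
# layout are assumptions; ChartT5's real preprocessing may differ.
import random

MASK = "<mask>"

def mask_table(header, rows, mask_values=True, seed=0):
    """Return (masked_table_string, target_string) for one pre-training example."""
    rng = random.Random(seed)
    header, rows = list(header), [list(r) for r in rows]
    if mask_values:
        i = rng.randrange(len(rows))           # MVP: hide one cell value
        j = rng.randrange(1, len(rows[i]))
        target, rows[i][j] = rows[i][j], MASK
    else:
        j = rng.randrange(len(header))         # MHP: hide one header
        target, header[j] = header[j], MASK
    table = " | ".join(header) + " || " + " || ".join(" | ".join(r) for r in rows)
    return table, target

masked, target = mask_table(["Year", "Sales"], [["2019", "10"], ["2020", "12"]])
print(masked)  # e.g. 'Year | Sales || 2019 | <mask> || 2020 | 12'
print(target)  # the hidden cell value, e.g. '10'
```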