ChartReader: A Unified Framework for Chart Derendering and Comprehension
without Heuristic Rules
- URL: http://arxiv.org/abs/2304.02173v1
- Date: Wed, 5 Apr 2023 00:25:27 GMT
- Title: ChartReader: A Unified Framework for Chart Derendering and Comprehension
without Heuristic Rules
- Authors: Zhi-Qi Cheng, Qi Dai, Siyao Li, Jingdong Sun, Teruko Mitamura,
Alexander G. Hauptmann
- Abstract summary: We present ChartReader, a unified framework that seamlessly integrates chart derendering and comprehension tasks.
Our approach includes a transformer-based chart component detection module and an extended pre-trained vision-language model for chart-to-X tasks.
Our proposed framework can significantly reduce the manual effort involved in chart analysis, providing a step towards a universal chart understanding model.
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Charts are a powerful tool for visually conveying complex data, but their
comprehension poses a challenge due to the diverse chart types and intricate
components. Existing chart comprehension methods suffer from either heuristic
rules or an over-reliance on OCR systems, resulting in suboptimal performance.
To address these issues, we present ChartReader, a unified framework that
seamlessly integrates chart derendering and comprehension tasks. Our approach
includes a transformer-based chart component detection module and an extended
pre-trained vision-language model for chart-to-X tasks. By learning the rules
of charts automatically from annotated datasets, our approach eliminates the
need for manual rule-making, reducing effort and enhancing accuracy. We also
introduce a data variable replacement technique and extend the input and
position embeddings of the pre-trained model for cross-task training. We
evaluate ChartReader on Chart-to-Table, ChartQA, and Chart-to-Text tasks,
demonstrating its superiority over existing methods. Our proposed framework can
significantly reduce the manual effort involved in chart analysis, providing a
step towards a universal chart understanding model. Moreover, our approach
offers opportunities for plug-and-play integration with mainstream LLMs such as
T5 and TaPas, extending their capability to chart comprehension tasks. The code
is available at https://github.com/zhiqic/ChartReader.
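The digest contains no code, but the plug-and-play idea is easy to picture. Below is a minimal sketch, assuming the HuggingFace transformers library, of how a pre-trained T5 could be extended with chart-specific tokens and data-variable placeholders, in the spirit of the paper's extended input embeddings and data variable replacement technique. The token names are hypothetical, not the authors' actual vocabulary.

```python
# Minimal sketch (not the authors' code): extending a pre-trained T5's
# vocabulary and input embeddings with chart-specific tokens.
from transformers import T5ForConditionalGeneration, T5Tokenizer

tokenizer = T5Tokenizer.from_pretrained("t5-base")
model = T5ForConditionalGeneration.from_pretrained("t5-base")

# Hypothetical chart-component tokens plus data-variable placeholders.
new_tokens = ["<bar>", "<line>", "<x_axis>", "<y_axis>", "<legend>",
              "<var_0>", "<var_1>", "<var_2>"]
tokenizer.add_tokens(new_tokens, special_tokens=True)

# Grow the embedding matrix so the new tokens get trainable rows; all
# pre-trained weights are kept, so the base model is reused unchanged.
model.resize_token_embeddings(len(tokenizer))

ids = tokenizer("<bar> <var_0> question: what is the tallest bar?",
                return_tensors="pt").input_ids
print(ids.shape)
```

Only the new embedding rows need to be learned from annotated chart data, which is what makes this kind of integration plug-and-play.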
Related papers
- MSG-Chart: Multimodal Scene Graph for ChartQA
Automatic Chart Question Answering (ChartQA) is challenging because chart elements are distributed in complex ways and the patterns of the underlying data are not explicitly displayed in the chart.
We design a joint multimodal scene graph for charts to explicitly represent the relationships between chart elements and their patterns.
Our proposed multimodal scene graph includes a visual graph and a textual graph to jointly capture the structural and semantic knowledge from the chart.
arXiv Detail & Related papers (2024-08-09T04:11:23Z)
- Advancing Chart Question Answering with Robust Chart Component Recognition
We introduce a unified framework that enhances chart component recognition by accurately identifying and classifying components such as bars, lines, pies, titles, legends, and axes.
We also propose a novel Question-guided Deformable Co-Attention mechanism, which fuses chart features encoded by Chartformer with the given question, leveraging the question's guidance to ground the correct answer.
arXiv Detail & Related papers (2024-07-19T20:55:06Z)
- TinyChart: Efficient Chart Understanding with Visual Token Merging and Program-of-Thoughts Learning
We present TinyChart, an efficient MLLM for chart understanding with only 3B parameters.
TinyChart overcomes two key challenges in efficient chart understanding: (1) it reduces the burden of learning numerical computations through a Program-of-Thoughts (PoT) learning strategy, and (2) it shortens the lengthy vision feature sequences produced by the vision transformer for high-resolution images through a Vision Token Merging module (a sketch of token merging appears after this list).
arXiv Detail & Related papers (2024-04-25T14:23:24Z)
- ChartAssistant: A Universal Chart Multimodal Language Model via Chart-to-Table Pre-training and Multitask Instruction Tuning
ChartAssistant is a vision-language model for universal chart comprehension and reasoning.
It undergoes a two-stage training process, starting with pre-training on chart-to-table parsing to align chart and text.
Experimental results demonstrate significant performance gains over the state-of-the-art UniChart and ChartLlama methods.
arXiv Detail & Related papers (2024-01-04T17:51:48Z)
- ChartLlama: A Multimodal LLM for Chart Understanding and Generation
We create a high-quality instruction-tuning dataset leveraging GPT-4.
Next, we introduce ChartLlama, a multimodal large language model trained on this dataset.
arXiv Detail & Related papers (2023-11-27T15:20:23Z)
- StructChart: Perception, Structuring, Reasoning for Visual Chart Understanding
Current chart-related tasks focus on either chart perception, which extracts information from visual charts, or reasoning over the extracted data.
In this paper, we aim to establish a unified and label-efficient learning paradigm for joint perception and reasoning tasks.
Experiments are conducted on various chart-related tasks, demonstrating the effectiveness and promise of a unified chart perception-reasoning paradigm.
arXiv Detail & Related papers (2023-09-20T12:51:13Z)
- Enhanced Chart Understanding in Vision and Language Task via Cross-modal Pre-training on Plot Table Pairs
We introduce ChartT5, a V+L model that learns how to interpret table information from chart images via cross-modal pre-training on plot table pairs.
Specifically, we propose two novel pre-training objectives: Masked Header Prediction (MHP) and Masked Value Prediction (MVP) (a value-masking sketch appears at the end of this digest).
arXiv Detail & Related papers (2023-05-29T22:29:03Z)
- UniChart: A Universal Vision-language Pretrained Model for Chart Comprehension and Reasoning
We present UniChart, a pretrained model for chart comprehension and reasoning.
UniChart encodes the relevant text, data, and visual elements of charts and then uses a chart-grounded text decoder to generate the expected output in natural language.
We propose several chart-specific pretraining tasks that include: (i) low-level tasks to extract the visual elements (e.g., bars, lines) and data from charts, and (ii) high-level tasks to acquire chart understanding and reasoning skills.
arXiv Detail & Related papers (2023-05-24T06:11:17Z)
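As flagged in the TinyChart entry above, here is a minimal sketch, assuming PyTorch, of similarity-based vision token merging. It averages the most similar token pairs to shorten the vision feature sequence; the exact matching and recomputation strategy of TinyChart's Vision Token Merging module may differ.

```python
# Illustrative token merging: repeatedly average the most similar pair
# of vision tokens, shrinking the sequence by r tokens.
import torch
import torch.nn.functional as F

def merge_tokens(tokens: torch.Tensor, r: int) -> torch.Tensor:
    """Merge r pairs of the most similar tokens by averaging.

    tokens: (num_tokens, dim) vision features from a ViT.
    Returns a (num_tokens - r, dim) tensor.
    """
    n = tokens.size(0)
    normed = F.normalize(tokens, dim=-1)
    sim = normed @ normed.t()                   # pairwise cosine similarity
    sim.fill_diagonal_(float("-inf"))           # ignore self-similarity

    merged = tokens.clone()
    alive = torch.ones(n, dtype=torch.bool)
    for _ in range(r):
        idx = torch.argmax(sim).item()          # most similar remaining pair
        i, j = divmod(idx, n)
        merged[i] = (merged[i] + merged[j]) / 2  # average the pair into i
        alive[j] = False
        sim[j, :] = float("-inf")               # retire token j
        sim[:, j] = float("-inf")
        # Note: similarities involving the updated token i are not
        # recomputed in this simplified sketch.
    return merged[alive]

feats = torch.randn(196, 768)                   # e.g. 14x14 ViT patch tokens
print(merge_tokens(feats, r=98).shape)          # torch.Size([98, 768])
```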
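And, as flagged in the ChartT5 entry, a minimal sketch of a Masked Value Prediction-style objective: numeric values in a flattened table are replaced with T5 sentinel tokens and the model is trained to reconstruct them. ChartT5 additionally conditions on the chart image; this text-only version illustrates only the masking scheme, and the table contents are invented.

```python
# Illustrative MVP-style objective using T5's sentinel tokens.
from transformers import T5ForConditionalGeneration, T5Tokenizer

tokenizer = T5Tokenizer.from_pretrained("t5-small")
model = T5ForConditionalGeneration.from_pretrained("t5-small")

# Flattened table with its values masked by sentinel tokens.
masked = "header: year | sales row: 2020 | <extra_id_0> row: 2021 | <extra_id_1>"
targets = "<extra_id_0> 120 <extra_id_1> 180"   # the values to reconstruct

inputs = tokenizer(masked, return_tensors="pt")
labels = tokenizer(targets, return_tensors="pt").input_ids
loss = model(**inputs, labels=labels).loss      # standard seq2seq loss
print(float(loss))
```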