Related papers: Text2Chart31: Instruction Tuning for Chart Generation with Automatic Feedback

Text2Chart31: Instruction Tuning for Chart Generation with Automatic Feedback

URL: http://arxiv.org/abs/2410.04064v1
Date: Sat, 5 Oct 2024 07:25:56 GMT
Title: Text2Chart31: Instruction Tuning for Chart Generation with Automatic Feedback
Authors: Fatemeh Pesaran Zadeh, Juyeon Kim, Jin-Hwa Kim, Gunhee Kim,
Abstract summary: We propose a hierarchical pipeline and a new dataset for chart generation. Our dataset, Text2Chart31, includes 31 unique plot types referring to the Matplotlib library. We introduce a reinforcement learning-based instruction tuning technique for chart generation tasks without requiring human feedback.
Score: 37.275533538711436
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: Large language models (LLMs) have demonstrated strong capabilities across various language tasks, notably through instruction-tuning methods. However, LLMs face challenges in visualizing complex, real-world data through charts and plots. Firstly, existing datasets rarely cover a full range of chart types, such as 3D, volumetric, and gridded charts. Secondly, supervised fine-tuning methods do not fully leverage the intricate relationships within rich datasets, including text, code, and figures. To address these challenges, we propose a hierarchical pipeline and a new dataset for chart generation. Our dataset, Text2Chart31, includes 31 unique plot types referring to the Matplotlib library, with 11.1K tuples of descriptions, code, data tables, and plots. Moreover, we introduce a reinforcement learning-based instruction tuning technique for chart generation tasks without requiring human feedback. Our experiments show that this approach significantly enhances the model performance, enabling smaller models to outperform larger open-source models and be comparable to state-of-the-art proprietary models in data visualization tasks. We make the code and dataset available at https://github.com/fatemehpesaran310/Text2Chart31.

Related papers

In-Depth and In-Breadth: Pre-training Multimodal Language Models Customized for Comprehensive Chart Understanding [113.17601814293722]
We introduce ChartScope, an LVLM optimized for in-depth chart comprehension across diverse chart types.<n>We propose an efficient data generation pipeline that synthesizes paired data for a wide range of chart types.<n>We also establish ChartDQA, a new benchmark for evaluating not only question-answering at different levels but also underlying data understanding.
arXiv Detail & Related papers (2025-07-18T18:15:09Z)
RefChartQA: Grounding Visual Answer on Chart Images through Instruction Tuning [63.599057862999]
RefChartQA is a novel benchmark that integrates Chart Question Answering (ChartQA) with visual grounding. Our experiments demonstrate that incorporating spatial awareness via grounding improves response accuracy by over 15%.
arXiv Detail & Related papers (2025-03-29T15:50:08Z)
ChartCoder: Advancing Multimodal Large Language Model for Chart-to-Code Generation [90.82566869965011]
textbfChartCoder is the first dedicated chart-to-code MLLM. We introduce textbfChart2Code-160k, the first large-scale and diverse dataset for chart-to-code generation. Experiments demonstrate that ChartCoder, with only 7B parameters, surpasses existing open-source MLLMs on chart-to-code benchmarks.
arXiv Detail & Related papers (2025-01-11T17:52:22Z)
AskChart: Universal Chart Understanding through Textual Enhancement [20.075911012193494]
State-of-the-art approaches primarily focus on visual cues from chart images, failing to explicitly incorporate rich textual information embedded within the charts. We introduce AskChart, a universal model that explicitly integrates both textual and visual cues from charts using a Mixture of Experts (MoE) architecture.
arXiv Detail & Related papers (2024-12-26T09:59:43Z)
On Pre-training of Multimodal Language Models Customized for Chart Understanding [83.99377088129282]
This paper explores the training processes necessary to improve MLLMs' comprehension of charts. We introduce CHOPINLLM, an MLLM tailored for in-depth chart comprehension.
arXiv Detail & Related papers (2024-07-19T17:58:36Z)
TinyChart: Efficient Chart Understanding with Visual Token Merging and Program-of-Thoughts Learning [83.58521787193293]
We present TinyChart, an efficient MLLM for chart understanding with only 3B parameters. TinyChart overcomes two key challenges in efficient chart understanding: (1) reduce the burden of learning numerical computations through a Program-of-Thoughts (PoT) learning strategy, and (2) reduce lengthy vision feature sequences produced by the vision transformer for high-resolution images through a Vision Token Merging module.
arXiv Detail & Related papers (2024-04-25T14:23:24Z)
ChartAssisstant: A Universal Chart Multimodal Language Model via Chart-to-Table Pre-training and Multitask Instruction Tuning [54.89249749894061]
ChartAssistant is a vision-language model for universal chart comprehension and reasoning. It undergoes a two-stage training process, starting with pre-training on chart-to-table parsing to align chart and text. Experimental results demonstrate significant performance gains over the state-of-the-art UniChart and Chartllama method.
arXiv Detail & Related papers (2024-01-04T17:51:48Z)
ChartLlama: A Multimodal LLM for Chart Understanding and Generation [70.1393163657813]
We create a high-quality instruction-tuning dataset leveraging GPT-4. Next, we introduce ChartLlama, a multi-modal large language model that we've trained using our created dataset.
arXiv Detail & Related papers (2023-11-27T15:20:23Z)
UniChart: A Universal Vision-language Pretrained Model for Chart Comprehension and Reasoning [29.947053208614246]
We present UniChart, a pretrained model for chart comprehension and reasoning. UniChart encodes the relevant text, data, and visual elements of charts and then uses a chart-grounded text decoder to generate the expected output in natural language. We propose several chart-specific pretraining tasks that include: (i) low-level tasks to extract the visual elements (e.g., bars, lines) and data from charts, and (ii) high-level tasks to acquire chart understanding and reasoning skills.
arXiv Detail & Related papers (2023-05-24T06:11:17Z)
ChartReader: A Unified Framework for Chart Derendering and Comprehension without Heuristic Rules [89.75395046894809]
We present ChartReader, a unified framework that seamlessly integrates chart derendering and comprehension tasks. Our approach includes a transformer-based chart component detection module and an extended pre-trained vision-language model for chart-to-X tasks. Our proposed framework can significantly reduce the manual effort involved in chart analysis, providing a step towards a universal chart understanding model.
arXiv Detail & Related papers (2023-04-05T00:25:27Z)
Chart-to-Text: A Large-Scale Benchmark for Chart Summarization [9.647079534077472]
We present Chart-to-text, a large-scale benchmark with two datasets and a total of 44,096 charts. We explain the dataset construction process and analyze the datasets.
arXiv Detail & Related papers (2022-03-12T17:01:38Z)

This list is automatically generated from the titles and abstracts of the papers in this site.