Enhancing Chart-to-Code Generation in Multimodal Large Language Models via Iterative Dual Preference Learning
- URL: http://arxiv.org/abs/2504.02906v1
- Date: Thu, 03 Apr 2025 07:51:20 GMT
- Title: Enhancing Chart-to-Code Generation in Multimodal Large Language Models via Iterative Dual Preference Learning
- Authors: Zhihan Zhang, Yixin Cao, Lizi Liao
- Abstract summary: We introduce Chart2Code, a novel iterative dual preference learning framework for chart-to-code generation. We find that Chart2Code consistently improves out-of-distribution chart-to-code generation quality. Our framework paves the way for future advancements in chart comprehension.
- Score: 16.22363384653305
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Chart-to-code generation, the process of converting chart images into executable plotting scripts, provides a lossless representation of chart information, requiring models to accurately capture and summarize all visual and structural elements. However, this remains a significant challenge for multimodal large language models (MLLMs), which are not inherently well-aligned with code generation tasks. To bridge this gap, we introduce Chart2Code, a novel iterative dual preference learning framework designed to enhance MLLMs' chart-to-code generation capabilities through structured code variant generation and fine-grained dual reward signals. We validate Chart2Code across three MLLMs and find that iterative preference learning consistently improves out-of-distribution chart-to-code generation quality. Throughout this process, our dual scoring method, which evaluates both the textual code structure and its visual representation, leads to greater performance improvements, even with a reduced preference dataset size. Further analysis explores the key components of our framework and highlights the interplay between chart-to-code generation and broader chart reasoning, paving the way for future advancements in chart comprehension.
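To make the dual-reward idea concrete, the sketch below scores a candidate plotting script against a reference along two axes: the structure of the code itself and the chart it renders. It is a minimal illustration under assumed scoring choices (AST similarity and a placeholder image comparison), not the paper's implementation; all helper names and the weighting are assumptions.

```python
# Minimal sketch of a dual reward for chart-to-code candidates.
# Metrics, weighting, and helper names are illustrative assumptions.
import ast
import difflib
import subprocess
import sys
import tempfile
from pathlib import Path


def code_structure_score(candidate: str, reference: str) -> float:
    """Textual/structural signal: similarity of the two scripts' AST dumps."""
    try:
        cand_ast = ast.dump(ast.parse(candidate))
        ref_ast = ast.dump(ast.parse(reference))
    except SyntaxError:
        return 0.0  # an unparseable candidate earns no structural credit
    return difflib.SequenceMatcher(None, cand_ast, ref_ast).ratio()


def _render(script: str) -> bytes | None:
    """Run a plotting script headlessly and return the saved chart image."""
    with tempfile.TemporaryDirectory() as tmp:
        out = Path(tmp) / "chart.png"
        code = (
            "import matplotlib; matplotlib.use('Agg')\n"
            + script
            + f"\nimport matplotlib.pyplot as plt\nplt.savefig({str(out)!r})\n"
        )
        proc = subprocess.run([sys.executable, "-c", code], capture_output=True, timeout=60)
        return out.read_bytes() if proc.returncode == 0 and out.exists() else None


def visual_score(candidate: str, reference: str) -> float:
    """Visual signal: render both scripts and compare the resulting images."""
    cand_img, ref_img = _render(candidate), _render(reference)
    if cand_img is None or ref_img is None:
        return 0.0  # a script that fails to execute earns no visual reward
    # Placeholder byte-level comparison; a perceptual or element-level metric
    # would be more faithful in practice.
    return difflib.SequenceMatcher(None, cand_img, ref_img).ratio()


def dual_reward(candidate: str, reference: str, alpha: float = 0.5) -> float:
    """Blend code-level and chart-level signals into one preference score."""
    return alpha * code_structure_score(candidate, reference) + (1 - alpha) * visual_score(candidate, reference)
```

Scores of this kind can then rank sampled code variants into preferred and rejected pairs for the iterative preference-learning loop the abstract describes.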
Related papers
- Socratic Chart: Cooperating Multiple Agents for Robust SVG Chart Understanding [14.75820681491341]
Existing benchmarks reveal that models rely on text-based shortcuts and probabilistic pattern-matching rather than genuine visual reasoning.
We propose Socratic Chart, a new framework that transforms chart images into Scalable Vector Graphics representations.
Our framework surpasses state-of-the-art models in accurately capturing chart primitives and improving reasoning performance.
arXiv Detail & Related papers (2025-04-14T00:07:39Z)
- TabGLM: Tabular Graph Language Model for Learning Transferable Representations Through Multi-Modal Consistency Minimization [2.1067477213933503]
TabGLM (Tabular Graph Language Model) is a novel multi-modal architecture designed to model both structural and semantic information from a table. It transforms each row of a table into a fully connected graph and serialized text, which are encoded using a graph neural network (GNN) and a text encoder, respectively. Evaluations across 25 benchmark datasets demonstrate substantial performance gains.
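As a rough illustration of that row-to-two-views transformation (the function name and serialization format are assumptions, not TabGLM's code):

```python
# Sketch: turn one table row into a fully connected graph over its cells
# and a serialized text string for a text encoder.
from itertools import combinations


def row_to_views(row: dict[str, object]) -> tuple[dict, str]:
    # Graph view: one node per (column, value) cell, edges between every pair of cells.
    nodes = [f"{col}: {val}" for col, val in row.items()]
    edges = list(combinations(range(len(nodes)), 2))  # fully connected, undirected
    graph_view = {"nodes": nodes, "edges": edges}

    # Text view: the same row serialized as natural-language-like text.
    text_view = "; ".join(f"{col} is {val}" for col, val in row.items())
    return graph_view, text_view


# Example row from a tabular dataset.
graph, text = row_to_views({"age": 42, "income": 55000, "occupation": "teacher"})
# graph["nodes"] -> ['age: 42', 'income: 55000', 'occupation: teacher']
# text          -> 'age is 42; income is 55000; occupation is teacher'
```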
arXiv Detail & Related papers (2025-02-26T05:32:45Z)
- METAL: A Multi-Agent Framework for Chart Generation with Test-Time Scaling [100.33658998796064]
We build a vision-language model (VLM) based multi-agent framework for effective automatic chart generation. METAL decomposes chart generation into iterative collaboration among specialized agents.
arXiv Detail & Related papers (2025-02-24T21:01:39Z)
- ChartCoder: Advancing Multimodal Large Language Model for Chart-to-Code Generation [90.82566869965011]
ChartCoder is the first dedicated chart-to-code MLLM. We introduce Chart2Code-160k, the first large-scale and diverse dataset for chart-to-code generation. Experiments demonstrate that ChartCoder, with only 7B parameters, surpasses existing open-source MLLMs on chart-to-code benchmarks.
arXiv Detail & Related papers (2025-01-11T17:52:22Z)
- Multimodal Graph Constrastive Learning and Prompt for ChartQA [11.828192162922436]
ChartQA presents significant challenges due to the complex distribution of chart elements and the implicit patterns embedded within the underlying data. We have developed a joint multimodal scene graph for charts, explicitly representing the relationships between chart elements and their associated patterns.
arXiv Detail & Related papers (2025-01-08T06:27:07Z)
- ChartAdapter: Large Vision-Language Model for Chart Summarization [13.499376163294816]
ChartAdapter is a lightweight transformer module designed to bridge the gap between charts and textual summaries. By integrating ChartAdapter with an LLM, we enable end-to-end training and efficient chart summarization.
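One plausible shape for such a lightweight bridging module is a small set of learnable query tokens that cross-attend to frozen chart features and are projected into the LLM's embedding space. The sketch below is an assumption about that design with illustrative dimensions, not ChartAdapter's actual architecture.

```python
# Hypothetical lightweight chart-to-LLM adapter: learnable queries cross-attend
# to frozen chart features, then project into the LLM embedding space.
import torch
import torch.nn as nn


class ChartAdapterSketch(nn.Module):
    def __init__(self, chart_dim=1024, llm_dim=4096, num_queries=32, num_heads=8):
        super().__init__()
        self.queries = nn.Parameter(torch.randn(num_queries, chart_dim) * 0.02)
        self.cross_attn = nn.MultiheadAttention(chart_dim, num_heads, batch_first=True)
        self.proj = nn.Linear(chart_dim, llm_dim)

    def forward(self, chart_features: torch.Tensor) -> torch.Tensor:
        # chart_features: (batch, num_patches, chart_dim) from a frozen vision encoder
        batch = chart_features.size(0)
        q = self.queries.unsqueeze(0).expand(batch, -1, -1)
        fused, _ = self.cross_attn(q, chart_features, chart_features)
        return self.proj(fused)  # (batch, num_queries, llm_dim) tokens fed to the LLM


# Usage: 32 adapter tokens summarizing 576 vision patches for a 4096-dim LLM.
tokens = ChartAdapterSketch()(torch.randn(2, 576, 1024))
```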
arXiv Detail & Related papers (2024-12-30T05:07:34Z)
- On Pre-training of Multimodal Language Models Customized for Chart Understanding [83.99377088129282]
This paper explores the training processes necessary to improve MLLMs' comprehension of charts.
We introduce CHOPINLLM, an MLLM tailored for in-depth chart comprehension.
arXiv Detail & Related papers (2024-07-19T17:58:36Z)
- TinyChart: Efficient Chart Understanding with Visual Token Merging and Program-of-Thoughts Learning [83.58521787193293]
We present TinyChart, an efficient MLLM for chart understanding with only 3B parameters.
TinyChart addresses two key challenges in efficient chart understanding: (1) it reduces the burden of learning numerical computations through a Program-of-Thoughts (PoT) learning strategy, and (2) it shortens the lengthy vision feature sequences produced by the vision transformer for high-resolution images through a Vision Token Merging module.
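A toy sketch of the token-merging idea, assuming a simple average-adjacent-pairs strategy rather than TinyChart's actual module:

```python
# Illustrative token merging: average the most similar adjacent vision tokens
# to shorten the sequence the language model attends over.
import torch
import torch.nn.functional as F


def merge_adjacent_tokens(tokens: torch.Tensor, keep_ratio: float = 0.5) -> torch.Tensor:
    """tokens: (seq_len, dim) vision features; returns a shorter sequence."""
    sim = F.cosine_similarity(tokens[:-1], tokens[1:], dim=-1)  # neighbour similarity
    num_merges = min(int((1 - keep_ratio) * tokens.size(0)), sim.numel())
    merge_idx = set(sim.topk(num_merges).indices.tolist())  # most redundant pairs

    merged, skip = [], False
    for i in range(tokens.size(0)):
        if skip:            # token i was already merged into its predecessor
            skip = False
            continue
        if i in merge_idx and i + 1 < tokens.size(0):
            merged.append((tokens[i] + tokens[i + 1]) / 2)  # average the pair
            skip = True
        else:
            merged.append(tokens[i])
    return torch.stack(merged)


# Example: 576 patch tokens reduced to roughly half.
short = merge_adjacent_tokens(torch.randn(576, 1024), keep_ratio=0.5)
```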
arXiv Detail & Related papers (2024-04-25T14:23:24Z)
- ChartLlama: A Multimodal LLM for Chart Understanding and Generation [70.1393163657813]
We create a high-quality instruction-tuning dataset leveraging GPT-4.
Next, we introduce ChartLlama, a multi-modal large language model trained on this dataset.
arXiv Detail & Related papers (2023-11-27T15:20:23Z)
- DiagrammerGPT: Generating Open-Domain, Open-Platform Diagrams via LLM Planning [62.51232333352754]
Text-to-image (T2I) generation has seen significant growth over the past few years.
Despite this, there has been little work on generating diagrams with T2I models.
We present DiagrammerGPT, a novel two-stage text-to-diagram generation framework.
We show that our framework produces more accurate diagrams, outperforming existing T2I models.
arXiv Detail & Related papers (2023-10-18T17:37:10Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of its content (including all information) and is not responsible for any consequences arising from its use.