VisCoder: Fine-Tuning LLMs for Executable Python Visualization Code Generation
- URL: http://arxiv.org/abs/2506.03930v2
- Date: Mon, 29 Sep 2025 00:45:05 GMT
- Title: VisCoder: Fine-Tuning LLMs for Executable Python Visualization Code Generation
- Authors: Yuansheng Ni, Ping Nie, Kai Zou, Xiang Yue, Wenhu Chen
- Abstract summary: We present VisCode-200K, a large-scale instruction tuning dataset for Python-based visualization and self-correction. It contains over 200K examples from two sources: (1) validated plotting code from open-source repositories, paired with natural language instructions and rendered plots; and (2) 45K multi-turn correction dialogues from Code-Feedback.
- Score: 69.35779796364413
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Large language models (LLMs) often struggle with visualization tasks such as plotting diagrams and charts, where success depends on both code correctness and visual semantics. Existing instruction-tuning datasets lack execution-grounded supervision and offer limited support for iterative code correction, resulting in fragile and unreliable plot generation. We present VisCode-200K, a large-scale instruction tuning dataset for Python-based visualization and self-correction. It contains over 200K examples from two sources: (1) validated plotting code from open-source repositories, paired with natural language instructions and rendered plots; and (2) 45K multi-turn correction dialogues from Code-Feedback, enabling models to revise faulty code using runtime feedback. We fine-tune Qwen2.5-Coder-Instruct on VisCode-200K to create VisCoder, and evaluate it on PandasPlotBench. VisCoder significantly outperforms strong open-source baselines and approaches the performance of proprietary models like GPT-4o-mini. We further adopt a self-debug evaluation protocol to assess iterative repair, demonstrating the benefits of feedback-driven learning for executable, visually accurate code generation.
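The self-debug protocol described in the abstract is straightforward to prototype. Below is a minimal sketch under stated assumptions, not the paper's implementation: `generate_code` is a hypothetical stand-in for the fine-tuned model, and each candidate script runs in a fresh interpreter so that runtime tracebacks can be fed back as the correction signal.

```python
import subprocess
import sys
import tempfile

MAX_ROUNDS = 3  # upper bound on self-debug attempts

def run_plot_code(code: str) -> str | None:
    """Run plotting code in a fresh interpreter; return the traceback on failure, None on success."""
    # Force a non-interactive backend so the script renders headlessly.
    script = "import matplotlib\nmatplotlib.use('Agg')\n" + code
    with tempfile.NamedTemporaryFile("w", suffix=".py", delete=False) as f:
        f.write(script)
        path = f.name
    result = subprocess.run([sys.executable, path], capture_output=True, text=True, timeout=60)
    return None if result.returncode == 0 else result.stderr

def self_debug(instruction: str, generate_code) -> str | None:
    """generate_code(prompt) -> str is a hypothetical stand-in for the model."""
    prompt = instruction
    for _ in range(MAX_ROUNDS):
        code = generate_code(prompt)
        error = run_plot_code(code)
        if error is None:
            return code  # executable: ready for visual scoring
        # Feed the runtime error back so the model can revise its own output.
        prompt = (f"{instruction}\n\nPrevious attempt:\n{code}\n\n"
                  f"Execution error:\n{error}\nPlease fix the code.")
    return None  # still failing after MAX_ROUNDS rounds
```

Note that success here only certifies executability; whether the rendered plot actually matches the instruction still requires a separate visual check, which is what the benchmark's plot-comparison stage addresses.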
Related papers
- VisCoder2: Building Multi-Language Visualization Coding Agents [63.63232038173407]
We introduce three complementary resources for advancing visualization coding agents. VisCoder2 significantly outperforms strong open-source baselines and approaches the performance of proprietary models.
arXiv Detail & Related papers (2025-10-24T18:03:57Z)
- RECODE: Reasoning Through Code Generation for Visual Question Answering [68.86938437188964]
We propose to leverage derendering -- the process of reverse-engineering visuals into executable code -- as a new modality for verifiable visual reasoning. Our work demonstrates that grounding visual perception in executable code provides a new path toward more accurate and verifiable multimodal reasoning.
arXiv Detail & Related papers (2025-10-15T17:05:37Z)
- Enhancing Neural Code Representation with Additional Context [19.42697747205407]
Recent deep learning models typically rely on source code alone, overlooking contextual information such as version history or structural relationships. We conduct an empirical study on how enriching code representations with such contextual signals affects neural model performance. Five representative models (CodeBERT, GraphCodeBERT, CodeT5, PLBART, ASTNN) are fine-tuned under code-only and context-augmented settings.
arXiv Detail & Related papers (2025-10-14T02:45:42Z)
- IFEvalCode: Controlled Code Generation [69.28317223249358]
The paper introduces forward and backward constraints generation to improve the instruction-following capabilities of Code LLMs. The authors present IFEvalCode, a multilingual benchmark comprising 1.6K test samples across seven programming languages.
arXiv Detail & Related papers (2025-07-30T08:08:48Z)
- LLM Code Customization with Visual Results: A Benchmark on TikZ [6.3303908500560615]
We introduce vTikZ, the first benchmark to evaluate the ability of Large Language Models to customize code while preserving coherent visual outcomes. Our benchmark consists of carefully curated vTikZ editing scenarios, parameterized ground truths, and a reviewing tool that leverages visual feedback to assess correctness.
arXiv Detail & Related papers (2025-05-07T08:26:54Z)
- Boosting Chart-to-Code Generation in MLLM via Dual Preference-Guided Refinement [16.22363384653305]
Multimodal Large Language Models (MLLMs) perform fine-grained visual parsing, precise code synthesis, and robust cross-modal reasoning. We propose a dual preference-guided refinement framework that combines a feedback-driven, dual-modality reward mechanism with iterative preference learning. Our framework significantly enhances the performance of general-purpose open-source MLLMs, enabling them to generate high-quality plotting code.
arXiv Detail & Related papers (2025-04-03T07:51:20Z)
- ChartCoder: Advancing Multimodal Large Language Model for Chart-to-Code Generation [62.88742217569754]
ChartCoder is the first dedicated chart-to-code MLLM. We introduce Chart2Code-160k, the first large-scale and diverse dataset for chart-to-code generation. Experiments demonstrate that ChartCoder, with only 7B parameters, surpasses existing open-source MLLMs on chart-to-code benchmarks.
arXiv Detail & Related papers (2025-01-11T17:52:22Z)
- VDebugger: Harnessing Execution Feedback for Debugging Visual Programs [103.61860743476933]
We introduce VDebugger, a critic-refiner framework trained to localize and debug visual programs by tracking execution step by step. VDebugger identifies and corrects program errors leveraging detailed execution feedback, improving interpretability and accuracy. Evaluations on six datasets demonstrate VDebugger's effectiveness, showing performance improvements of up to 3.2% in downstream task accuracy.
arXiv Detail & Related papers (2024-06-19T11:09:16Z)
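Step-by-step execution tracking of the kind a critic model could consume can be approximated with Python's built-in tracing hook. The sketch below is an illustration of that idea, not VDebugger's actual system: it logs each executed line of a program together with its local variables, yielding a fine-grained trace that points at the faulty step.

```python
import sys

def trace_execution(code: str) -> list[str]:
    """Execute `code` and log every line it runs, with local state at each step."""
    lines = code.splitlines()
    log: list[str] = []

    def tracer(frame, event, arg):
        if event == "line" and frame.f_code.co_filename == "<program>":
            lineno = frame.f_lineno
            src = lines[lineno - 1].strip() if lineno <= len(lines) else ""
            # Snapshot user variables just before this line executes.
            state = {k: repr(v) for k, v in frame.f_locals.items()
                     if not k.startswith("__")}
            log.append(f"line {lineno}: {src} | locals={state}")
        return tracer

    sys.settrace(tracer)
    try:
        exec(compile(code, "<program>", "exec"), {})
    except Exception as e:
        log.append(f"EXCEPTION: {type(e).__name__}: {e}")
    finally:
        sys.settrace(None)
    return log

# The tail of the trace points a critic directly at the failing step.
for entry in trace_execution("x = [1, 2, 3]\ny = x[5]\n")[-2:]:
    print(entry)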
- On the Impacts of Contexts on Repository-Level Code Generation [5.641402231731082]
We present RepoExec, a novel benchmark designed to evaluate repository-level code generation. We focus on three key aspects: executability, functional correctness through comprehensive test case generation, and accurate utilization of cross-file contexts.
arXiv Detail & Related papers (2024-06-17T10:45:22Z)
- Plot2Code: A Comprehensive Benchmark for Evaluating Multi-modal Large Language Models in Code Generation from Scientific Plots [66.95139377783966]
We introduce Plot2Code, a comprehensive visual coding benchmark for Multi-modal Large Language Models. We collect 132 manually selected high-quality matplotlib plots across six plot types from publicly available matplotlib galleries. For each plot, we carefully provide its source code and a descriptive instruction summarized by GPT-4.
arXiv Detail & Related papers (2024-05-13T17:59:22Z)
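Benchmarks of this kind typically score executability before visual fidelity. The snippet below sketches that first stage under simple assumptions (it is not Plot2Code's harness): candidate code is executed against a headless matplotlib backend, and the figure is saved for downstream visual comparison only if something was actually drawn.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend: renders without a display
import matplotlib.pyplot as plt

def render_candidate(code: str, out_png: str) -> bool:
    """Execute candidate plotting code; save its figure if anything was drawn."""
    plt.close("all")  # start from a clean state for each candidate
    try:
        exec(code, {"plt": plt})  # assume the candidate draws via plt
    except Exception:
        return False  # runtime failure: scores zero on executability
    fig = plt.gcf()
    if not fig.get_axes():  # a figure with no axes means nothing was plotted
        return False
    fig.savefig(out_png)  # this PNG would feed a visual-similarity judge
    return True

print(render_candidate("plt.plot([0, 1, 2], [0, 1, 4])\nplt.title('quadratic')", "candidate.png"))
```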
- Teaching Large Language Models to Self-Debug [62.424077000154945]
Large language models (LLMs) have achieved impressive performance on code generation. We propose Self-Debugging, which teaches a large language model to debug its predicted program via few-shot demonstrations.
arXiv Detail & Related papers (2023-04-11T10:43:43Z)