VisCoder: Fine-Tuning LLMs for Executable Python Visualization Code Generation
- URL: http://arxiv.org/abs/2506.03930v2
- Date: Mon, 29 Sep 2025 00:45:05 GMT
- Title: VisCoder: Fine-Tuning LLMs for Executable Python Visualization Code Generation
- Authors: Yuansheng Ni, Ping Nie, Kai Zou, Xiang Yue, Wenhu Chen
- Abstract summary: We present VisCode-200K, a large-scale instruction tuning dataset for Python-based visualization and self-correction. It contains over 200K examples from two sources: (1) validated plotting code from open-source repositories, paired with natural language instructions and rendered plots; and (2) 45K multi-turn correction dialogues from Code-Feedback.
- Score: 69.35779796364413
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Large language models (LLMs) often struggle with visualization tasks such as plotting diagrams and charts, where success depends on both code correctness and visual semantics. Existing instruction-tuning datasets lack execution-grounded supervision and offer limited support for iterative code correction, resulting in fragile and unreliable plot generation. We present VisCode-200K, a large-scale instruction tuning dataset for Python-based visualization and self-correction. It contains over 200K examples from two sources: (1) validated plotting code from open-source repositories, paired with natural language instructions and rendered plots; and (2) 45K multi-turn correction dialogues from Code-Feedback, enabling models to revise faulty code using runtime feedback. We fine-tune Qwen2.5-Coder-Instruct on VisCode-200K to create VisCoder, and evaluate it on PandasPlotBench. VisCoder significantly outperforms strong open-source baselines and approaches the performance of proprietary models like GPT-4o-mini. We further adopt a self-debug evaluation protocol to assess iterative repair, demonstrating the benefits of feedback-driven learning for executable, visually accurate code generation.
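The self-debug protocol described in the abstract is straightforward to prototype. Below is a minimal sketch under stated assumptions, not the paper's implementation: `generate_code` is a hypothetical stand-in for the fine-tuned model, and each candidate script runs in a fresh interpreter so that runtime tracebacks can be fed back as the correction signal.

```python
import subprocess
import sys
import tempfile

MAX_ROUNDS = 3  # upper bound on self-debug attempts

def run_plot_code(code: str) -> str | None:
    """Run plotting code in a fresh interpreter; return the traceback on failure, None on success."""
    # Force a non-interactive backend so the script renders headlessly.
    script = "import matplotlib\nmatplotlib.use('Agg')\n" + code
    with tempfile.NamedTemporaryFile("w", suffix=".py", delete=False) as f:
        f.write(script)
        path = f.name
    result = subprocess.run([sys.executable, path], capture_output=True, text=True, timeout=60)
    return None if result.returncode == 0 else result.stderr

def self_debug(instruction: str, generate_code) -> str | None:
    """generate_code(prompt) -> str is a hypothetical stand-in for the model."""
    prompt = instruction
    for _ in range(MAX_ROUNDS):
        code = generate_code(prompt)
        error = run_plot_code(code)
        if error is None:
            return code  # executable: ready for visual scoring
        # Feed the runtime error back so the model can revise its own output.
        prompt = (f"{instruction}\n\nPrevious attempt:\n{code}\n\n"
                  f"Execution error:\n{error}\nPlease fix the code.")
    return None  # still failing after MAX_ROUNDS rounds
```

Note that success here only certifies executability; whether the rendered plot actually matches the instruction still requires a separate visual check, which is what the benchmark's plot-comparison stage addresses.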
Related papers
- VisCoder2: Building Multi-Language Visualization Coding Agents [63.63232038173407]
We introduce three complementary resources for advancing visualization coding agents. VisCoder2 significantly outperforms strong open-source baselines and approaches the performance of proprietary models.
arXiv Detail & Related papers (2025-10-24T18:03:57Z)
- RECODE: Reasoning Through Code Generation for Visual Question Answering [68.86938437188964]
We propose to leverage derendering -- the process of reverse-engineering visuals into executable code -- as a new modality for verifiable visual reasoning. Our work demonstrates that grounding visual perception in executable code provides a new path toward more accurate and verifiable multimodal reasoning.
arXiv Detail & Related papers (2025-10-15T17:05:37Z)
- Enhancing Neural Code Representation with Additional Context [19.42697747205407]
Recent deep learning models typically rely on source code alone, overlooking contextual information such as version history or structural relationships. We conduct an empirical study on how enriching code representations with such contextual signals affects neural model performance. Five representative models (CodeBERT, GraphCodeBERT, CodeT5, PLBART, ASTNN) are fine-tuned under code-only and context-augmented settings.
arXiv Detail & Related papers (2025-10-14T02:45:42Z)
- IFEvalCode: Controlled Code Generation [69.28317223249358]
The paper introduces forward and backward constraints generation to improve the instruction-following capabilities of Code LLMs. The authors present IFEvalCode, a multilingual benchmark comprising 1.6K test samples across seven programming languages.
arXiv Detail & Related papers (2025-07-30T08:08:48Z)
- LLM Code Customization with Visual Results: A Benchmark on TikZ [6.3303908500560615]
We introduce vTikZ, the first benchmark to evaluate the ability of Large Language Models to customize code while preserving coherent visual outcomes. Our benchmark consists of carefully curated vTikZ editing scenarios, parameterized ground truths, and a reviewing tool that leverages visual feedback to assess correctness.
arXiv Detail & Related papers (2025-05-07T08:26:54Z)
- Boosting Chart-to-Code Generation in MLLM via Dual Preference-Guided Refinement [16.22363384653305]
Multimodal Large Language Models (MLLMs) perform fine-grained visual parsing, precise code synthesis, and robust cross-modal reasoning. We propose a dual preference-guided refinement framework that combines a feedback-driven, dual-modality reward mechanism with iterative preference learning. Our framework significantly enhances the performance of general-purpose open-source MLLMs, enabling them to generate high-quality plotting code.
arXiv Detail & Related papers (2025-04-03T07:51:20Z)
- ChartCoder: Advancing Multimodal Large Language Model for Chart-to-Code Generation [62.88742217569754]
ChartCoder is the first dedicated chart-to-code MLLM. We introduce Chart2Code-160k, the first large-scale and diverse dataset for chart-to-code generation. Experiments demonstrate that ChartCoder, with only 7B parameters, surpasses existing open-source MLLMs on chart-to-code benchmarks.
arXiv Detail & Related papers (2025-01-11T17:52:22Z)
- VDebugger: Harnessing Execution Feedback for Debugging Visual Programs [103.61860743476933]
We introduce VDebugger, a critic-refiner framework trained to localize and debug visual programs by tracking execution step by step. VDebugger identifies and corrects program errors leveraging detailed execution feedback, improving interpretability and accuracy. Evaluations on six datasets demonstrate VDebugger's effectiveness, showing performance improvements of up to 3.2% in downstream task accuracy.
arXiv Detail & Related papers (2024-06-19T11:09:16Z)
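Step-by-step execution tracking of the kind a critic model could consume can be approximated with Python's built-in tracing hook. The sketch below is an illustration of that idea, not VDebugger's actual system: it logs each executed line of a program together with its local variables, yielding a fine-grained trace that points at the faulty step.

```python
import sys

def trace_execution(code: str) -> list[str]:
    """Execute `code` and log every line it runs, with local state at each step."""
    lines = code.splitlines()
    log: list[str] = []

    def tracer(frame, event, arg):
        if event == "line" and frame.f_code.co_filename == "<program>":
            lineno = frame.f_lineno
            src = lines[lineno - 1].strip() if lineno <= len(lines) else ""
            # Snapshot user variables just before this line executes.
            state = {k: repr(v) for k, v in frame.f_locals.items()
                     if not k.startswith("__")}
            log.append(f"line {lineno}: {src} | locals={state}")
        return tracer

    sys.settrace(tracer)
    try:
        exec(compile(code, "<program>", "exec"), {})
    except Exception as e:
        log.append(f"EXCEPTION: {type(e).__name__}: {e}")
    finally:
        sys.settrace(None)
    return log

# The tail of the trace points a critic directly at the failing step.
for entry in trace_execution("x = [1, 2, 3]\ny = x[5]\n")[-2:]:
    print(entry)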
- On the Impacts of Contexts on Repository-Level Code Generation [5.641402231731082]
We present RepoExec, a novel benchmark designed to evaluate repository-level code generation. We focus on three key aspects: executability, functional correctness through comprehensive test case generation, and accurate utilization of cross-file contexts.
arXiv Detail & Related papers (2024-06-17T10:45:22Z)
- Plot2Code: A Comprehensive Benchmark for Evaluating Multi-modal Large Language Models in Code Generation from Scientific Plots [66.95139377783966]
We introduce Plot2Code, a comprehensive visual coding benchmark for Multi-modal Large Language Models. We collect 132 manually selected high-quality matplotlib plots across six plot types from publicly available matplotlib galleries. For each plot, we carefully provide its source code and a descriptive instruction summarized by GPT-4.
arXiv Detail & Related papers (2024-05-13T17:59:22Z)
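Benchmarks of this kind typically score executability before visual fidelity. The snippet below sketches that first stage under simple assumptions (it is not Plot2Code's harness): candidate code is executed against a headless matplotlib backend, and the figure is saved for downstream visual comparison only if something was actually drawn.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend: renders without a display
import matplotlib.pyplot as plt

def render_candidate(code: str, out_png: str) -> bool:
    """Execute candidate plotting code; save its figure if anything was drawn."""
    plt.close("all")  # start from a clean state for each candidate
    try:
        exec(code, {"plt": plt})  # assume the candidate draws via plt
    except Exception:
        return False  # runtime failure: scores zero on executability
    fig = plt.gcf()
    if not fig.get_axes():  # a figure with no axes means nothing was plotted
        return False
    fig.savefig(out_png)  # this PNG would feed a visual-similarity judge
    return True

print(render_candidate("plt.plot([0, 1, 2], [0, 1, 4])\nplt.title('quadratic')", "candidate.png"))
```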
- Teaching Large Language Models to Self-Debug [62.424077000154945]
Large language models (LLMs) have achieved impressive performance on code generation. We propose Self-Debugging, which teaches a large language model to debug its predicted program via few-shot demonstrations.
arXiv Detail & Related papers (2023-04-11T10:43:43Z)