VisCoder2: Building Multi-Language Visualization Coding Agents
- URL: http://arxiv.org/abs/2510.23642v1
- Date: Fri, 24 Oct 2025 18:03:57 GMT
- Title: VisCoder2: Building Multi-Language Visualization Coding Agents
- Authors: Yuansheng Ni, Songcheng Cai, Xiangchao Chen, Jiarong Liang, Zhiheng Lyu, Jiaqi Deng, Kai Zou, Ping Nie, Fei Yuan, Xiang Yue, Wenhu Chen
- Abstract summary: We introduce three complementary resources for advancing visualization coding agents. VisCoder2 significantly outperforms strong open-source baselines and approaches the performance of proprietary models.
- Score: 63.63232038173407
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Large language models (LLMs) have recently enabled coding agents capable of generating, executing, and revising visualization code. However, existing models often fail in practical workflows due to limited language coverage, unreliable execution, and a lack of iterative correction mechanisms. Progress has been constrained by narrow datasets and benchmarks that emphasize single-round generation and single-language tasks. To address these challenges, we introduce three complementary resources for advancing visualization coding agents. VisCode-Multi-679K is a large-scale supervised dataset containing 679K validated and executable visualization samples with multi-turn correction dialogues across 12 programming languages. VisPlotBench is a benchmark for systematic evaluation, featuring executable tasks, rendered outputs, and protocols for both initial generation and multi-round self-debug. Finally, we present VisCoder2, a family of multi-language visualization models trained on VisCode-Multi-679K. Experiments show that VisCoder2 significantly outperforms strong open-source baselines and approaches the performance of proprietary models such as GPT-4.1. Iterative self-debug yields further gains, particularly in symbolic or compiler-dependent languages, reaching an overall execution pass rate of 82.4% at the 32B scale.
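The initial-generation plus multi-round self-debug protocol described in the abstract can be made concrete with a short sketch. The snippet below is a hypothetical illustration, not the paper's released evaluation code: `model.generate(...)` is an assumed interface, the retry budget is an assumed parameter, and the executor only handles Python, whereas VisPlotBench spans 12 languages.

```python
# Hypothetical sketch of an initial-generation + multi-round self-debug
# evaluation loop. `model.generate(...)` is an assumed interface, not the
# paper's released API; the sandbox below is Python-only for brevity.
import subprocess
import sys
import tempfile
from dataclasses import dataclass

@dataclass
class ExecResult:
    ok: bool          # did the code run to completion without error?
    error: str = ""   # captured traceback / error output if not

def run_in_sandbox(code: str, timeout: int = 60) -> ExecResult:
    """Minimal Python-only executor: write the snippet to a temp file and
    run it in a subprocess, capturing stderr for the debug prompt."""
    with tempfile.NamedTemporaryFile("w", suffix=".py", delete=False) as f:
        f.write(code)
        path = f.name
    proc = subprocess.run([sys.executable, path], capture_output=True,
                          text=True, timeout=timeout)
    return ExecResult(ok=proc.returncode == 0, error=proc.stderr)

def solve_with_self_debug(model, instruction: str, max_rounds: int = 3) -> bool:
    code = model.generate(instruction)                 # initial generation
    result = run_in_sandbox(code)
    for _ in range(max_rounds):                        # self-debug rounds
        if result.ok:
            break
        # Feed the concrete execution error back so the model can revise.
        code = model.generate(instruction, previous_code=code,
                              error=result.error)
        result = run_in_sandbox(code)
    return result.ok

def execution_pass_rate(model, instructions: list[str]) -> float:
    """Fraction of tasks whose code eventually executes cleanly;
    e.g. 0.824 would correspond to the reported 82.4%."""
    passed = sum(solve_with_self_debug(model, i) for i in instructions)
    return passed / len(instructions)
```

The essential design point is that each retry conditions on the concrete execution error rather than a generic retry prompt, which is consistent with the abstract's observation that self-debug helps most in symbolic or compiler-dependent languages, where error messages are informative.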
Related papers
- Beyond Code Pairs: Dialogue-Based Data Generation for LLM Code Translation [22.50538010082899]
We present an automated dataset generation pipeline featuring a dual-LLM Questioner-Responder design. We show this data enables a 7B open-weight model to significantly outperform larger proprietary systems on key metrics like compilation success (a rough sketch of the dialogue loop follows this entry).
arXiv Detail & Related papers (2025-11-29T05:26:53Z)
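As a rough illustration of such dialogue-based data generation (a minimal sketch under assumptions, not the paper's actual design), two models can alternate roles: one poses questions about translating a source snippet, the other answers, and the transcript becomes a multi-turn training sample. `chat()` and both role prompts are assumed placeholders.

```python
# Hypothetical dual-LLM dialogue generation loop; `chat()` is an assumed
# stub for any chat-completion API, and both role prompts are
# illustrative rather than the paper's actual design.

def chat(system_prompt: str, history: list[dict]) -> str:
    """Assumed wrapper around a chat-completion endpoint."""
    raise NotImplementedError

def generate_translation_dialogue(source_code: str, target_lang: str,
                                  turns: int = 3) -> list[dict]:
    questioner = (f"You probe edge cases when translating the given code "
                  f"to {target_lang}; ask one focused question per turn.")
    responder = (f"You answer with a correct {target_lang} translation "
                 f"or fix, explaining your reasoning briefly.")
    history = [{"role": "user", "content": source_code}]
    for _ in range(turns):
        question = chat(questioner, history)       # Questioner turn
        history.append({"role": "user", "content": question})
        answer = chat(responder, history)          # Responder turn
        history.append({"role": "assistant", "content": answer})
    return history   # one multi-turn training sample
```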
- Asm2SrcEval: Evaluating Large Language Models for Assembly-to-Source Code Translation [4.45354703148321]
Assembly-to-source code translation is a critical task in reverse engineering, cybersecurity, and software maintenance.<n>We present the first comprehensive evaluation of five state-of-the-art large language models on assembly-to-source translation.
arXiv Detail & Related papers (2025-11-28T12:40:30Z)
- SWE-Compass: Towards Unified Evaluation of Agentic Coding Abilities for Large Language Models [59.90381306452982]
Evaluating large language models (LLMs) for software engineering has been limited by narrow task coverage, language bias, and insufficient alignment with real-world developer workflows. We introduce SWE-Compass, a comprehensive benchmark that unifies heterogeneous code-related evaluations into a structured and production-aligned framework. SWE-Compass spans 8 task types, 8 programming scenarios, and 10 programming languages, with 2000 high-quality instances curated from authentic GitHub pull requests.
arXiv Detail & Related papers (2025-11-07T18:01:32Z)
- Beyond Language Barriers: Multi-Agent Coordination for Multi-Language Code Generation [8.896718697354187]
XL-CoGen produces high-quality code across multiple programming languages. It integrates intermediate representation, code generation, translation, and automated repair.
arXiv Detail & Related papers (2025-09-24T09:18:08Z)
- VisCodex: Unified Multimodal Code Generation via Merging Vision and Coding Models [82.05514464090172]
Multimodal large language models (MLLMs) have significantly advanced the integration of visual and textual understanding. However, their ability to generate code from multimodal inputs remains limited. We introduce VisCodex, a unified framework that seamlessly merges vision and coding language models.
arXiv Detail & Related papers (2025-08-13T17:00:44Z)
- IFEvalCode: Controlled Code Generation [69.28317223249358]
The paper introduces forward and backward constraints generation to improve the instruction-following capabilities of Code LLMs. The authors present IFEvalCode, a multilingual benchmark comprising 1.6K test samples across seven programming languages.
arXiv Detail & Related papers (2025-07-30T08:08:48Z)
- Teaching a Language Model to Speak the Language of Tools [0.0]
This work presents a methodology for adapting existing language models to enable robust tool use in any target language. The research introduces TUCAN, which achieves up to a 28.75% improvement in function-calling accuracy over base models.
arXiv Detail & Related papers (2025-06-29T20:47:27Z)
- VisCoder: Fine-Tuning LLMs for Executable Python Visualization Code Generation [69.35779796364413]
We present VisCode-200K, a large-scale instruction tuning dataset for Python-based visualization and self-correction. It contains over 200K examples from two sources: (1) validated plotting code from open-source repositories, paired with natural language instructions and rendered plots; and (2) 45K multi-turn correction dialogues from Code-Feedback.
arXiv Detail & Related papers (2025-06-04T13:24:44Z)
- Multi-IF: Benchmarking LLMs on Multi-Turn and Multilingual Instructions Following [51.18383180774354]
We introduce Multi-IF, a new benchmark designed to assess Large Language Models' proficiency in following multi-turn and multilingual instructions.
Our evaluation of 14 state-of-the-art LLMs on Multi-IF reveals that it presents a significantly more challenging task than existing benchmarks.
Languages with non-Latin scripts (Hindi, Russian, and Chinese) generally exhibit higher error rates, suggesting potential limitations in the models' multilingual capabilities.
arXiv Detail & Related papers (2024-10-21T00:59:47Z)
- The Struggles of LLMs in Cross-lingual Code Clone Detection [3.5202378300682162]
Cross-lingual code clone detection has gained traction within the software engineering community. Inspired by the significant advances in machine learning, this paper revisits cross-lingual code clone detection. We evaluate the performance of five Large Language Models (LLMs) and eight prompts for the identification of cross-lingual code clones.
arXiv Detail & Related papers (2024-08-08T12:57:14Z)
- Output Format Biases in the Evaluation of Large Language Models for Code Translation [6.75681623173699]
In evaluating code translation, it is crucial to understand and address variations in output format. Non-code elements can interfere with evaluation metrics, resulting in biased assessments of model performance and comparisons. We propose a strategic combination of prompt engineering and regular expressions that effectively extracts source code from mixed-format outputs (a sketch of the extraction step follows this entry).
arXiv Detail & Related papers (2024-03-25T21:41:31Z)
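As a rough illustration of the regex-based extraction idea above (a minimal sketch assuming the common case where models wrap code in Markdown fences; the no-fence fallback is an assumption, not necessarily the paper's heuristic):

```python
import re

# Matches a Markdown-fenced code block, optionally tagged with a language,
# e.g. ```java ... ``` or a bare ``` ... ``` fence.
FENCE = re.compile(r"```[a-zA-Z0-9_+-]*\n(.*?)```", re.DOTALL)

def extract_code(model_output: str) -> str:
    """Pull source code out of a mixed-format model response: prefer the
    first fenced block; if no fence is present, fall back to returning
    the raw output stripped of surrounding whitespace."""
    match = FENCE.search(model_output)
    if match:
        return match.group(1).strip()
    return model_output.strip()

# Typical mixed-format response: prose before and after the code.
reply = "Here is the translation:\n```java\nclass A {}\n```\nHope this helps!"
print(extract_code(reply))   # -> class A {}
```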
- What Is Missing in Multilingual Visual Reasoning and How to Fix It [57.37123046817781]
We evaluate NLP models' multilingual, multimodal capabilities by testing on a visual reasoning task. Proprietary systems like GPT-4V currently obtain the best performance on this task, but open models lag in comparison. Our interventions achieve the best open-model performance on this task in a zero-shot setting, boosting LLaVA-v1.5-13B by 13.4%, LLaVA-v1.6-34B by 20.3%, and Qwen-VL by 16.7%.
arXiv Detail & Related papers (2024-03-03T05:45:27Z)