Related papers: Flowchart2Mermaid: A Vision-Language Model Powered System for Converting Flowcharts into Editable Diagram Code

URL: http://arxiv.org/abs/2512.02170v2
Date: Wed, 03 Dec 2025 11:47:04 GMT
Title: Flowchart2Mermaid: A Vision-Language Model Powered System for Converting Flowcharts into Editable Diagram Code
Authors: Pritam Deka, Barry Devereux
Abstract summary: We present Flowchart2Mermaid, a lightweight web system that converts flowchart images into editable Mermaid.js code. The interface supports mixed-initiative refinement through inline text editing, drag-and-drop node insertion, and natural-language commands interpreted by an integrated AI assistant.
Score: 0.3007949058551534
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: Flowcharts are common tools for communicating processes but are often shared as static images that cannot be easily edited or reused. We present Flowchart2Mermaid, a lightweight web system that uses a detailed system prompt and vision-language models to convert flowchart images into editable code in Mermaid.js, a markup language for visual workflows. The interface supports mixed-initiative refinement through inline text editing, drag-and-drop node insertion, and natural-language commands interpreted by an integrated AI assistant. Unlike prior image-to-diagram tools, our approach produces a structured, version-controllable textual representation that remains synchronized with the rendered diagram. We further introduce evaluation metrics to assess structural accuracy, flow correctness, syntax validity, and completeness across multiple models.
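
To make the idea of a "structured, version-controllable textual representation" concrete, the following is a minimal sketch of how a flowchart's nodes and edges could be serialized to Mermaid flowchart syntax. It is not the paper's implementation: the names `Node`, `Edge`, and `to_mermaid` are hypothetical and chosen only for illustration, and the sketch covers just rectangles, decision diamonds, and optionally labeled arrows.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Node:
    """Hypothetical flowchart node; shape is "rect" or "diamond"."""
    id: str
    label: str
    shape: str = "rect"


@dataclass(frozen=True)
class Edge:
    """Hypothetical directed edge with an optional label."""
    src: str
    dst: str
    label: str = ""


def to_mermaid(nodes, edges, direction="TD"):
    """Serialize nodes and edges to Mermaid flowchart text (illustrative only)."""
    lines = [f"flowchart {direction}"]
    for n in nodes:
        if n.shape == "diamond":
            # Curly braces denote a decision node in Mermaid.
            lines.append(f'    {n.id}{{"{n.label}"}}')
        else:
            # Square brackets denote a rectangular node.
            lines.append(f'    {n.id}["{n.label}"]')
    for e in edges:
        arrow = f"-->|{e.label}|" if e.label else "-->"
        lines.append(f"    {e.src} {arrow} {e.dst}")
    return "\n".join(lines)


if __name__ == "__main__":
    nodes = [
        Node("A", "Start"),
        Node("B", "Valid?", shape="diamond"),
        Node("C", "Done"),
    ]
    edges = [Edge("A", "B"), Edge("B", "C", label="Yes")]
    print(to_mermaid(nodes, edges))
```

Because the output is plain text, it can be diffed and committed like any other source file, which is the property the abstract contrasts with static flowchart images.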

Related papers

ChartE$^{3}$: A Comprehensive Benchmark for End-to-End Chart Editing [64.65742943745866]
ChartE$^{3}$ is an End-to-End Chart Editing benchmark. It directly evaluates models without relying on intermediate natural language programs or code-level supervision. It contains over 1,200 high-quality samples constructed via a well-designed data pipeline with human curation.
arXiv Detail & Related papers (2026-01-29T13:29:27Z)
Fine-Grained Zero-Shot Composed Image Retrieval with Complementary Visual-Semantic Integration [64.12127577975696]
Zero-shot composed image retrieval (ZS-CIR) is a rapidly growing area with significant practical applications. Existing ZS-CIR methods often struggle to capture fine-grained changes and integrate visual and semantic information effectively. We propose a novel Fine-Grained Zero-Shot Composed Image Retrieval method with Complementary Visual-Semantic Integration.
arXiv Detail & Related papers (2026-01-20T15:17:14Z)
Charts Are Not Images: On the Challenges of Scientific Chart Editing [66.38730113476677]
FigEdit is a benchmark for scientific figure editing comprising over 30,000 samples. Our benchmark demonstrates the profound limitations of pixel-level manipulation. By releasing FigEdit, we aim to enable systematic progress in structure-aware figure editing.
arXiv Detail & Related papers (2025-11-30T06:13:48Z)
Visual Semantic Description Generation with MLLMs for Image-Text Matching [7.246705430021142]
We propose a novel framework that bridges the modality gap by leveraging multimodal large language models (MLLMs) as visual semantics. Our approach combines: (1) instance-level alignment, by fusing visual features with visual semantic descriptions (VSD) to enhance the linguistic expressiveness of image representations, and (2) prototype-level alignment, through VSD clustering to ensure category-level consistency.
arXiv Detail & Related papers (2025-07-11T13:38:01Z)
Improved Iterative Refinement for Chart-to-Code Generation via Structured Instruction [13.728393452963942]
Multimodal large language models (MLLMs) have attracted increasing research attention due to their powerful visual understanding capabilities. This paper proposes ChartIR, an iterative refinement method based on structured instruction. Experimental results show that, compared to other methods, our method achieves superior performance on both the open-source model Qwen2-VL and the closed-source model GPT-4o.
arXiv Detail & Related papers (2025-06-15T14:10:16Z)
PlotEdit: Natural Language-Driven Accessible Chart Editing in PDFs via Multimodal LLM Agents [47.79080056618323]
PlotEdit is a novel multi-agent framework for natural language-driven end-to-end chart image editing. PlotEdit orchestrates five LLM agents: Chart2Table for data table extraction, Chart2Vision for style identification, Chart2Code for retrieving rendering code, Instruction Decomposition Agent for parsing user requests into executable steps, and Multimodal Editing Agent for implementing nuanced chart component modifications. PlotEdit outperforms existing baselines on the ChartCraft dataset across style, layout, format, and data-centric edits.
arXiv Detail & Related papers (2025-01-20T02:31:52Z)
Beyond End-to-End VLMs: Leveraging Intermediate Text Representations for Superior Flowchart Understanding [9.267156820352996]
Flowcharts are typically presented as images, driving the trend of using vision-language models (VLMs) for end-to-end flowchart understanding. Two key challenges arise: (i) Limited controllability--users have minimal influence over the downstream task, while the training of VLMs is often out of reach. We propose TextFlow, addressing the aforementioned issues with two stages: Vision Textualizer and Textual Reasoner.
arXiv Detail & Related papers (2024-12-21T00:52:41Z)
iEdit: Localised Text-guided Image Editing with Weak Supervision [53.082196061014734]
We propose a novel learning method for text-guided image editing. It generates images conditioned on a source image and a textual edit prompt. It shows favourable results against its counterparts in terms of image fidelity, CLIP alignment score and qualitatively for editing both generated and real images.
arXiv Detail & Related papers (2023-05-10T07:39:14Z)
Let the Chart Spark: Embedding Semantic Context into Chart with Text-to-Image Generative Model [7.587729429265939]
Pictorial visualization seamlessly integrates data and semantic context into visual representation. We propose ChartSpark, a novel system that embeds semantic context into chart based on text-to-image generative model. We develop an interactive visual interface that integrates a text analyzer, editing module, and evaluation module to enable users to generate, modify, and assess pictorial visualizations.
arXiv Detail & Related papers (2023-04-28T05:18:30Z)
ChartReader: A Unified Framework for Chart Derendering and Comprehension without Heuristic Rules [89.75395046894809]
We present ChartReader, a unified framework that seamlessly integrates chart derendering and comprehension tasks. Our approach includes a transformer-based chart component detection module and an extended pre-trained vision-language model for chart-to-X tasks. Our proposed framework can significantly reduce the manual effort involved in chart analysis, providing a step towards a universal chart understanding model.
arXiv Detail & Related papers (2023-04-05T00:25:27Z)
ManiTrans: Entity-Level Text-Guided Image Manipulation via Token-wise Semantic Alignment and Generation [97.36550187238177]
We study a novel task on text-guided image manipulation on the entity level in the real world. The task imposes three basic requirements, (1) to edit the entity consistent with the text descriptions, (2) to preserve the text-irrelevant regions, and (3) to merge the manipulated entity into the image naturally. Our framework incorporates a semantic alignment module to locate the image regions to be manipulated, and a semantic loss to help align the relationship between the vision and language.
arXiv Detail & Related papers (2022-04-09T09:01:19Z)
Enhanced Modality Transition for Image Captioning [51.72997126838352]
We build a Modality Transition Module (MTM) to transfer visual features into semantic representations before forwarding them to the language model. During the training phase, the modality transition network is optimised by the proposed modality loss. Experiments have been conducted on the MS-COCO dataset demonstrating the effectiveness of the proposed framework.
arXiv Detail & Related papers (2021-02-23T07:20:12Z)

This list is automatically generated from the titles and abstracts of the papers on this site.