JSynFlow: Japanese Synthesised Flowchart Visual Question Answering Dataset built with Large Language Models
- URL: http://arxiv.org/abs/2602.04142v2
- Date: Thu, 05 Feb 2026 07:38:00 GMT
- Title: JSynFlow: Japanese Synthesised Flowchart Visual Question Answering Dataset built with Large Language Models
- Authors: Hiroshi Sasaki
- Abstract summary: JSynFlow is a synthesised visual QA dataset for Japanese flowcharts. This paper details the dataset's synthesis procedure and demonstrates that fine-tuning with JSynFlow significantly improves VLM performance.
- Score: 0.609170287691728
- License: http://creativecommons.org/licenses/by-nc-nd/4.0/
- Abstract: Vision and language models (VLMs) are expected to analyse complex documents, such as those containing flowcharts, through a question-answering (QA) interface. The ability to recognise and interpret these flowcharts is in high demand, as they provide valuable insights unavailable in text-only explanations. However, developing VLMs with precise flowchart understanding requires large-scale datasets of flowchart images and corresponding text, the creation of which is highly time-consuming. To address this challenge, we introduce JSynFlow, a synthesised visual QA dataset for Japanese flowcharts, generated using large language models (LLMs). Our dataset comprises task descriptions for various business occupations, the corresponding flowchart images rendered from domain-specific language (DSL) code, and related QA pairs. This paper details the dataset's synthesis procedure and demonstrates that fine-tuning with JSynFlow significantly improves VLM performance on flowchart-based QA tasks. Our dataset is publicly available at https://huggingface.co/datasets/jri-advtechlab/jsynflow.
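The abstract describes a pipeline in which LLM-generated task descriptions are expressed as DSL code and rendered into flowchart images. As a minimal sketch of that rendering stage, the snippet below converts a simple node/edge spec into Graphviz DOT source; the spec schema and function names are illustrative assumptions, not the authors' actual DSL.

```python
# Hypothetical sketch of a DSL-to-image rendering stage: a flowchart
# spec (nodes with kinds, labelled edges) is serialised to Graphviz DOT
# source, which a renderer could then turn into an image. The schema
# here is an assumption for illustration, not the JSynFlow DSL.

def flowchart_to_dot(spec):
    """Serialise a flowchart spec into Graphviz DOT source text."""
    shapes = {"start": "oval", "process": "box", "decision": "diamond"}
    lines = ["digraph flow {"]
    for node_id, label, kind in spec["nodes"]:
        lines.append(f'  {node_id} [label="{label}", shape={shapes[kind]}];')
    for src, dst, edge_label in spec["edges"]:
        attr = f' [label="{edge_label}"]' if edge_label else ""
        lines.append(f"  {src} -> {dst}{attr};")
    lines.append("}")
    return "\n".join(lines)

# Toy Japanese business-process flowchart: order intake -> stock check.
spec = {
    "nodes": [("s", "受注", "start"),
              ("p1", "在庫確認", "process"),
              ("d1", "在庫あり?", "decision")],
    "edges": [("s", "p1", ""), ("p1", "d1", ""), ("d1", "s", "No")],
}
dot = flowchart_to_dot(spec)
```

The resulting DOT text could be rendered with any Graphviz toolchain; pairing each rendered image with QA pairs generated from the same spec would follow the synthesis pattern the abstract outlines.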
Related papers
- A Graph-based Approach for Multi-Modal Question Answering from Flowcharts in Telecom Documents [0.619840955350879]
Question-Answering from technical documents often involves questions whose answers are present in figures, such as flowcharts or flow diagrams. We leverage graph representations of flowcharts obtained from Visual Large Language Models (VLMs) and incorporate them into a text-based RAG system, showing that this approach can enable image retrieval for QA in the telecom domain.
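One way to make a flowchart graph usable by a text-only RAG index, as this paper's summary describes, is to serialise its edges into plain-text triples. The sketch below is an illustrative assumption about such a serialisation, not the paper's implementation.

```python
# Illustrative sketch (not the paper's actual method): serialise a
# flowchart's labelled edges into text triples that a text-based RAG
# system could index and retrieve alongside the source image.

def graph_to_triples(edges):
    """Turn (source, edge_label, target) tuples into readable triples."""
    return [f"{src} --{label}--> {dst}" if label else f"{src} --> {dst}"
            for src, label, dst in edges]

# Toy telecom troubleshooting flowchart fragment.
edges = [("Start", "", "Check SIM"), ("Check SIM", "invalid", "Reject")]
triples = graph_to_triples(edges)
```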
arXiv Detail & Related papers (2025-07-25T07:36:13Z) - Follow the Flow: Fine-grained Flowchart Attribution with Neurosymbolic Agents [106.04963073116468]
Flowcharts are a critical tool for visualizing decision-making processes. Vision-language models frequently hallucinate nonexistent connections and decision paths when analyzing these diagrams. We introduce Fine-grained Flowchart Attribution, which traces the specific components that ground a flowchart-referring LLM response. We propose FlowPathAgent, a neurosymbolic agent that performs fine-grained post hoc attribution through graph-based reasoning.
arXiv Detail & Related papers (2025-06-02T06:02:41Z) - BRIDGES: Bridging Graph Modality and Large Language Models within EDA Tasks [12.683482535955314]
LLM performance suffers when graphs are represented as sequential text. We introduce BRIDGES, a framework designed to incorporate graph modality into LLMs for EDA tasks. Results demonstrate 2x to 10x improvements across multiple tasks compared to text-only baselines.
arXiv Detail & Related papers (2025-04-07T15:27:32Z) - RefChartQA: Grounding Visual Answer on Chart Images through Instruction Tuning [63.599057862999]
RefChartQA is a novel benchmark that integrates Chart Question Answering (ChartQA) with visual grounding. Our experiments demonstrate that incorporating spatial awareness via grounding improves response accuracy by over 15%.
arXiv Detail & Related papers (2025-03-29T15:50:08Z) - Distill Visual Chart Reasoning Ability from LLMs to MLLMs [64.32993770646165]
Solving complex chart Q&A tasks requires advanced visual reasoning abilities in multimodal large language models (MLLMs). We propose Code-as-Intermediary Translation (CIT), a cost-effective, efficient and scalable data synthesis method for distilling visual reasoning abilities from LLMs to MLLMs. ReachQA is a dataset containing 3k reasoning-intensive charts and 20k Q&A pairs to enhance both the recognition and reasoning abilities of MLLMs.
arXiv Detail & Related papers (2024-10-24T14:50:42Z) - FlowLearn: Evaluating Large Vision-Language Models on Flowchart Understanding [52.35520385083425]
The FlowLearn dataset is a resource tailored to enhance the understanding of flowcharts.
The scientific subset contains 3,858 flowcharts sourced from scientific literature.
The simulated subset contains 10,000 flowcharts created using a customizable script.
arXiv Detail & Related papers (2024-07-06T20:58:51Z) - Synthesize Step-by-Step: Tools, Templates and LLMs as Data Generators for Reasoning-Based Chart VQA [9.659820850719413]
We leverage Large Language Models (LLMs), which have been shown to have strong reasoning ability, as an automatic data annotator.
The key innovation in our method lies in the Synthesize Step-by-Step strategy.
We significantly enhance the chart VQA models, achieving the state-of-the-art accuracy on the ChartQA and PlotQA datasets.
arXiv Detail & Related papers (2024-03-25T03:02:27Z) - Towards Complex Document Understanding By Discrete Reasoning [77.91722463958743]
Document Visual Question Answering (VQA) aims to understand visually-rich documents to answer questions in natural language.
We introduce a new Document VQA dataset, named TAT-DQA, which consists of 3,067 document pages and 16,558 question-answer pairs.
We develop a novel model named MHST that takes into account multi-modal information, including text, layout and visual images, to intelligently address different types of questions.
arXiv Detail & Related papers (2022-07-25T01:43:19Z) - Task-Oriented Dialogue as Dataflow Synthesis [158.77123205487334]
We describe an approach to task-oriented dialogue in which dialogue state is represented as a dataflow graph.
A dialogue agent maps each user utterance to a program that extends this graph.
We introduce a new dataset, SMCalFlow, featuring complex dialogues about events, weather, places, and people.
arXiv Detail & Related papers (2020-09-24T00:35:26Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences of its use.