JSynFlow: Japanese Synthesised Flowchart Visual Question Answering Dataset built with Large Language Models
- URL: http://arxiv.org/abs/2602.04142v2
- Date: Thu, 05 Feb 2026 07:38:00 GMT
- Title: JSynFlow: Japanese Synthesised Flowchart Visual Question Answering Dataset built with Large Language Models
- Authors: Hiroshi Sasaki
- Abstract summary: JSynFlow is a synthesised visual QA dataset for Japanese flowcharts. This paper details the dataset's synthesis procedure and demonstrates that fine-tuning with JSynFlow significantly improves VLM performance.
- Score: 0.609170287691728
- License: http://creativecommons.org/licenses/by-nc-nd/4.0/
- Abstract: Vision and language models (VLMs) are expected to analyse complex documents, such as those containing flowcharts, through a question-answering (QA) interface. The ability to recognise and interpret these flowcharts is in high demand, as they provide valuable insights unavailable in text-only explanations. However, developing VLMs with precise flowchart understanding requires large-scale datasets of flowchart images and corresponding text, the creation of which is highly time-consuming. To address this challenge, we introduce JSynFlow, a synthesised visual QA dataset for Japanese flowcharts, generated using large language models (LLMs). Our dataset comprises task descriptions for various business occupations, the corresponding flowchart images rendered from domain-specific language (DSL) code, and related QA pairs. This paper details the dataset's synthesis procedure and demonstrates that fine-tuning with JSynFlow significantly improves VLM performance on flowchart-based QA tasks. Our dataset is publicly available at https://huggingface.co/datasets/jri-advtechlab/jsynflow.
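The abstract describes a pipeline in which LLM-generated task descriptions are expressed as DSL code and rendered into flowchart images. As a minimal sketch of that rendering stage, the snippet below converts a simple node/edge spec into Graphviz DOT source; the spec schema and function names are illustrative assumptions, not the authors' actual DSL.

```python
# Hypothetical sketch of a DSL-to-image rendering stage: a flowchart
# spec (nodes with kinds, labelled edges) is serialised to Graphviz DOT
# source, which a renderer could then turn into an image. The schema
# here is an assumption for illustration, not the JSynFlow DSL.

def flowchart_to_dot(spec):
    """Serialise a flowchart spec into Graphviz DOT source text."""
    shapes = {"start": "oval", "process": "box", "decision": "diamond"}
    lines = ["digraph flow {"]
    for node_id, label, kind in spec["nodes"]:
        lines.append(f'  {node_id} [label="{label}", shape={shapes[kind]}];')
    for src, dst, edge_label in spec["edges"]:
        attr = f' [label="{edge_label}"]' if edge_label else ""
        lines.append(f"  {src} -> {dst}{attr};")
    lines.append("}")
    return "\n".join(lines)

# Toy Japanese business-process flowchart: order intake -> stock check.
spec = {
    "nodes": [("s", "受注", "start"),
              ("p1", "在庫確認", "process"),
              ("d1", "在庫あり?", "decision")],
    "edges": [("s", "p1", ""), ("p1", "d1", ""), ("d1", "s", "No")],
}
dot = flowchart_to_dot(spec)
```

The resulting DOT text could be rendered with any Graphviz toolchain; pairing each rendered image with QA pairs generated from the same spec would follow the synthesis pattern the abstract outlines.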
Related papers
- A Graph-based Approach for Multi-Modal Question Answering from Flowcharts in Telecom Documents [0.619840955350879]
Question-Answering from technical documents often involves questions whose answers are present in figures, such as flowcharts or flow diagrams. We leverage graph representations of flowcharts obtained from Visual Large Language Models (VLMs) and incorporate them into a text-based RAG system, showing that this approach can enable image retrieval for QA in the telecom domain.
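One way to make a flowchart graph usable by a text-only RAG index, as this paper's summary describes, is to serialise its edges into plain-text triples. The sketch below is an illustrative assumption about such a serialisation, not the paper's implementation.

```python
# Illustrative sketch (not the paper's actual method): serialise a
# flowchart's labelled edges into text triples that a text-based RAG
# system could index and retrieve alongside the source image.

def graph_to_triples(edges):
    """Turn (source, edge_label, target) tuples into readable triples."""
    return [f"{src} --{label}--> {dst}" if label else f"{src} --> {dst}"
            for src, label, dst in edges]

# Toy telecom troubleshooting flowchart fragment.
edges = [("Start", "", "Check SIM"), ("Check SIM", "invalid", "Reject")]
triples = graph_to_triples(edges)
```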
arXiv Detail & Related papers (2025-07-25T07:36:13Z) - Follow the Flow: Fine-grained Flowchart Attribution with Neurosymbolic Agents [106.04963073116468]
Flowcharts are a critical tool for visualizing decision-making processes. Vision-language models frequently hallucinate nonexistent connections and decision paths when analyzing these diagrams. We introduce Fine-grained Flowchart Attribution, which traces the specific components that ground a flowchart-referring LLM response. We propose FlowPathAgent, a neurosymbolic agent that performs fine-grained post hoc attribution through graph-based reasoning.
arXiv Detail & Related papers (2025-06-02T06:02:41Z) - BRIDGES: Bridging Graph Modality and Large Language Models within EDA Tasks [12.683482535955314]
LLM performance suffers when graphs are represented as sequential text. We introduce BRIDGES, a framework designed to incorporate graph modality into LLMs for EDA tasks. Results demonstrate 2x to 10x improvements across multiple tasks compared to text-only baselines.
arXiv Detail & Related papers (2025-04-07T15:27:32Z) - RefChartQA: Grounding Visual Answer on Chart Images through Instruction Tuning [63.599057862999]
RefChartQA is a novel benchmark that integrates Chart Question Answering (ChartQA) with visual grounding. Our experiments demonstrate that incorporating spatial awareness via grounding improves response accuracy by over 15%.
arXiv Detail & Related papers (2025-03-29T15:50:08Z) - Distill Visual Chart Reasoning Ability from LLMs to MLLMs [64.32993770646165]
Solving complex chart Q&A tasks requires advanced visual reasoning abilities in multimodal large language models (MLLMs). We propose Code-as-Intermediary Translation (CIT), a cost-effective, efficient and scalable data synthesis method for distilling visual reasoning abilities from LLMs to MLLMs. ReachQA is a dataset containing 3k reasoning-intensive charts and 20k Q&A pairs to enhance both the recognition and reasoning abilities of MLLMs.
arXiv Detail & Related papers (2024-10-24T14:50:42Z) - FlowLearn: Evaluating Large Vision-Language Models on Flowchart Understanding [52.35520385083425]
The FlowLearn dataset is a resource tailored to enhance the understanding of flowcharts.
The scientific subset contains 3,858 flowcharts sourced from scientific literature.
The simulated subset contains 10,000 flowcharts created using a customizable script.
arXiv Detail & Related papers (2024-07-06T20:58:51Z) - Synthesize Step-by-Step: Tools, Templates and LLMs as Data Generators for Reasoning-Based Chart VQA [9.659820850719413]
We leverage Large Language Models (LLMs), which have been shown to have strong reasoning ability, as an automatic data annotator.
The key innovation in our method lies in the Synthesize Step-by-Step strategy.
We significantly enhance the chart VQA models, achieving the state-of-the-art accuracy on the ChartQA and PlotQA datasets.
arXiv Detail & Related papers (2024-03-25T03:02:27Z) - Towards Complex Document Understanding By Discrete Reasoning [77.91722463958743]
Document Visual Question Answering (VQA) aims to understand visually-rich documents to answer questions in natural language.
We introduce a new Document VQA dataset, named TAT-DQA, which consists of 3,067 document pages and 16,558 question-answer pairs.
We develop a novel model named MHST that takes into account multi-modal information, including text, layout and visual images, to intelligently address different types of questions.
arXiv Detail & Related papers (2022-07-25T01:43:19Z) - Task-Oriented Dialogue as Dataflow Synthesis [158.77123205487334]
We describe an approach to task-oriented dialogue in which dialogue state is represented as a dataflow graph.
A dialogue agent maps each user utterance to a program that extends this graph.
We introduce a new dataset, SMCalFlow, featuring complex dialogues about events, weather, places, and people.
arXiv Detail & Related papers (2020-09-24T00:35:26Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences of its use.