Are LLMs ready to help non-expert users to make charts of official statistics data?
- URL: http://arxiv.org/abs/2510.01197v1
- Date: Wed, 03 Sep 2025 08:11:53 GMT
- Title: Are LLMs ready to help non-expert users to make charts of official statistics data?
- Authors: Gadir Suleymanli, Alexander Rogiers, Lucas Lageweg, Jefrey Lijffijt
- Abstract summary: We ask the question "Are current Generative AI models capable of facilitating the identification of the right data and the fully-automatic creation of charts to provide information in visual form?" We present a structured evaluation of recent large language models' capabilities to generate charts from complex data in response to user queries. Results indicate that locating and processing the correct data represents the most significant challenge.
- Score: 39.88557897908524
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: In this time when biased information, deep fakes, and propaganda proliferate, the accessibility of reliable data sources is more important than ever. National statistical institutes provide curated data that contain quantitative information on a wide range of topics. However, that information is typically spread across many tables and the plain numbers may be arduous to process. Hence, this open data may be practically inaccessible. We ask the question "Are current Generative AI models capable of facilitating the identification of the right data and the fully-automatic creation of charts to provide information in visual form, corresponding to user queries?". We present a structured evaluation of recent large language models' (LLMs) capabilities to generate charts from complex data in response to user queries. Working with diverse public data from Statistics Netherlands, we assessed multiple LLMs on their ability to identify relevant data tables, perform necessary manipulations, and generate appropriate visualizations autonomously. We propose a new evaluation framework spanning three dimensions: data retrieval & pre-processing, code quality, and visual representation. Results indicate that locating and processing the correct data represents the most significant challenge. Additionally, LLMs rarely implement visualization best practices without explicit guidance. When supplemented with information about effective chart design, models showed marked improvement in representation scores. Furthermore, an agentic approach with iterative self-evaluation led to excellent performance across all evaluation dimensions. These findings suggest that LLMs' effectiveness for automated chart generation can be enhanced through appropriate scaffolding and feedback mechanisms, and that systems can already reach the necessary accuracy across the three evaluation dimensions.
Related papers
- Beyond Description: A Multimodal Agent Framework for Insightful Chart Summarization [18.33134893463544]
We propose a plan-and-execute multi-agent framework to uncover profound insights directly from chart images. To overcome the lack of suitable benchmarks, we introduce ChartSummInsights, a new dataset featuring a diverse collection of real-world charts.
arXiv Detail & Related papers (2026-02-21T06:17:37Z) - Can LLMs Clean Up Your Mess? A Survey of Application-Ready Data Preparation with LLMs [66.63911043019294]
Data preparation aims to denoise raw datasets, uncover cross-dataset relationships, and extract valuable insights from them. This paper focuses on the use of LLM techniques to prepare data for diverse downstream tasks. We introduce a task-centric taxonomy that organizes the field into three major tasks: data cleaning (standardization, error processing, imputation), data integration, and data enrichment.
arXiv Detail & Related papers (2026-01-22T12:02:45Z) - Automated Visualization Makeovers with LLMs [0.716879432974126]
Visualisation makeovers are exercises where the community exchange feedback to improve charts and data visualizations. Can multi-modal large language models (LLMs) emulate this task? Our system is centred around prompt engineering of a pre-trained model, relying on a combination of user guidelines and any latent knowledge of data visualization practices.
arXiv Detail & Related papers (2025-07-21T11:51:20Z) - Protecting multimodal large language models against misleading visualizations [94.71976205962527]
We show that question-answering (QA) accuracy on misleading visualizations drops on average to the level of the random baseline. We introduce the first inference-time methods to improve QA performance on misleading visualizations, without compromising accuracy on non-misleading ones. We find that two methods, table-based QA and redrawing the visualization, are effective, with improvements of up to 19.6 percentage points.
arXiv Detail & Related papers (2025-02-27T20:22:34Z) - Better Think with Tables: Tabular Structures Enhance LLM Comprehension for Data-Analytics Requests [33.471112091886894]
Large Language Models (LLMs) often struggle with data-analytics requests related to information retrieval and data manipulation. We introduce Thinking with Tables, where we inject tabular structures into LLMs for data-analytics requests. We show that providing tables yields a 40.29 percent average performance gain along with better manipulation and token efficiency.
arXiv Detail & Related papers (2024-12-22T23:31:03Z) - Web-Scale Visual Entity Recognition: An LLM-Driven Data Approach [56.55633052479446]
Web-scale visual entity recognition presents significant challenges due to the lack of clean, large-scale training data.
We propose a novel methodology to curate such a dataset, leveraging a multimodal large language model (LLM) for label verification, metadata generation, and rationale explanation.
Experiments demonstrate that models trained on this automatically curated data achieve state-of-the-art performance on web-scale visual entity recognition tasks.
arXiv Detail & Related papers (2024-10-31T06:55:24Z) - PUB: Plot Understanding Benchmark and Dataset for Evaluating Large Language Models on Synthetic Visual Data Interpretation [2.1184929769291294]
This paper presents a novel synthetic dataset designed to evaluate the proficiency of large language models in interpreting data visualizations.
Our dataset is generated using controlled parameters to ensure comprehensive coverage of potential real-world scenarios.
We employ multimodal text prompts with questions related to visual data in images to benchmark several state-of-the-art models.
arXiv Detail & Related papers (2024-09-04T11:19:17Z) - Can LLMs Generate Visualizations with Dataless Prompts? [17.280610067626135]
We investigate the ability of large language models to provide accurate data and relevant visualizations in response to such queries.
Specifically, we investigate the ability of GPT-3 and GPT-4 to generate visualizations with dataless prompts, where no data accompanies the query.
arXiv Detail & Related papers (2024-06-22T22:59:09Z) - AvaTaR: Optimizing LLM Agents for Tool Usage via Contrastive Reasoning [93.96463520716759]
Large language model (LLM) agents have demonstrated impressive capabilities in utilizing external tools and knowledge to boost accuracy and reduce hallucinations.
Here, we introduce AvaTaR, a novel and automated framework that optimizes an LLM agent to effectively leverage provided tools, improving performance on a given task.
arXiv Detail & Related papers (2024-06-17T04:20:02Z) - TACT: Advancing Complex Aggregative Reasoning with Information Extraction Tools [51.576974932743596]
Large Language Models (LLMs) often do not perform well on queries that require the aggregation of information across texts.
TACT contains challenging instructions that demand stitching information scattered across one or more texts.
We construct this dataset by leveraging an existing dataset of texts and their associated tables.
We demonstrate that all contemporary LLMs perform poorly on this dataset, achieving an accuracy below 38%.
arXiv Detail & Related papers (2024-06-05T20:32:56Z)
This list is automatically generated from the titles and abstracts of the papers in this site.