Related papers: From Text to Visuals: Using LLMs to Generate Math Diagrams with Vector Graphics


URL: http://arxiv.org/abs/2503.07429v1
Date: Mon, 10 Mar 2025 15:13:38 GMT
Title: From Text to Visuals: Using LLMs to Generate Math Diagrams with Vector Graphics
Authors: Jaewook Lee, Jeongah Lee, Wanyong Feng, Andrew Lan
Abstract summary: Large language models (LLMs) offer new possibilities for enhancing math education by automating support for both teachers and students. Recent research on using LLMs to generate Scalable Vector Graphics (SVG) presents a promising approach to automating diagram creation. This paper addresses three research questions: (1) how to automatically generate math diagrams in problem-solving hints and evaluate their quality, (2) whether SVG is an effective intermediate representation for math diagrams, and (3) what prompting strategies and formats are required for LLMs to generate accurate SVG-based diagrams.
License: http://creativecommons.org/licenses/by/4.0/
Abstract: Advances in large language models (LLMs) offer new possibilities for enhancing math education by automating support for both teachers and students. While prior work has focused on generating math problems and high-quality distractors, the role of visualization in math learning remains under-explored. Diagrams are essential for mathematical thinking and problem-solving, yet manually creating them is time-consuming and requires domain-specific expertise, limiting scalability. Recent research on using LLMs to generate Scalable Vector Graphics (SVG) presents a promising approach to automating diagram creation. Unlike pixel-based images, SVGs represent geometric figures using XML, allowing seamless scaling and adaptability. Educational platforms such as Khan Academy and IXL already use SVGs to display math problems and hints. In this paper, we explore the use of LLMs to generate math-related diagrams that accompany textual hints via intermediate SVG representations. We address three research questions: (1) how to automatically generate math diagrams in problem-solving hints and evaluate their quality, (2) whether SVG is an effective intermediate representation for math diagrams, and (3) what prompting strategies and formats are required for LLMs to generate accurate SVG-based diagrams. Our contributions include defining the task of automatically generating SVG-based diagrams for math hints, developing an LLM prompting-based pipeline, and identifying key strategies for improving diagram generation. Additionally, we introduce a Visual Question Answering-based evaluation setup and conduct ablation studies to assess different pipeline variations. By automating the math diagram creation, we aim to provide students and teachers with accurate, conceptually relevant visual aids that enhance problem-solving and learning experiences.
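The abstract's point that SVG describes geometry as XML, so a figure can be rescaled without loss, can be made concrete with a small example. The Python sketch below is illustrative only and is not the authors' pipeline: the helper names `right_triangle_svg` and `rescale`, the layout, and the pixels-per-unit `scale` are assumptions. It builds a labeled right triangle with the standard library's `xml.etree.ElementTree`, then changes the rendered size by editing only `width` and `height`, leaving the `viewBox` coordinates untouched.

```python
import xml.etree.ElementTree as ET

SVG_NS = "http://www.w3.org/2000/svg"


def right_triangle_svg(a: float, b: float, scale: float = 30.0, margin: float = 20.0) -> str:
    """Return SVG markup for a right triangle with legs a (horizontal) and b (vertical).

    Hypothetical helper: `scale` (pixels per unit) and `margin` are illustrative choices.
    """
    w, h = a * scale, b * scale
    width, height = w + 2 * margin, h + 2 * margin
    svg = ET.Element("svg", {
        "xmlns": SVG_NS,
        "width": str(width),
        "height": str(height),
        "viewBox": f"0 0 {width} {height}",
    })
    # SVG's y-axis points down, so the right angle sits at the bottom-left corner.
    corner = (margin, margin + h)
    base_end = (margin + w, margin + h)
    top = (margin, margin)
    ET.SubElement(svg, "polygon", {
        "points": " ".join(f"{x},{y}" for x, y in (corner, base_end, top)),
        "fill": "none",
        "stroke": "black",
    })
    # Label each leg with its length in problem units, not pixels.
    base_label = ET.SubElement(svg, "text", {
        "x": str(margin + w / 2), "y": str(margin + h + 15), "text-anchor": "middle"})
    base_label.text = f"{a:g}"
    side_label = ET.SubElement(svg, "text", {
        "x": str(margin - 5), "y": str(margin + h / 2), "text-anchor": "end"})
    side_label.text = f"{b:g}"
    return ET.tostring(svg, encoding="unicode")


def rescale(svg_text: str, factor: float) -> str:
    """Change the rendered size; the viewBox (drawing coordinates) is left unchanged."""
    ET.register_namespace("", SVG_NS)  # serialize with a default xmlns instead of "ns0:"
    root = ET.fromstring(svg_text)
    for attr in ("width", "height"):
        root.set(attr, str(float(root.get(attr)) * factor))
    return ET.tostring(root, encoding="unicode")
```

Because the shapes live in the `viewBox` coordinate system, a renderer maps them onto whatever `width` and `height` are set, which is why the enlarged figure stays sharp. The same text form also makes generated diagrams easy to inspect programmatically, for example checking that a figure contains exactly one polygon with the expected vertices.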

Related papers

Leveraging Large Language Models For Scalable Vector Graphics Processing: A Review
Traditional vectorization techniques suffer from long processing times and excessive output complexity. The advent of large language models (LLMs) has opened new possibilities for the generation, editing, and analysis of vector graphics.
arXiv (2025-03-06T21:23:17Z)
NeuralSVG: An Implicit Representation for Text-to-Vector Generation
We propose NeuralSVG, an implicit neural representation for generating vector graphics from text prompts. To encourage a layered structure in the generated SVG, we introduce a dropout-based regularization technique. We demonstrate that NeuralSVG outperforms existing methods in generating structured and flexible SVG.
arXiv (2025-01-07T18:50:06Z)
Empowering LLMs to Understand and Generate Complex Vector Graphics
Large Language Models (LLMs) encode partial knowledge of vector graphics from web pages during training. Recent findings suggest that semantically ambiguous and tokenized representations within LLMs may result in hallucinations in vector primitive predictions. We present LLM4SVG, an initial yet substantial step toward bridging this gap by enabling LLMs to better understand and generate vector graphics.
arXiv (2024-12-15T07:49:31Z)
MAVIS: Mathematical Visual Instruction Tuning with an Automatic Data Engine
We propose MAVIS, a MAthematical VISual instruction tuning pipeline for MLLMs, featuring an automatic data engine to efficiently create mathematical visual datasets. We use MAVIS-Caption to fine-tune a math-specific vision encoder (CLIP-Math) through contrastive learning, tailored for improved diagram visual encoding. We then adopt MAVIS-Instruct to perform instruction tuning for robust problem-solving skills, and term the resulting model MAVIS-7B.
arXiv (2024-07-11T17:59:47Z)
Can Graph Learning Improve Planning in LLM-based Agents?
Task planning in language agents is emerging as an important research topic alongside the development of large language models (LLMs). In this paper, we explore graph learning-based methods for task planning, a direction that is orthogonal to the prevalent focus on prompt design. Our interest in graph learning stems from a theoretical discovery: the biases of attention and auto-regressive loss impede LLMs' ability to effectively navigate decision-making on graphs.
arXiv (2024-05-29T14:26:24Z)
Visually Descriptive Language Model for Vector Graphics Reasoning
We propose the Visually Descriptive Language Model (VDLM) to bridge the gap between low-level visual perception and high-level language reasoning. We show that VDLM significantly improves state-of-the-art LMMs like GPT-4o on various multimodal perception and reasoning tasks.
arXiv (2024-04-09T17:30:18Z)
MathVerse: Does Your Multi-modal LLM Truly See the Diagrams in Visual Math Problems?
We introduce MathVerse, an all-around visual math benchmark designed for an equitable and in-depth evaluation of MLLMs. We meticulously collect 2,612 high-quality, multi-subject math problems with diagrams from publicly available sources. This approach allows MathVerse to comprehensively assess whether and how much MLLMs can truly understand the visual diagrams for mathematical reasoning.
arXiv (2024-03-21T17:59:50Z)
Talk like a Graph: Encoding Graphs for Large Language Models
We present the first comprehensive study of encoding graph-structured data as text for consumption by large language models (LLMs). We show that LLM performance on graph reasoning tasks varies on three fundamental levels: (1) the graph encoding method, (2) the nature of the graph task itself, and (3) interestingly, the very structure of the graph considered.
arXiv (2023-10-06T19:55:21Z)
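To make "encoding graph-structured data as text" concrete, here is a minimal Python sketch of two ways to serialize the same undirected graph into prompt text: an explicit edge list and a per-node neighbor list. The function names and exact phrasing are hypothetical illustrations in the spirit of that study, not its released encoders.

```python
def encode_edge_list(num_nodes: int, edges: list[tuple[int, int]]) -> str:
    """Serialize the graph as a node list followed by explicit (u, v) pairs."""
    nodes = ", ".join(str(i) for i in range(num_nodes))
    pairs = " ".join(f"({u}, {v})" for u, v in edges)
    return f"G describes a graph among nodes {nodes}.\nThe edges in G are: {pairs}."


def encode_neighbors(num_nodes: int, edges: list[tuple[int, int]]) -> str:
    """Serialize the graph as one sentence per node listing its neighbors."""
    neighbors: dict[int, list[int]] = {i: [] for i in range(num_nodes)}
    for u, v in edges:  # undirected: record the edge in both directions
        neighbors[u].append(v)
        neighbors[v].append(u)
    nodes = ", ".join(str(i) for i in range(num_nodes))
    lines = [f"G describes a graph among nodes {nodes}."]
    for i in range(num_nodes):
        if neighbors[i]:  # isolated nodes get no sentence
            joined = ", ".join(str(j) for j in sorted(neighbors[i]))
            lines.append(f"Node {i} is connected to nodes {joined}.")
    return "\n".join(lines)


def build_prompt(encoding: str, question: str) -> str:
    """Prepend a graph encoding to a question in a simple Q/A prompt format."""
    return f"{encoding}\nQ: {question}\nA:"
```

Passing either encoding of the path graph 0-1-2 to `build_prompt` with a question such as "Is there an edge between node 0 and node 2?" yields two prompts that carry identical information; the study's finding is that such choices of encoding alone can shift LLM accuracy.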
Leveraging Large Language Models for Scalable Vector Graphics-Driven Image Understanding
Large language models (LLMs) have made significant advancements in natural language understanding. This work investigates whether it is possible for LLMs to understand images as well.
arXiv (2023-06-09T17:57:01Z)
Can Language Models Solve Graph Problems in Natural Language?
Large language models (LLMs) are increasingly adopted for a variety of tasks with implicit graphical structures. We propose NLGraph, a benchmark of graph-based problem solving designed in natural language.
arXiv (2023-05-17T08:29:21Z)
Graph-ToolFormer: To Empower LLMs with Graph Reasoning Ability via Prompt Augmented by ChatGPT
We aim to develop a large language model (LLM) with reasoning ability on complex graph data. Inspired by the latest ChatGPT and Toolformer models, we propose the Graph-ToolFormer framework, which teaches LLMs, through prompts augmented by ChatGPT, to use external graph reasoning API tools.
arXiv (2023-04-10T05:25:54Z)
DeepSVG: A Hierarchical Generative Network for Vector Graphics Animation
We propose a novel hierarchical generative network, called DeepSVG, for complex SVG icons generation and manipulation. Our architecture effectively disentangles high-level shapes from the low-level commands that encode the shape itself. We demonstrate that our network learns to accurately reconstruct diverse vector graphics, and can serve as a powerful animation tool.
arXiv (2020-07-22T09:36:31Z)
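DeepSVG's split between high-level shapes and the low-level commands that draw them mirrors the structure of SVG path data itself. As a rough illustration, the hypothetical helpers below tokenize a path's `d` attribute into `(command, arguments)` pairs and group those commands into subpaths at each move-to (`M`/`m`). This is a simplified sketch, not DeepSVG's preprocessing: it does not expand implicit repeated commands, does not convert relative coordinates to absolute ones, and does not handle the compact arc-flag shorthand (such as `01`) specially.

```python
import re

# A token is either a path command letter or a number (optional sign,
# decimal part, and exponent). Commas and whitespace are skipped implicitly.
_TOKEN = re.compile(
    r"[MmLlHhVvCcSsQqTtAaZz]|[-+]?(?:\d+\.?\d*|\.\d+)(?:[eE][-+]?\d+)?"
)


def parse_path(d: str) -> list[tuple[str, list[float]]]:
    """Split SVG path data into (command, numeric arguments) pairs."""
    commands: list[tuple[str, list[float]]] = []
    for tok in _TOKEN.findall(d):
        if tok.isalpha():
            commands.append((tok, []))
        elif commands:
            commands[-1][1].append(float(tok))
        else:
            raise ValueError("path data must start with a command letter")
    return commands


def split_subpaths(commands: list[tuple[str, list[float]]]) -> list[list[tuple[str, list[float]]]]:
    """Group commands into subpaths; each move-to (M/m) starts a new one."""
    subpaths: list[list[tuple[str, list[float]]]] = []
    for cmd, args in commands:
        if cmd in "Mm" or not subpaths:
            subpaths.append([])
        subpaths[-1].append((cmd, args))
    return subpaths
```

For example, `split_subpaths(parse_path("M0 0 L1 1 Z M5 5 l1 1"))` returns two subpaths, one per `M` command, each holding its own sequence of drawing commands.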
