See it. Say it. Sorted: Agentic System for Compositional Diagram Generation
- URL: http://arxiv.org/abs/2508.15222v1
- Date: Thu, 21 Aug 2025 04:20:36 GMT
- Title: See it. Say it. Sorted: Agentic System for Compositional Diagram Generation
- Authors: Hantao Zhang, Jingyang Liu, Ed Li,
- Abstract summary: We study sketch-to-diagram generation: converting rough hand sketches into precise, compositional diagrams.<n>We introduce See it. Say it. Sorted., a training-free agentic system that couples a Vision-Language Model (VLM) with Large Language Models (LLMs)<n>The system runs an iterative loop in which a Critic VLM proposes a small set of qualitative, edits; multiple candidate LLMs synthesize updates with diverse strategies.<n>This design prioritizes qualitative reasoning over brittle numerical estimates, preserves global constraints (e.g., alignment, connectivity), and naturally supports human-in-the-loop
- Score: 0.5079602839359522
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: We study sketch-to-diagram generation: converting rough hand sketches into precise, compositional diagrams. Diffusion models excel at photorealism but struggle with the spatial precision, alignment, and symbolic structure required for flowcharts. We introduce See it. Say it. Sorted., a training-free agentic system that couples a Vision-Language Model (VLM) with Large Language Models (LLMs) to produce editable Scalable Vector Graphics (SVG) programs. The system runs an iterative loop in which a Critic VLM proposes a small set of qualitative, relational edits; multiple candidate LLMs synthesize SVG updates with diverse strategies (conservative->aggressive, alternative, focused); and a Judge VLM selects the best candidate, ensuring stable improvement. This design prioritizes qualitative reasoning over brittle numerical estimates, preserves global constraints (e.g., alignment, connectivity), and naturally supports human-in-the-loop corrections. On 10 sketches derived from flowcharts in published papers, our method more faithfully reconstructs layout and structure than two frontier closed-source image generation LLMs (GPT-5 and Gemini-2.5-Pro), accurately composing primitives (e.g., multi-headed arrows) without inserting unwanted text. Because outputs are programmatic SVGs, the approach is readily extensible to presentation tools (e.g., PowerPoint) via APIs and can be specialized with improved prompts and task-specific tools. The codebase is open-sourced at https://github.com/hantaoZhangrichard/see_it_say_it_sorted.git.
Related papers
- VecGlypher: Unified Vector Glyph Generation with Language Models [49.18215716168074]
VecGlypher generates high-fidelity vector glyphs directly from text descriptions or image exemplars.<n>VecGlypher autoregressively emits SVG path tokens, avoiding intermediates and a target character.
arXiv Detail & Related papers (2026-02-25T00:27:23Z) - Semi-supervised Instruction Tuning for Large Language Models on Text-Attributed Graphs [62.544129365882014]
We propose a novel Semi-supervised Instruction Tuning pipeline for Graph Learning, named SIT-Graph.<n> SIT-Graph is model-agnostic and can be seamlessly integrated into any graph instruction tuning method that utilizes LLMs as the predictor.<n>Extensive experiments demonstrate that when incorporated into state-of-the-art graph instruction tuning methods, SIT-Graph significantly enhances their performance on text-attributed graph benchmarks.
arXiv Detail & Related papers (2026-01-19T08:10:53Z) - SVGThinker: Instruction-Aligned and Reasoning-Driven Text-to-SVG Generation [47.390332111383294]
We present SVGThinker, a reasoning-driven framework that aligns the production of SVG code with the visualization process.<n>Our pipeline first renders each primitive in sequence and uses a multimodal model to annotate the image and code.<n> Experiments against state-of-the-art baselines show that SVGThinker produces more stable, editable, and higher-quality SVGs.
arXiv Detail & Related papers (2025-09-29T05:25:00Z) - SVGen: Interpretable Vector Graphics Generation with Large Language Models [61.62816031675714]
We introduce SVG-1M, a large-scale dataset of high-quality SVGs paired with natural language descriptions.<n>We create well-aligned Text to SVG training pairs, including a subset with Chain of Thought annotations for enhanced semantic guidance.<n>Based on this dataset, we propose SVGen, an end-to-end model that generates SVG code from natural language inputs.
arXiv Detail & Related papers (2025-08-06T15:00:24Z) - SketchAgent: Generating Structured Diagrams from Hand-Drawn Sketches [54.06877048295693]
We introduce SketchAgent, a system designed to automate the transformation of hand-drawn sketches into structured diagrams.<n>SketchAgent integrates sketch recognition, symbolic reasoning, and iterative validation to produce semantically coherent and structurally accurate diagrams.<n>By streamlining the diagram generation process, SketchAgent holds great promise for applications in design, education, and engineering.
arXiv Detail & Related papers (2025-08-02T07:22:51Z) - Align-GRAG: Reasoning-Guided Dual Alignment for Graph Retrieval-Augmented Generation [75.9865035064794]
Large language models (LLMs) have demonstrated remarkable capabilities, but still struggle with issues like hallucinations and outdated information.<n>Retrieval-augmented generation (RAG) addresses these issues by grounding LLM outputs in external knowledge with an Information Retrieval (IR) system.<n>We propose Align-GRAG, a novel reasoning-guided dual alignment framework in post-retrieval phrase.
arXiv Detail & Related papers (2025-05-22T05:15:27Z) - From Text to Visuals: Using LLMs to Generate Math Diagrams with Vector Graphics [4.012351415340318]
Large language models (LLMs) offer new possibilities for enhancing math education by automating support for both teachers and students.<n>Recent research on using LLMs to generate Scalable Vector Graphics (SVG) presents a promising approach to automating diagram creation.<n>This paper addresses three research questions: (1) how to automatically generate math diagrams in problem-solving hints and evaluate their quality, (2) whether SVG is an effective intermediate representation for math diagrams, and (3) what prompting strategies and formats are required for LLMs to generate accurate SVG-based diagrams.
arXiv Detail & Related papers (2025-03-10T15:13:38Z) - Visually Descriptive Language Model for Vector Graphics Reasoning [76.42082386029206]
We propose the Visually Descriptive Language Model (VDLM) to bridge the gap between low-level visual perception and high-level language reasoning.<n>We show that VDLM significantly improves state-of-the-art LMMs like GPT-4o on various multimodal perception and reasoning tasks.
arXiv Detail & Related papers (2024-04-09T17:30:18Z) - SVGDreamer: Text Guided SVG Generation with Diffusion Model [31.76771064173087]
We propose a novel text-guided vector graphics synthesis method called SVGDreamer.<n>SIVE process enables decomposition of synthesis into foreground objects and background.<n>VPSD approach addresses issues of shape over-smoothing, color over-saturation, limited diversity, and slow convergence.
arXiv Detail & Related papers (2023-12-27T08:50:01Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.