SVGen: Interpretable Vector Graphics Generation with Large Language Models
- URL: http://arxiv.org/abs/2508.09168v1
- Date: Wed, 06 Aug 2025 15:00:24 GMT
- Title: SVGen: Interpretable Vector Graphics Generation with Large Language Models
- Authors: Feiyu Wang, Zhiyuan Zhao, Yuandong Liu, Da Zhang, Junyu Gao, Hao Sun, Xuelong Li,
- Abstract summary: We introduce SVG-1M, a large-scale dataset of high-quality SVGs paired with natural language descriptions.<n>We create well-aligned Text to SVG training pairs, including a subset with Chain of Thought annotations for enhanced semantic guidance.<n>Based on this dataset, we propose SVGen, an end-to-end model that generates SVG code from natural language inputs.
- Score: 61.62816031675714
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Scalable Vector Graphics (SVG) is widely used in front-end development and UI/UX design due to its scalability, editability, and rendering efficiency. However, turning creative ideas into precise vector graphics remains a time-consuming challenge. To address this, we introduce SVG-1M, a large-scale dataset of high-quality SVGs paired with natural language descriptions. Through advanced data augmentation and annotation, we create well-aligned Text to SVG training pairs, including a subset with Chain of Thought annotations for enhanced semantic guidance. Based on this dataset, we propose SVGen, an end-to-end model that generates SVG code from natural language inputs. Our approach ensures semantic accuracy and structural completeness, supported by curriculum learning and reinforcement learning optimization. Experiments show that SVGen outperforms general large models and traditional rendering methods in both effectiveness and efficiency. Code, model, and dataset are available on GitHub.
Related papers
- DuetSVG: Unified Multimodal SVG Generation with Internal Visual Guidance [48.98604326855894]
We introduce DuetSVG, a unified multimodal model that jointly generates image tokens and corresponding SVG tokens in an end-to-end manner.<n>At inference, we apply a novel test-time scaling strategy that leverages the model's native visual predictions as guidance to improve SVG decoding quality.
arXiv Detail & Related papers (2025-12-11T18:23:03Z) - InternSVG: Towards Unified SVG Tasks with Multimodal Large Language Models [65.49118879021016]
We present the InternSVG family, an integrated data-benchmark-model suite.<n>At its core is SAgoge, the largest and most comprehensive multimodal dataset for SVG tasks.<n>We propose InternSVG, a unified MLLM for SVG understanding, editing, and generation with SVG-specific special tokens.
arXiv Detail & Related papers (2025-10-13T12:38:04Z) - SVGThinker: Instruction-Aligned and Reasoning-Driven Text-to-SVG Generation [47.390332111383294]
We present SVGThinker, a reasoning-driven framework that aligns the production of SVG code with the visualization process.<n>Our pipeline first renders each primitive in sequence and uses a multimodal model to annotate the image and code.<n> Experiments against state-of-the-art baselines show that SVGThinker produces more stable, editable, and higher-quality SVGs.
arXiv Detail & Related papers (2025-09-29T05:25:00Z) - UniSVG: A Unified Dataset for Vector Graphic Understanding and Generation with Multimodal Large Language Models [9.310212949500011]
We propose an SVG-centric dataset called UniSVG, comprising 525k data items, tailored for MLLM training and evaluation.<n>UniSVG is the first comprehensive dataset designed for unified SVG generation (from textual prompts and images) and SVG understanding (color, category, usage, etc.)<n>As expected, learning on the proposed dataset boosts open-source MLLMs' performance on various SVG U&G tasks, surpassing SOTA close-source MLLMs like GPT-4V.
arXiv Detail & Related papers (2025-08-11T08:50:14Z) - Rendering-Aware Reinforcement Learning for Vector Graphics Generation [15.547843461605746]
We introduce RLRF(Reinforcement Learning from Rendering Feedback), an RL method that enhances SVG generation in vision-language models (VLMs)<n>Given an input image, the model generates SVG roll-outs that are rendered and compared to the original image to compute a reward.<n>This visual fidelity feedback guides the model toward producing more accurate, efficient, and semantically coherent SVGs.
arXiv Detail & Related papers (2025-05-27T06:56:00Z) - OmniSVG: A Unified Scalable Vector Graphics Generation Model [69.59073636922287]
We propose OmniSVG, a unified framework that leverages pre-trained Vision-Language Models for end-to-end multimodal SVG generation.<n>By parameterizing SVG commands and coordinates into discrete tokens, OmniSVG decouples structural logic from low-level geometry for efficient training while maintaining the synthesis of complex SVG structure.<n>We introduce MMSVG-2M, a multimodal dataset with two million annotated SVG assets, along with a standardized evaluation protocol for conditional SVG generation tasks.
arXiv Detail & Related papers (2025-04-08T17:59:49Z) - NeuralSVG: An Implicit Representation for Text-to-Vector Generation [54.4153300455889]
We propose NeuralSVG, an implicit neural representation for generating vector graphics from text prompts.<n>To encourage a layered structure in the generated SVG, we introduce a dropout-based regularization technique.<n>We demonstrate that NeuralSVG outperforms existing methods in generating structured and flexible SVG.
arXiv Detail & Related papers (2025-01-07T18:50:06Z) - Chat2SVG: Vector Graphics Generation with Large Language Models and Image Diffusion Models [14.917583676464266]
Chat2SVG is a hybrid framework that combines Large Language Models and image diffusion models for text-to-SVG generation.
Our system enables intuitive editing through natural language instructions, making professional vector graphics creation accessible to all users.
arXiv Detail & Related papers (2024-11-25T17:31:57Z) - SVG-Net: An SVG-based Trajectory Prediction Model [67.68864911674308]
Anticipating motions of vehicles in a scene is an essential problem for safe autonomous driving systems.
To this end, the comprehension of the scene's infrastructure is often the main clue for predicting future trajectories.
Most of the proposed approaches represent the scene with averse averseized format and some of the more recent approaches leverage custom vectorized formats.
arXiv Detail & Related papers (2021-10-07T18:00:08Z) - DeepSVG: A Hierarchical Generative Network for Vector Graphics Animation [217.86315551526235]
We propose a novel hierarchical generative network, called DeepSVG, for complex SVG icons generation and manipulation.
Our architecture effectively disentangles high-level shapes from the low-level commands that encode the shape itself.
We demonstrate that our network learns to accurately reconstruct diverse vector graphics, and can serve as a powerful animation tool.
arXiv Detail & Related papers (2020-07-22T09:36:31Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.