OmniSVG: A Unified Scalable Vector Graphics Generation Model
- URL: http://arxiv.org/abs/2504.06263v2
- Date: Mon, 26 May 2025 12:55:52 GMT
- Title: OmniSVG: A Unified Scalable Vector Graphics Generation Model
- Authors: Yiying Yang, Wei Cheng, Sijin Chen, Xianfang Zeng, Fukun Yin, Jiaxu Zhang, Liao Wang, Gang Yu, Xingjun Ma, Yu-Gang Jiang
- Abstract summary: We propose OmniSVG, a unified framework that leverages pre-trained Vision-Language Models for end-to-end multimodal SVG generation. By parameterizing SVG commands and coordinates into discrete tokens, OmniSVG decouples structural logic from low-level geometry for efficient training while maintaining the expressiveness of complex SVG structures. We introduce MMSVG-2M, a multimodal dataset with two million annotated SVG assets, along with a standardized evaluation protocol for conditional SVG generation tasks.
- Score: 69.59073636922287
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Scalable Vector Graphics (SVG) is an important image format widely adopted in graphic design because of its resolution independence and editability. The study of generating high-quality SVG has continuously drawn attention from both designers and researchers in the AIGC community. However, existing methods either produce unstructured outputs with huge computational cost or are limited to generating monochrome icons with over-simplified structures. To produce high-quality and complex SVG, we propose OmniSVG, a unified framework that leverages pre-trained Vision-Language Models (VLMs) for end-to-end multimodal SVG generation. By parameterizing SVG commands and coordinates into discrete tokens, OmniSVG decouples structural logic from low-level geometry for efficient training while maintaining the expressiveness of complex SVG structures. To further advance the development of SVG synthesis, we introduce MMSVG-2M, a multimodal dataset with two million richly annotated SVG assets, along with a standardized evaluation protocol for conditional SVG generation tasks. Extensive experiments show that OmniSVG outperforms existing methods and demonstrates its potential for integration into professional SVG design workflows.
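The command-and-coordinate tokenization described in the abstract can be sketched as follows. This is an illustrative assumption, not OmniSVG's actual scheme: the command vocabulary, bin count, and token layout here are invented for the example.

```python
# Illustrative sketch: parameterize SVG path commands and coordinates into
# discrete tokens. The vocabulary and quantization below are assumptions,
# not OmniSVG's published implementation.

COMMANDS = ["M", "L", "C", "Z"]  # move-to, line-to, cubic Bezier, close-path
NUM_BINS = 200                   # number of coordinate quantization bins
CMD_OFFSET = NUM_BINS            # command token ids follow coordinate token ids

def quantize(coord: float, size: float = 200.0) -> int:
    """Map a coordinate in [0, size] to a discrete bin index."""
    bin_idx = int(coord / size * (NUM_BINS - 1))
    return max(0, min(NUM_BINS - 1, bin_idx))

def tokenize_path(path):
    """Flatten (command, coords) pairs into a single discrete token sequence."""
    tokens = []
    for cmd, coords in path:
        tokens.append(CMD_OFFSET + COMMANDS.index(cmd))
        tokens.extend(quantize(c) for c in coords)
    return tokens

path = [("M", [10.0, 20.0]), ("L", [100.0, 20.0]), ("Z", [])]
print(tokenize_path(path))  # → [200, 9, 19, 201, 99, 19, 203]
```

Because commands and coordinates share one flat vocabulary, the sequence can be fed to an autoregressive language model like any other token stream, which is what makes a VLM backbone applicable to SVG synthesis.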
Related papers
- DuetSVG: Unified Multimodal SVG Generation with Internal Visual Guidance [48.98604326855894]
We introduce DuetSVG, a unified multimodal model that jointly generates image tokens and corresponding SVG tokens in an end-to-end manner. At inference, we apply a novel test-time scaling strategy that leverages the model's native visual predictions as guidance to improve SVG decoding quality.
arXiv Detail & Related papers (2025-12-11T18:23:03Z) - RoboSVG: A Unified Framework for Interactive SVG Generation with Multi-modal Guidance [32.59099674596894]
RoboSVG is a unified framework for generating interactive SVGs guided by textual, visual, and numerical signals. To support this framework, we construct RoboDraw, a large-scale dataset of one million examples. RoboSVG achieves superior query compliance and visual fidelity across tasks, establishing a new state of the art in versatile SVG generation.
arXiv Detail & Related papers (2025-10-26T13:57:08Z) - InternSVG: Towards Unified SVG Tasks with Multimodal Large Language Models [65.49118879021016]
We present the InternSVG family, an integrated data-benchmark-model suite. At its core is SAgoge, the largest and most comprehensive multimodal dataset for SVG tasks. We propose InternSVG, a unified MLLM for SVG understanding, editing, and generation with SVG-specific special tokens.
arXiv Detail & Related papers (2025-10-13T12:38:04Z) - SVGThinker: Instruction-Aligned and Reasoning-Driven Text-to-SVG Generation [47.390332111383294]
We present SVGThinker, a reasoning-driven framework that aligns the production of SVG code with the visualization process. Our pipeline first renders each primitive in sequence and uses a multimodal model to annotate the image and code. Experiments against state-of-the-art baselines show that SVGThinker produces more stable, editable, and higher-quality SVGs.
arXiv Detail & Related papers (2025-09-29T05:25:00Z) - UniSVG: A Unified Dataset for Vector Graphic Understanding and Generation with Multimodal Large Language Models [9.310212949500011]
We propose an SVG-centric dataset called UniSVG, comprising 525k data items, tailored for MLLM training and evaluation. UniSVG is the first comprehensive dataset designed for unified SVG generation (from textual prompts and images) and SVG understanding (color, category, usage, etc.). As expected, learning on the proposed dataset boosts open-source MLLMs' performance on various SVG understanding and generation tasks, surpassing SOTA closed-source MLLMs like GPT-4V.
arXiv Detail & Related papers (2025-08-11T08:50:14Z) - SVGen: Interpretable Vector Graphics Generation with Large Language Models [61.62816031675714]
We introduce SVG-1M, a large-scale dataset of high-quality SVGs paired with natural language descriptions. We create well-aligned Text-to-SVG training pairs, including a subset with Chain-of-Thought annotations for enhanced semantic guidance. Based on this dataset, we propose SVGen, an end-to-end model that generates SVG code from natural language inputs.
arXiv Detail & Related papers (2025-08-06T15:00:24Z) - Reason-SVG: Hybrid Reward RL for Aha-Moments in Vector Graphics Generation [29.418375886989992]
We introduce Reason-SVG, a framework designed to enhance the reasoning of Large Language Models (LLMs) for SVG generation. Reason-SVG pioneers the "Drawing-with-Thought" (DwT) paradigm, in which models generate both SVG code and explicit design rationales. We introduce the SVGX-DwT-10k dataset, a high-quality corpus of 10,000 SVG-DwT pairs, where each SVG code is generated based on explicit DwT reasoning.
arXiv Detail & Related papers (2025-05-30T11:57:58Z) - Rendering-Aware Reinforcement Learning for Vector Graphics Generation [15.547843461605746]
We introduce RLRF (Reinforcement Learning from Rendering Feedback), an RL method that enhances SVG generation in vision-language models (VLMs). Given an input image, the model generates SVG roll-outs that are rendered and compared to the original image to compute a reward. This visual fidelity feedback guides the model toward producing more accurate, efficient, and semantically coherent SVGs.
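The rendering-feedback loop above can be sketched with a minimal pixel-level reward. The renderer is deliberately left abstract here (a real pipeline would rasterize the SVG with a library such as cairosvg); the reward function itself is a plain negative mean squared error, an assumed stand-in for whatever fidelity metric RLRF actually uses.

```python
# Sketch of a rendering-feedback reward: compare a rendered SVG roll-out
# against the target image and reward visual fidelity. Images are modeled
# as nested lists of grayscale floats in [0, 1]; negative MSE is an
# assumed proxy metric, not RLRF's exact reward.

def pixel_reward(rendered, target):
    """Negative mean squared error between two equally sized images."""
    total, count = 0.0, 0
    for row_r, row_t in zip(rendered, target):
        for r, t in zip(row_r, row_t):
            total += (r - t) ** 2
            count += 1
    return -total / count

target   = [[0.0, 1.0], [1.0, 0.0]]   # ground-truth raster of the input image
rendered = [[0.0, 0.5], [1.0, 0.0]]   # raster of one generated SVG roll-out
print(pixel_reward(rendered, target))  # closer to 0 means higher fidelity
```

In an RL setup, this scalar would score each sampled roll-out, and a policy-gradient update would push the VLM toward SVGs whose renders match the input image more closely.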
arXiv Detail & Related papers (2025-05-27T06:56:00Z) - NeuralSVG: An Implicit Representation for Text-to-Vector Generation [54.4153300455889]
We propose NeuralSVG, an implicit neural representation for generating vector graphics from text prompts. To encourage a layered structure in the generated SVG, we introduce a dropout-based regularization technique. We demonstrate that NeuralSVG outperforms existing methods in generating structured and flexible SVG.
arXiv Detail & Related papers (2025-01-07T18:50:06Z) - SVGBuilder: Component-Based Colored SVG Generation with Text-Guided Autoregressive Transformers [5.921625661186367]
This paper introduces a component-based, autoregressive model for generating high-quality colored SVGs from textual input. It significantly reduces computational overhead and improves efficiency compared to traditional methods. To address the limitations of existing SVG datasets and support our research, we introduce ColorSVG-100K, the first large-scale dataset of colored SVGs.
arXiv Detail & Related papers (2024-12-13T15:24:11Z) - SVGFusion: Scalable Text-to-SVG Generation via Vector Space Diffusion [32.01103570298614]
We introduce SVGFusion, a Text-to-SVG model capable of scaling to real-world SVG data.
The core idea of SVGFusion is to utilize a popular Text-to-Image framework to learn a continuous latent space for vector graphics.
To effectively train and evaluate SVGFusion, we construct SVGX-Dataset, a large-scale, high-quality SVG dataset.
arXiv Detail & Related papers (2024-12-11T09:02:25Z) - Chat2SVG: Vector Graphics Generation with Large Language Models and Image Diffusion Models [14.917583676464266]
Chat2SVG is a hybrid framework that combines Large Language Models and image diffusion models for text-to-SVG generation.
Our system enables intuitive editing through natural language instructions, making professional vector graphics creation accessible to all users.
arXiv Detail & Related papers (2024-11-25T17:31:57Z) - Visually Descriptive Language Model for Vector Graphics Reasoning [76.42082386029206]
We propose the Visually Descriptive Language Model (VDLM) to bridge the gap between low-level visual perception and high-level language reasoning.
We show that VDLM significantly improves state-of-the-art LMMs like GPT-4o on various multimodal perception and reasoning tasks.
arXiv Detail & Related papers (2024-04-09T17:30:18Z) - SVGDreamer: Text Guided SVG Generation with Diffusion Model [31.76771064173087]
We propose a novel text-guided vector graphics synthesis method called SVGDreamer.
The SIVE process enables the decomposition of synthesis into foreground objects and background. The VPSD approach addresses the issues of shape over-smoothing, color over-saturation, limited diversity, and slow convergence.
arXiv Detail & Related papers (2023-12-27T08:50:01Z) - StarVector: Generating Scalable Vector Graphics Code from Images and Text [15.32194071443065]
We introduce StarVector, a multimodal large language model for SVG generation. It performs image vectorization by understanding image semantics and using SVG primitives for compact, precise outputs. We train StarVector on SVG-Stack, a diverse dataset of 2M samples that enables generalization across vectorization tasks.
arXiv Detail & Related papers (2023-12-17T08:07:32Z) - VectorFusion: Text-to-SVG by Abstracting Pixel-Based Diffusion Models [82.93345261434943]
We show that a text-conditioned diffusion model trained on pixel representations of images can be used to generate SVG-exportable vector graphics.
Inspired by recent text-to-3D work, we learn an SVG consistent with a caption using Score Distillation Sampling.
Experiments show greater quality than prior work, and demonstrate a range of styles including pixel art and sketches.
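The Score Distillation Sampling objective mentioned above is, in its standard form (notation follows the original DreamFusion formulation; VectorFusion's exact timestep weighting may differ, and here $g(\theta)$ stands for the differentiable vector-graphics rasterizer):

```latex
\nabla_\theta \mathcal{L}_{\mathrm{SDS}} =
  \mathbb{E}_{t,\epsilon}\!\left[
    w(t)\,\bigl(\hat{\epsilon}_\phi(\mathbf{x}_t;\, y, t) - \epsilon\bigr)
    \frac{\partial \mathbf{x}}{\partial \theta}
  \right],
  \qquad \mathbf{x} = g(\theta)
```

where $w(t)$ is a timestep-dependent weight, $\hat{\epsilon}_\phi$ is the diffusion model's noise prediction conditioned on the caption $y$, $\epsilon$ is the injected noise, and $\mathbf{x}_t$ is the noised rendering. The gradient flows through the rasterizer into the SVG path parameters $\theta$, which is what lets a pixel-space diffusion model supervise vector output.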
arXiv Detail & Related papers (2022-11-21T10:04:27Z) - Towards Layer-wise Image Vectorization [57.26058135389497]
We propose Layer-wise Image Vectorization, namely LIVE, to convert images to SVGs while simultaneously maintaining image topology. LIVE generates compact forms with layer-wise structures that are semantically consistent with human perception. LIVE produces human-editable SVGs useful to both designers and other applications.
arXiv Detail & Related papers (2022-06-09T17:55:02Z) - SVG-Net: An SVG-based Trajectory Prediction Model [67.68864911674308]
Anticipating motions of vehicles in a scene is an essential problem for safe autonomous driving systems.
To this end, the comprehension of the scene's infrastructure is often the main clue for predicting future trajectories.
Most of the proposed approaches represent the scene with a rasterized format, and some of the more recent approaches leverage custom vectorized formats.
arXiv Detail & Related papers (2021-10-07T18:00:08Z) - DeepSVG: A Hierarchical Generative Network for Vector Graphics Animation [217.86315551526235]
We propose a novel hierarchical generative network, called DeepSVG, for complex SVG icons generation and manipulation.
Our architecture effectively disentangles high-level shapes from the low-level commands that encode the shape itself.
We demonstrate that our network learns to accurately reconstruct diverse vector graphics, and can serve as a powerful animation tool.
arXiv Detail & Related papers (2020-07-22T09:36:31Z)
This list is automatically generated from the titles and abstracts of the papers in this site.