Related papers: SVGFusion: Scalable Text-to-SVG Generation via Vector Space Diffusion

URL: http://arxiv.org/abs/2412.10437v2
Date: Sun, 23 Mar 2025 16:20:45 GMT
Title: SVGFusion: Scalable Text-to-SVG Generation via Vector Space Diffusion
Authors: Ximing Xing, Juncheng Hu, Jing Zhang, Dong Xu, Qian Yu
Abstract summary: We introduce SVGFusion, a Text-to-SVG model capable of scaling to real-world SVG data. The core idea of SVGFusion is to utilize a popular Text-to-Image framework to learn a continuous latent space for vector graphics. To effectively train and evaluate SVGFusion, we construct SVGX-Dataset, a large-scale, high-quality SVG dataset.
Score: 32.01103570298614
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: In this work, we introduce SVGFusion, a Text-to-SVG model capable of scaling to real-world SVG data without relying on text-based discrete language models or prolonged Score Distillation Sampling (SDS) optimization. The core idea of SVGFusion is to utilize a popular Text-to-Image framework to learn a continuous latent space for vector graphics. Specifically, SVGFusion comprises two key modules: a Vector-Pixel Fusion Variational Autoencoder (VP-VAE) and a Vector Space Diffusion Transformer (VS-DiT). The VP-VAE processes both SVG codes and their corresponding rasterizations to learn a continuous latent space, while the VS-DiT generates latent codes within this space based on the input text prompt. Building on the VP-VAE, we propose a novel rendering sequence modeling strategy which enables the learned latent space to capture the inherent creation logic of SVGs. This allows the model to generate SVGs with higher visual quality and more logical construction, while systematically avoiding occlusion in complex graphic compositions. Additionally, the scalability of SVGFusion can be continuously enhanced by adding more VS-DiT blocks. To effectively train and evaluate SVGFusion, we construct SVGX-Dataset, a large-scale, high-quality SVG dataset that addresses the scarcity of high-quality vector data. Extensive experiments demonstrate the superiority of SVGFusion over existing SVG generation methods, establishing a new framework for SVG content creation. Code, model, and data will be released at: https://ximinng.github.io/SVGFusionProject/

Related papers

OmniSVG: A Unified Scalable Vector Graphics Generation Model [70.26163703054979]
We propose OmniSVG, a unified framework that leverages pre-trained Vision-Language Models for end-to-end multimodal SVG generation. By parameterizing SVG commands and coordinates into discrete tokens, OmniSVG decouples structural logic from low-level geometry for efficient training while maintaining the synthesis of complex SVG structure. We introduce MMSVG-2M, a multimodal dataset with two million annotated SVG assets, along with a standardized evaluation protocol for conditional SVG generation tasks.
arXiv Detail & Related papers (2025-04-08T17:59:49Z)
NeuralSVG: An Implicit Representation for Text-to-Vector Generation [54.4153300455889]
We propose NeuralSVG, an implicit neural representation for generating vector graphics from text prompts. To encourage a layered structure in the generated SVG, we introduce a dropout-based regularization technique. We demonstrate that NeuralSVG outperforms existing methods in generating structured and flexible SVG.
arXiv Detail & Related papers (2025-01-07T18:50:06Z)
SVGBuilder: Component-Based Colored SVG Generation with Text-Guided Autoregressive Transformers [5.921625661186367]
This paper introduces a component-based, autoregressive model for generating high-quality colored SVGs from textual input. It significantly reduces computational overhead and improves efficiency compared to traditional methods. To address the limitations of existing SVG datasets and support our research, we introduce ColorSVG-100K, the first large-scale dataset of colored SVGs.
arXiv Detail & Related papers (2024-12-13T15:24:11Z)
Visually Descriptive Language Model for Vector Graphics Reasoning [76.42082386029206]
We propose the Visually Descriptive Language Model (VDLM) to bridge the gap between low-level visual perception and high-level language reasoning. We show that VDLM significantly improves state-of-the-art LMMs like GPT-4o on various multimodal perception and reasoning tasks.
arXiv Detail & Related papers (2024-04-09T17:30:18Z)
SVGDreamer: Text Guided SVG Generation with Diffusion Model [31.76771064173087]
We propose a novel text-guided vector graphics synthesis method called SVGDreamer. The SIVE process decomposes synthesis into foreground objects and background. The VPSD approach addresses issues of shape over-smoothing, color over-saturation, limited diversity, and slow convergence.
arXiv Detail & Related papers (2023-12-27T08:50:01Z)
StarVector: Generating Scalable Vector Graphics Code from Images and Text [15.32194071443065]
We introduce StarVector, a multimodal large language model for SVG generation. It performs image vectorization by understanding image semantics and using SVG primitives for compact, precise outputs. We train it on SVG-Stack, a diverse dataset of 2M samples that enables generalization across vectorization tasks.
arXiv Detail & Related papers (2023-12-17T08:07:32Z)
VectorFusion: Text-to-SVG by Abstracting Pixel-Based Diffusion Models [82.93345261434943]
We show that a text-conditioned diffusion model trained on pixel representations of images can be used to generate SVG-exportable vector graphics. Inspired by recent text-to-3D work, we learn an SVG consistent with a caption using Score Distillation Sampling. Experiments show greater quality than prior work, and demonstrate a range of styles including pixel art and sketches.
arXiv Detail & Related papers (2022-11-21T10:04:27Z)
Towards Layer-wise Image Vectorization [57.26058135389497]
We propose Layer-wise Image Vectorization, namely LIVE, to convert images to SVGs while maintaining their image topology. LIVE generates compact forms with layer-wise structures that are semantically consistent with human perspective. LIVE produces human-editable SVGs for designers and can be used in other applications.
arXiv Detail & Related papers (2022-06-09T17:55:02Z)
SVG-Net: An SVG-based Trajectory Prediction Model [67.68864911674308]
Anticipating motions of vehicles in a scene is an essential problem for safe autonomous driving systems. To this end, the comprehension of the scene's infrastructure is often the main clue for predicting future trajectories. Most of the proposed approaches represent the scene with a rasterized format, and some of the more recent approaches leverage custom vectorized formats.
arXiv Detail & Related papers (2021-10-07T18:00:08Z)
DeepSVG: A Hierarchical Generative Network for Vector Graphics Animation [217.86315551526235]
We propose a novel hierarchical generative network, called DeepSVG, for complex SVG icons generation and manipulation. Our architecture effectively disentangles high-level shapes from the low-level commands that encode the shape itself. We demonstrate that our network learns to accurately reconstruct diverse vector graphics, and can serve as a powerful animation tool.
arXiv Detail & Related papers (2020-07-22T09:36:31Z)

This list is automatically generated from the titles and abstracts of the papers on this site.