WildSVG: Towards Reliable SVG Generation Under Real-World Conditions
- URL: http://arxiv.org/abs/2602.21416v1
- Date: Tue, 24 Feb 2026 22:42:55 GMT
- Title: WildSVG: Towards Reliable SVG Generation Under Real-World Conditions
- Authors: Marco Terral, Haotian Zhang, Tianyang Zhang, Meng Lin, Xiaoqing Xie, Haoran Dai, Darsh Kaushik, Pai Peng, Nicklas Scharpff, David Vazquez, Joan Rodriguez
- Abstract summary: We introduce the task of SVG extraction, which consists of translating specific visual inputs from an image into scalable vector graphics. Existing multimodal models achieve strong results when generating SVGs from clean renderings or textual descriptions, but they fall short in real-world scenarios where natural images introduce noise, clutter, and domain shifts. We benchmark state-of-the-art multimodal models and find that current approaches perform well below what is needed for reliable SVG extraction in real scenarios.
- Score: 15.299111837234678
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: We introduce the task of SVG extraction, which consists of translating specific visual inputs from an image into scalable vector graphics. Existing multimodal models achieve strong results when generating SVGs from clean renderings or textual descriptions, but they fall short in real-world scenarios where natural images introduce noise, clutter, and domain shifts. A central challenge in this direction is the lack of suitable benchmarks. To address this need, we introduce the WildSVG Benchmark, formed by two complementary datasets: Natural WildSVG, built from real images containing company logos paired with their SVG annotations, and Synthetic WildSVG, which blends complex SVG renderings into real scenes to simulate difficult conditions. Together, these resources provide the first foundation for systematically benchmarking SVG extraction. We benchmark state-of-the-art multimodal models and find that current approaches perform well below what is needed for reliable SVG extraction in real scenarios. Nonetheless, iterative refinement methods point to a promising path forward, and model capabilities are steadily improving.
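The iterative refinement mentioned at the end of the abstract can be pictured as a propose-render-score loop: generate a candidate SVG, rasterize it, measure the pixel error against the target image, and feed that error back. A minimal hedged sketch follows; all names are illustrative, and a real system would use a multimodal model for `propose` and an actual SVG rasterizer for `render`:

```python
# Hedged sketch of iterative SVG refinement: propose an SVG, render it,
# score it against the target image, and keep the best candidate.
# All names here are illustrative assumptions, not the paper's actual API.

def score(rendered, target):
    """Mean absolute pixel difference between two 2D grids; lower is better."""
    flat_r = [p for row in rendered for p in row]
    flat_t = [p for row in target for p in row]
    return sum(abs(a - b) for a, b in zip(flat_r, flat_t)) / len(flat_t)

def refine(target, propose, render, steps=3):
    """Run `steps` propose-render-score rounds, feeding the error back."""
    best_svg, best_err, feedback = None, float("inf"), None
    for _ in range(steps):
        svg = propose(target, feedback)   # model proposes SVG code
        err = score(render(svg), target)  # rasterize and compare pixels
        if err < best_err:
            best_svg, best_err = svg, err
        feedback = err                    # refinement signal for next round
    return best_svg, best_err
```

The key design point, consistent with the abstract's framing, is that the model sees a quantitative rendering error rather than generating once and stopping.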
Related papers
- DuetSVG: Unified Multimodal SVG Generation with Internal Visual Guidance [48.98604326855894]
We introduce DuetSVG, a unified multimodal model that jointly generates image tokens and corresponding SVG tokens in an end-to-end manner. At inference, we apply a novel test-time scaling strategy that leverages the model's native visual predictions as guidance to improve SVG decoding quality.
arXiv Detail & Related papers (2025-12-11T18:23:03Z)
- RoboSVG: A Unified Framework for Interactive SVG Generation with Multi-modal Guidance [32.59099674596894]
RoboSVG is a unified framework for generating interactive SVGs guided by textual, visual, and numerical signals. To support this framework, we construct RoboDraw, a large-scale dataset of one million examples. RoboSVG achieves superior query compliance and visual fidelity across tasks, establishing a new state of the art in versatile SVG generation.
arXiv Detail & Related papers (2025-10-26T13:57:08Z)
- InternSVG: Towards Unified SVG Tasks with Multimodal Large Language Models [65.49118879021016]
We present the InternSVG family, an integrated data-benchmark-model suite. At its core is SAgoge, the largest and most comprehensive multimodal dataset for SVG tasks. We propose InternSVG, a unified MLLM for SVG understanding, editing, and generation with SVG-specific special tokens.
arXiv Detail & Related papers (2025-10-13T12:38:04Z)
- SVGThinker: Instruction-Aligned and Reasoning-Driven Text-to-SVG Generation [47.390332111383294]
We present SVGThinker, a reasoning-driven framework that aligns the production of SVG code with the visualization process. Our pipeline first renders each primitive in sequence and uses a multimodal model to annotate the image and code. Experiments against state-of-the-art baselines show that SVGThinker produces more stable, editable, and higher-quality SVGs.
arXiv Detail & Related papers (2025-09-29T05:25:00Z)
- SVGen: Interpretable Vector Graphics Generation with Large Language Models [61.62816031675714]
We introduce SVG-1M, a large-scale dataset of high-quality SVGs paired with natural language descriptions. We create well-aligned text-to-SVG training pairs, including a subset with chain-of-thought annotations for enhanced semantic guidance. Based on this dataset, we propose SVGen, an end-to-end model that generates SVG code from natural language inputs.
arXiv Detail & Related papers (2025-08-06T15:00:24Z)
- OmniSVG: A Unified Scalable Vector Graphics Generation Model [69.59073636922287]
We propose OmniSVG, a unified framework that leverages pre-trained Vision-Language Models for end-to-end multimodal SVG generation. By parameterizing SVG commands and coordinates into discrete tokens, OmniSVG decouples structural logic from low-level geometry for efficient training while maintaining the synthesis of complex SVG structure. We introduce MMSVG-2M, a multimodal dataset with two million annotated SVG assets, along with a standardized evaluation protocol for conditional SVG generation tasks.
arXiv Detail & Related papers (2025-04-08T17:59:49Z)
- NeuralSVG: An Implicit Representation for Text-to-Vector Generation [54.4153300455889]
We propose NeuralSVG, an implicit neural representation for generating vector graphics from text prompts. To encourage a layered structure in the generated SVG, we introduce a dropout-based regularization technique. We demonstrate that NeuralSVG outperforms existing methods in generating structured and flexible SVG.
arXiv Detail & Related papers (2025-01-07T18:50:06Z)
- Beyond Pixels: Exploring Human-Readable SVG Generation for Simple Images with Vision Language Models [19.145503353922038]
We introduce our method, Simple-SVG-Generation (S²VG²).
Our method focuses on producing SVGs that are both accurate and simple, aligning with human readability and understanding.
On simple images, we evaluate our method on reasoning tasks together with advanced language models; the results show a clear improvement over previous SVG generation methods.
arXiv Detail & Related papers (2023-11-27T05:20:11Z)
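Several of the papers above (e.g. OmniSVG) parameterize SVG commands and coordinates into discrete tokens so a language model can generate vector graphics autoregressively. The sketch below illustrates that general idea only; the vocabulary layout, command subset, and bin count are assumptions for illustration, not any paper's actual scheme:

```python
# Hedged sketch of tokenizing SVG path data into discrete tokens: command
# tokens occupy the low ids, and quantized coordinates occupy the ids above
# them. Vocabulary layout and bin count are illustrative assumptions.

COMMANDS = ["M", "L", "C", "Z"]  # small subset of SVG path commands
N_BINS = 256                     # coordinate quantization bins

def quantize(value, lo=0.0, hi=100.0, bins=N_BINS):
    """Map a coordinate in [lo, hi] to one of `bins` integer buckets."""
    clamped = min(max(value, lo), hi)
    return min(int((clamped - lo) / (hi - lo) * bins), bins - 1)

def tokenize_path(path):
    """Turn [(cmd, [coords...]), ...] into a flat token-id sequence.
    Command ids are [0, len(COMMANDS)); coordinate ids follow after."""
    tokens = []
    for cmd, coords in path:
        tokens.append(COMMANDS.index(cmd))
        for c in coords:
            tokens.append(len(COMMANDS) + quantize(c))
    return tokens
```

Separating command tokens from coordinate tokens is what lets such models treat structural logic (which primitive comes next) and low-level geometry (where it goes) as distinct parts of the vocabulary.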