StarVector: Generating Scalable Vector Graphics Code from Images
- URL: http://arxiv.org/abs/2312.11556v1
- Date: Sun, 17 Dec 2023 08:07:32 GMT
- Title: StarVector: Generating Scalable Vector Graphics Code from Images
- Authors: Juan A. Rodriguez, Shubham Agarwal, Issam H. Laradji, Pau Rodriguez,
David Vazquez, Christopher Pal, and Marco Pedersoli
- Abstract summary: This paper introduces StarVector, a multimodal SVG generation model that integrates Code Generation Large Language Models (CodeLLMs) and vision models.
Our approach utilizes a CLIP image encoder to extract visual representations from pixel-based images, which are then transformed into visual tokens via an adapter module.
Our results demonstrate significant enhancements in visual quality and complexity handling over current methods, marking a notable advancement in SVG generation technology.
- Score: 13.995963187283321
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Scalable Vector Graphics (SVGs) have become integral in modern image
rendering applications due to their infinite scalability in resolution,
versatile usability, and editing capabilities. SVGs are particularly popular in
the fields of web development and graphic design. Existing approaches for SVG
modeling using deep learning often struggle with generating complex SVGs and
are restricted to simpler ones that require extensive processing and
simplification. This paper introduces StarVector, a multimodal SVG generation
model that effectively integrates Code Generation Large Language Models
(CodeLLMs) and vision models. Our approach utilizes a CLIP image encoder to
extract visual representations from pixel-based images, which are then
transformed into visual tokens via an adapter module. These visual tokens are
pre-pended to the SVG token embeddings, and the sequence is modeled by the
StarCoder model using next-token prediction, effectively learning to align the
visual and code tokens. This enables StarVector to generate unrestricted SVGs
that accurately represent pixel images. To evaluate StarVector's performance,
we present SVG-Bench, a comprehensive benchmark for evaluating SVG methods
across multiple datasets and relevant metrics. Within this benchmark, we
introduce novel datasets including SVG-Stack, a large-scale dataset of
real-world SVG examples, and use it to pre-train StarVector as a large
foundation model for SVGs. Our results demonstrate significant enhancements in
visual quality and complexity handling over current methods, marking a notable
advancement in SVG generation technology. Code and models:
https://github.com/joanrod/star-vector
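The token layout described in the abstract (adapter output prepended to SVG token embeddings, then next-token prediction) can be illustrated with a minimal, dependency-free sketch. This is not the authors' implementation: the dimensions, the single linear adapter, the random stand-ins for the image encoder output and the embedding table, and the choice to skip the loss on visual positions are all assumptions made for illustration.

```python
import random

# Toy sizes; illustrative assumptions, not values from the paper.
IMAGE_FEATURE_DIM = 6   # width of each stand-in image-encoder feature vector
EMBED_DIM = 4           # width of the language model's token embeddings
NUM_VISUAL_TOKENS = 3   # number of feature vectors produced per image
VOCAB_SIZE = 16         # size of the toy SVG token vocabulary
IGNORE = None           # marks positions excluded from the next-token loss

rng = random.Random(0)


def random_matrix(rows, cols):
    """Random weights standing in for trained parameters."""
    return [[rng.uniform(-1.0, 1.0) for _ in range(cols)] for _ in range(rows)]


def adapter(image_features, weights):
    """Project each image feature vector into the token-embedding space.

    A single linear map is assumed here; the real adapter may differ.
    """
    out_dim = len(weights[0])
    return [
        [sum(x * weights[i][j] for i, x in enumerate(feature)) for j in range(out_dim)]
        for feature in image_features
    ]


def build_sequence(visual_tokens, svg_token_ids, embedding_table):
    """Prepend visual tokens to SVG token embeddings and build loss targets.

    Position i is trained to predict the token at position i + 1, so the
    last visual position predicts the first SVG token. Visual positions
    before it and the final position have no SVG target (assumed masked).
    """
    svg_embeddings = [embedding_table[t] for t in svg_token_ids]
    inputs = visual_tokens + svg_embeddings
    targets = [IGNORE] * (len(visual_tokens) - 1) + list(svg_token_ids) + [IGNORE]
    return inputs, targets


# Stand-ins for the image encoder output and the model's parameters.
image_features = random_matrix(NUM_VISUAL_TOKENS, IMAGE_FEATURE_DIM)
adapter_weights = random_matrix(IMAGE_FEATURE_DIM, EMBED_DIM)
embedding_table = random_matrix(VOCAB_SIZE, EMBED_DIM)
svg_token_ids = [5, 9, 2, 7]  # hypothetical token ids for a short SVG string

visual_tokens = adapter(image_features, adapter_weights)
inputs, targets = build_sequence(visual_tokens, svg_token_ids, embedding_table)
print(len(inputs), targets)
```

In this sketch the language model would receive `inputs` as its embedding sequence and be trained with cross-entropy only where `targets` is not `IGNORE`, which is one common way to condition a code model on image features.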
Related papers
- Vector Grimoire: Codebook-based Shape Generation under Raster Image Supervision [20.325246638505714]
We introduce GRIMOIRE, a text-guided generative model that learns to map images onto a discrete codebook by reconstructing them as vector shapes.
Unlike existing models that require direct supervision from data, GRIMOIRE learns using only image supervision, which opens up vector generative modeling to significantly more data.
arXiv Detail & Related papers (2024-10-08T12:41:31Z)
- SuperSVG: Superpixel-based Scalable Vector Graphics Synthesis [66.44553285020066]
SuperSVG is a superpixel-based vectorization model that achieves fast and high-precision image vectorization.
We propose a two-stage self-training framework, where a coarse-stage model is employed to reconstruct the main structure and a refinement-stage model is used for enriching the details.
Experiments demonstrate the superior performance of our method in terms of reconstruction accuracy and inference time compared to state-of-the-art approaches.
arXiv Detail & Related papers (2024-06-14T07:43:23Z)
- SVGDreamer: Text Guided SVG Generation with Diffusion Model [31.76771064173087]
We propose a novel text-guided vector graphics synthesis method called SVGDreamer.
SIVE process enables decomposition of synthesis into foreground objects and background.
VPSD approach addresses issues of shape over-smoothing, color over-saturation, limited diversity, and slow convergence.
arXiv Detail & Related papers (2023-12-27T08:50:01Z)
- Beyond Pixels: Exploring Human-Readable SVG Generation for Simple Images with Vision Language Models [19.145503353922038]
We introduce our method, Simple-SVG-Generation (S²VG²).
Our method focuses on producing SVGs that are both accurate and simple, aligning with human readability and understanding.
Using simple images, we evaluate our method on reasoning tasks together with advanced language models; the results show a clear improvement over previous SVG generation methods.
arXiv Detail & Related papers (2023-11-27T05:20:11Z)
- SAMVG: A Multi-stage Image Vectorization Model with the Segment-Anything Model [59.40189857428461]
We propose a multi-stage model to vectorize images into SVG (Scalable Vector Graphics).
Firstly, SAMVG uses general image segmentation provided by the Segment-Anything Model and uses a novel filtering method to identify the best dense segmentation map for the entire image.
Secondly, SAMVG identifies missing components and adds more detailed components to the SVG.
arXiv Detail & Related papers (2023-11-09T11:11:56Z)
- VectorFusion: Text-to-SVG by Abstracting Pixel-Based Diffusion Models [82.93345261434943]
We show that a text-conditioned diffusion model trained on pixel representations of images can be used to generate SVG-exportable vector graphics.
Inspired by recent text-to-3D work, we learn an SVG consistent with a caption using Score Distillation Sampling.
Experiments show greater quality than prior work, and demonstrate a range of styles including pixel art and sketches.
arXiv Detail & Related papers (2022-11-21T10:04:27Z)
- Towards Layer-wise Image Vectorization [57.26058135389497]
We propose Layer-wise Image Vectorization, namely LIVE, to convert images to SVGs while maintaining their image topology.
LIVE generates compact forms with layer-wise structures that are semantically consistent with human perspective.
LIVE produces human-editable SVGs for designers and can be used in other applications.
arXiv Detail & Related papers (2022-06-09T17:55:02Z)
- SVG-Net: An SVG-based Trajectory Prediction Model [67.68864911674308]
Anticipating motions of vehicles in a scene is an essential problem for safe autonomous driving systems.
To this end, the comprehension of the scene's infrastructure is often the main clue for predicting future trajectories.
Most of the proposed approaches represent the scene with a rasterized format, and some of the more recent approaches leverage custom vectorized formats.
arXiv Detail & Related papers (2021-10-07T18:00:08Z)
- DeepSVG: A Hierarchical Generative Network for Vector Graphics Animation [217.86315551526235]
We propose a novel hierarchical generative network, called DeepSVG, for complex SVG icons generation and manipulation.
Our architecture effectively disentangles high-level shapes from the low-level commands that encode the shape itself.
We demonstrate that our network learns to accurately reconstruct diverse vector graphics, and can serve as a powerful animation tool.
arXiv Detail & Related papers (2020-07-22T09:36:31Z)