Leveraging Large Language Models For Scalable Vector Graphics Processing: A Review
- URL: http://arxiv.org/abs/2503.04983v1
- Date: Thu, 06 Mar 2025 21:23:17 GMT
- Title: Leveraging Large Language Models For Scalable Vector Graphics Processing: A Review
- Authors: Boris Malashenko, Ivan Jarsky, Valeria Efimova
- Abstract summary: Traditional vectorization techniques suffer from long processing times and excessive output complexity. The advent of large language models (LLMs) has opened new possibilities for the generation, editing, and analysis of vector graphics.
- Score: 0.0
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: In recent years, rapid advances in computer vision have significantly improved the processing and generation of raster images. However, vector graphics, which are essential in digital design due to their scalability and ease of editing, have been relatively understudied. Traditional vectorization techniques, which are often used in vector generation, suffer from long processing times and excessive output complexity, limiting their usability in practical applications. The advent of large language models (LLMs) has opened new possibilities for the generation, editing, and analysis of vector graphics, particularly in the SVG format, which is inherently text-based and well-suited for integration with LLMs. This paper provides a systematic review of existing LLM-based approaches for SVG processing, categorizing them into three main tasks: generation, editing, and understanding. We examine notable models such as IconShop, StrokeNUWA, and StarVector, highlighting their strengths and limitations. Furthermore, we analyze benchmark datasets designed for assessing SVG-related tasks, including SVGEditBench, VGBench, and SGP-Bench, and conduct a series of experiments to evaluate various LLMs in these domains. Our results demonstrate that, for vector graphics, reasoning-enhanced models outperform standard LLMs, particularly in generation and understanding tasks. Furthermore, our findings underscore the need to develop more diverse and richly annotated datasets to further improve LLM capabilities in vector graphics tasks.
Related papers
- NeuralSVG: An Implicit Representation for Text-to-Vector Generation [54.4153300455889]
We propose NeuralSVG, an implicit neural representation for generating vector graphics from text prompts. To encourage a layered structure in the generated SVG, we introduce a dropout-based regularization technique. We demonstrate that NeuralSVG outperforms existing methods in generating structured and flexible SVG.
(2025-01-07T18:50:06Z)
- Empowering LLMs to Understand and Generate Complex Vector Graphics [30.21003939248769]
Large Language Models (LLMs) encode partial knowledge of vector graphics from web pages during training. Recent findings suggest that semantically ambiguous and tokenized representations within LLMs may result in hallucinations in vector primitive predictions. We present LLM4SVG, an initial yet substantial step toward bridging this gap by enabling LLMs to better understand and generate vector graphics.
(2024-12-15T07:49:31Z)
- SVGDreamer++: Advancing Editability and Diversity in Text-Guided SVG Generation [31.76771064173087]
We propose a novel text-guided vector graphics synthesis method to address limitations of existing methods.
We introduce a Hierarchical Image VEctorization (HIVE) framework that operates at the semantic object level.
We also present a Vectorized Particle-based Score Distillation (VPSD) approach to improve the diversity of output SVGs.
(2024-11-26T19:13:38Z)
- Chat2SVG: Vector Graphics Generation with Large Language Models and Image Diffusion Models [14.917583676464266]
Chat2SVG is a hybrid framework that combines Large Language Models and image diffusion models for text-to-SVG generation.
Our system enables intuitive editing through natural language instructions, making professional vector graphics creation accessible to all users.
(2024-11-25T17:31:57Z)
- Vector-ICL: In-context Learning with Continuous Vector Representations [75.96920867382859]
Large language models (LLMs) have shown remarkable in-context learning capabilities on textual data. We explore whether these capabilities can be extended to continuous vectors from diverse domains, obtained from black-box pretrained encoders. In particular, we find that pretraining projectors with general language modeling objectives enables Vector-ICL.
(2024-10-08T02:25:38Z)
- All Against Some: Efficient Integration of Large Language Models for Message Passing in Graph Neural Networks [51.19110891434727]
Large Language Models (LLMs) with pretrained knowledge and powerful semantic comprehension abilities have recently shown a remarkable ability to benefit applications using vision and text data.
E-LLaGNN is a framework with an on-demand LLM service that enriches message passing procedure of graph learning by enhancing a limited fraction of nodes from the graph.
(2024-07-20T22:09:42Z)
- VGBench: Evaluating Large Language Models on Vector Graphics Understanding and Generation [28.1277394934428]
VGBench is a comprehensive benchmark for Large Language Models (LLMs) on handling vector graphics.
LLMs show strong capability on both aspects while exhibiting less desirable performance on low-level formats (SVG).
(2024-07-15T17:59:55Z)
- Rethinking Visual Prompting for Multimodal Large Language Models with External Knowledge [76.45868419402265]
Multimodal large language models (MLLMs) have made significant strides by training on vast high-quality image-text datasets.
However, the inherent difficulty in explicitly conveying fine-grained or spatially dense information in text, such as masks, poses a challenge for MLLMs.
This paper proposes a new visual prompt approach to integrate fine-grained external knowledge, gleaned from specialized vision models, into MLLMs.
(2024-07-05T17:43:30Z)
- SuperSVG: Superpixel-based Scalable Vector Graphics Synthesis [66.44553285020066]
SuperSVG is a superpixel-based vectorization model that achieves fast and high-precision image vectorization.
We propose a two-stage self-training framework, where a coarse-stage model is employed to reconstruct the main structure and a refinement-stage model is used for enriching the details.
Experiments demonstrate the superior performance of our method in terms of reconstruction accuracy and inference time compared to state-of-the-art approaches.
(2024-06-14T07:43:23Z)
- Visually Descriptive Language Model for Vector Graphics Reasoning [76.42082386029206]
We propose the Visually Descriptive Language Model (VDLM) to bridge the gap between low-level visual perception and high-level language reasoning.
We show that VDLM significantly improves state-of-the-art LMMs like GPT-4o on various multimodal perception and reasoning tasks.
(2024-04-09T17:30:18Z)
- Leveraging Large Language Models for Scalable Vector Graphics-Driven Image Understanding [46.042197741423365]
Large language models (LLMs) have made significant advancements in natural language understanding.
This work investigates whether LLMs can understand images as well.
(2023-06-09T17:57:01Z)
This list is automatically generated from the titles and abstracts of the papers on this site.