Fugu-MT Paper Translation (Abstract): VFIG: Vectorizing Complex Figures in SVG with Vision-Language Models

Paper summary: VFIG: Vectorizing Complex Figures in SVG with Vision-Language Models

arxiv url: http://arxiv.org/abs/2603.24575v1
Date: Wed, 25 Mar 2026 17:52:23 GMT
Status: Translation complete
Last updated in system: 2026-03-26 21:06:11.422982
Title: VFIG: Vectorizing Complex Figures in SVG with Vision-Language Models
Authors: Qijia He, Xunmei Liu, Hammaad Memon, Ziang Li, Zixian Ma, Jaemin Cho, Jason Ren, Daniel S Weld, Ranjay Krishna
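Abstract summary: In practice, original vector source files are frequently lost or inaccessible. We propose VFIG, a family of Vision-Language Models trained for complex and high-fidelity figure-to-SVG conversion. VFIG achieves state-of-the-art performance among open-source models and performs on par with GPT-5.2.
Reference score (attention measure computed independently by this site): 43.3181510471477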
License: http://creativecommons.org/licenses/by/4.0/
Abstract: Scalable Vector Graphics (SVG) are an essential format for technical illustration and digital design, offering precise resolution independence and flexible semantic editability. In practice, however, original vector source files are frequently lost or inaccessible, leaving only "flat" rasterized versions (e.g., PNG or JPEG) that are difficult to modify or scale. Manually reconstructing these figures is a prohibitively labor-intensive process, requiring specialized expertise to recover the original geometric intent. To bridge this gap, we propose VFIG, a family of Vision-Language Models trained for complex and high-fidelity figure-to-SVG conversion. While this task is inherently data-driven, existing datasets are typically small-scale and lack the complexity of professional diagrams. We address this by introducing VFIG-DATA, a large-scale dataset of 66K high-quality figure-SVG pairs, curated from a diverse mix of real-world paper figures and procedurally generated diagrams. Recognizing that SVGs are composed of recurring primitives and hierarchical local structures, we introduce a coarse-to-fine training curriculum that begins with supervised fine-tuning (SFT) to learn atomic primitives and transitions to reinforcement learning (RL) refinement to optimize global diagram fidelity, layout consistency, and topological edge cases. Finally, we introduce VFIG-BENCH, a comprehensive evaluation suite with novel metrics designed to measure the structural integrity of complex figures. VFIG achieves state-of-the-art performance among open-source models and performs on par with GPT-5.2, achieving a VLM-Judge score of 0.829 on VFIG-BENCH.
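The abstract's observation that SVGs are built from recurring primitives and hierarchical local structures can be illustrated with a small sketch. The SVG below is a hypothetical hand-written figure, and `count_primitives` and `max_depth` are illustrative helper names; none of this is taken from VFIG, VFIG-DATA, or the VFIG-BENCH metrics, which the abstract does not specify.

```python
import xml.etree.ElementTree as ET
from collections import Counter

SVG_NS = "http://www.w3.org/2000/svg"

# Hypothetical figure (an assumption for illustration): two boxes joined by
# a line, with nodes and edges grouped separately, plus one text label.
EXAMPLE_SVG = f"""<svg xmlns="{SVG_NS}" width="200" height="80" viewBox="0 0 200 80">
  <g id="nodes">
    <rect x="10" y="20" width="60" height="40" fill="none" stroke="black"/>
    <rect x="130" y="20" width="60" height="40" fill="none" stroke="black"/>
  </g>
  <g id="edges">
    <line x1="70" y1="40" x2="130" y2="40" stroke="black"/>
  </g>
  <text x="40" y="45" text-anchor="middle">A</text>
</svg>"""


def count_primitives(svg_text: str) -> Counter:
    """Count SVG elements by local tag name, with the XML namespace stripped."""
    root = ET.fromstring(svg_text)
    return Counter(el.tag.split("}")[-1] for el in root.iter())


def max_depth(element: ET.Element, depth: int = 0) -> int:
    """Return the nesting depth of the deepest descendant (root is depth 0)."""
    return max((max_depth(child, depth + 1) for child in element), default=depth)


if __name__ == "__main__":
    print(count_primitives(EXAMPLE_SVG))
    print(max_depth(ET.fromstring(EXAMPLE_SVG)))
```

Unlike a rasterized PNG or JPEG of the same figure, each element here stays individually addressable and editable, which is the property the abstract says is lost when only a flat image survives.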
