Fugu-MT Paper Translation (Abstract): Reliable Reasoning in SVG-LLMs via Multi-Task Multi-Reward Reinforcement Learning

Paper overview: Reliable Reasoning in SVG-LLMs via Multi-Task Multi-Reward Reinforcement Learning

arxiv url: http://arxiv.org/abs/2603.16189v1
Date: Tue, 17 Mar 2026 07:16:30 GMT
Status: Translation complete
Last updated in system: 2026-03-18 17:42:07.142999
Title: Reliable Reasoning in SVG-LLMs via Multi-Task Multi-Reward Reinforcement Learning
Title (reference translation): Reliable reasoning for SVG-LLMs through multi-task, multi-reward reinforcement learning
Authors: Haomin Wang, Qi Wei, Qianli Ma, Shengyuan Ding, Jinhui Yin, Kai Chen, Hongjie Zhang,
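Abstract summary: This paper proposes Chain-of-Thought Reinforcement Learning for SVG (CTRL-S), a unified framework that exposes the model's reasoning process during SVG generation. By training the model to generate group-level structured SVG code, it significantly improves structural coherence and visual fidelity. The proposed method systematically improves overall generation capability, achieving higher task success rates, better SVG code quality, and higher visual fidelity.
Reference score (attention score computed independently by this site): 15.21004381065588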
License: http://creativecommons.org/licenses/by-nc-sa/4.0/
Abstract: With the rapid advancement of vision-language models, an increasing number of studies have explored their potential for SVG generation tasks. Although existing approaches improve performance by constructing large-scale SVG datasets and introducing SVG-specific tokens, they still suffer from limited generalization, redundant paths in code outputs, and a lack of explicit reasoning. In this work, we present CTRL-S (Chain-of-Thought Reinforcement Learning for SVG), a unified framework that introduces a chain-of-thought mechanism to explicitly expose the model's reasoning process during SVG generation. To support this structured reasoning, we construct SVG-Sophia, a high-quality dataset containing 145K samples across SVG code refinement, Text-to-SVG, and Image-to-SVG tasks. By training the model to generate group-level structured SVG code, CTRL-S significantly improves structural coherence and visual fidelity. Furthermore, we adopt the GRPO algorithm and design a multi-reward optimization framework, incorporating DINO, image-text similarity, format, and code efficiency rewards. Through joint multi-reward optimization and multi-task training, our approach systematically enhances overall generation capabilities. Extensive experiments show that CTRL-S outperforms existing methods, achieving higher task success rates, superior SVG code quality, and exceptional visual fidelity.
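Abstract (reference translation): With the rapid progress of vision-language models, a growing number of studies explore their potential for SVG generation tasks. Existing approaches improve performance by building large-scale SVG datasets and introducing SVG-specific tokens, but they still suffer from limited generalization, redundant paths in code output, and a lack of explicit reasoning. This paper proposes CTRL-S (Chain-of-Thought Reinforcement Learning for SVG), a unified framework that introduces a chain-of-thought mechanism to make the model's reasoning process explicit during SVG generation. To support this structured reasoning, the authors build SVG-Sophia, a high-quality dataset containing 145K samples across SVG code refinement, Text-to-SVG, and Image-to-SVG tasks. By training the model to generate group-level structured SVG code, CTRL-S significantly improves structural coherence and visual fidelity. The authors also adopt the GRPO algorithm and design a multi-reward optimization framework that incorporates DINO, image-text similarity, format, and code efficiency rewards. Through joint multi-reward optimization and multi-task training, the method systematically improves overall generation capability. Extensive experiments show that CTRL-S outperforms existing methods, achieving higher task success rates, better SVG code quality, and exceptional visual fidelity.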
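
The abstract names four reward signals (DINO, image-text similarity, format, code efficiency) optimized jointly with GRPO, but does not say how they are combined. The following is a minimal sketch, assuming a simple weighted sum and the group-relative standardization that GRPO is generally described as using. The weights, the `combine_rewards` and `group_relative_advantages` helpers, and all score values are hypothetical placeholders, not the authors' implementation.

```python
from statistics import mean, pstdev

# Hypothetical weights: the abstract does not state how rewards are weighted.
REWARD_WEIGHTS = {
    "dino": 1.0,
    "image_text": 1.0,
    "format": 0.5,
    "code_efficiency": 0.5,
}


def combine_rewards(scores, weights=REWARD_WEIGHTS):
    """Collapse per-sample reward components into one scalar (weighted sum)."""
    return sum(weights[name] * scores[name] for name in weights)


def group_relative_advantages(rewards, eps=1e-8):
    """Standardize rewards within one group of samples for the same prompt.

    Each advantage is (reward - group mean) / (group std + eps), so samples
    are scored relative to their siblings rather than by a learned critic.
    """
    mu = mean(rewards)
    sigma = pstdev(rewards)
    return [(r - mu) / (sigma + eps) for r in rewards]


# Made-up component scores for three SVG candidates sampled for one prompt.
group = [
    {"dino": 0.9, "image_text": 0.8, "format": 1.0, "code_efficiency": 0.6},
    {"dino": 0.5, "image_text": 0.6, "format": 1.0, "code_efficiency": 0.9},
    {"dino": 0.2, "image_text": 0.3, "format": 0.0, "code_efficiency": 0.4},
]

rewards = [combine_rewards(s) for s in group]
advantages = group_relative_advantages(rewards)
```

In this sketch the candidate with the highest combined reward receives a positive advantage and the weakest a negative one, which is the signal a policy-gradient update would then use. Real DINO and image-text similarity scores would come from rendering each SVG and comparing embeddings, which is omitted here.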
