Fugu-MT Paper Translation (Abstract): From Pixels to Paths: A Multi-Agent Framework for Editable Scientific Illustration

Paper summary: From Pixels to Paths: A Multi-Agent Framework for Editable Scientific Illustration

arxiv url: http://arxiv.org/abs/2510.27452v1
Date: Fri, 31 Oct 2025 13:00:49 GMT
Status: Translation complete
Last updated in system: 2025-11-03 17:52:16.107161
Title: From Pixels to Paths: A Multi-Agent Framework for Editable Scientific Illustration
Authors: Jianwen Sun, Fanrui Zhang, Yukang Feng, Chuanhao Li, Zizhen Li, Jiaxin Ai, Yifan Chang, Yu Dai, Kaipeng Zhang
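Abstract summary: VisPainter is a multi-agent framework for scientific illustration built on the model context protocol. It orchestrates three specialized modules (a Manager, a Designer, and a Toolbox) to collaboratively produce diagrams compatible with standard vector graphics software. The accompanying benchmark, VisBench, assesses high-information-density scientific illustrations from four aspects: content, layout, visual perception, and interaction cost.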
Reference score (independently computed attention score): 38.72208780072352
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: Scientific illustrations demand both high information density and post-editability. However, current generative models have two major limitations. First, image generation models output rasterized images that lack semantic structure, making it impossible to access, edit, or rearrange independent visual components in the images. Second, code-based generation methods (TikZ or SVG), although they provide element-level control, force users into a cumbersome "writing-compiling-reviewing" cycle and lack intuitive manipulation. Neither approach adequately meets the needs for efficiency, intuitiveness, and iterative modification in scientific creation. To bridge this gap, we introduce VisPainter, a multi-agent framework for scientific illustration built upon the model context protocol. VisPainter orchestrates three specialized modules (a Manager, a Designer, and a Toolbox) to collaboratively produce diagrams compatible with standard vector graphics software. This modular, role-based design allows each element to be explicitly represented and manipulated, enabling true element-level control: any element can be added or modified later. To systematically evaluate the quality of scientific illustrations, we introduce VisBench, a benchmark with seven-dimensional evaluation metrics. It assesses high-information-density scientific illustrations from four aspects: content, layout, visual perception, and interaction cost. We conducted extensive ablation experiments to verify the rationality of our architecture and the reliability of our evaluation methods. Finally, we evaluated various vision-language models, presenting fair and credible model rankings along with detailed comparisons of their respective capabilities. Additionally, we isolated and quantified the impacts of role division, step control, and description on the quality of illustrations.
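
To make the idea of element-level control concrete, the following is a minimal Python sketch, not the authors' implementation. It assumes a hypothetical `Element` record, a hypothetical `Toolbox` class that owns the canvas, and stand-in functions `designer_plan` and `manager_run` that only mimic the role split named in the abstract. None of these names, nor the SVG output format, come from the paper, and no model or model context protocol call is involved.

```python
from dataclasses import dataclass, field
from xml.sax.saxutils import escape, quoteattr


@dataclass
class Element:
    """One independently addressable shape (hypothetical schema, not from the paper)."""
    elem_id: str
    kind: str                      # "rect" or "text" in this sketch
    x: float
    y: float
    attrs: dict = field(default_factory=dict)
    text: str = ""


class Toolbox:
    """Hypothetical tool layer: the only component that mutates the canvas."""

    def __init__(self):
        self.elements = {}         # elem_id -> Element, insertion-ordered

    def add(self, element):
        self.elements[element.elem_id] = element

    def modify(self, elem_id, **changes):
        # Change named fields of a single element; all others stay untouched.
        element = self.elements[elem_id]
        for name, value in changes.items():
            setattr(element, name, value)

    def to_svg(self, width=200, height=100):
        # Serialize every element with its id so a vector editor can select it.
        parts = [f'<svg xmlns="http://www.w3.org/2000/svg" '
                 f'width="{width}" height="{height}">']
        for e in self.elements.values():
            extra = "".join(f" {k}={quoteattr(str(v))}" for k, v in e.attrs.items())
            head = f'<{e.kind} id={quoteattr(e.elem_id)} x="{e.x}" y="{e.y}"{extra}'
            if e.kind == "text":
                parts.append(f"{head}>{escape(e.text)}</text>")
            else:
                parts.append(f"{head}/>")
        parts.append("</svg>")
        return "\n".join(parts)


def designer_plan():
    """Stand-in for a Designer: returns drawing steps as plain data."""
    box_style = {"width": 80, "height": 40, "fill": "none", "stroke": "black"}
    return [
        ("add", Element("box1", "rect", 10, 10, box_style)),
        ("add", Element("label1", "text", 20, 35, text="Encoder")),
    ]


def manager_run(toolbox, plan):
    """Stand-in for a Manager: executes each planned step through the Toolbox."""
    for action, element in plan:
        if action == "add":
            toolbox.add(element)


toolbox = Toolbox()
manager_run(toolbox, designer_plan())
toolbox.modify("label1", text="Decoder")   # a later edit touches one element only
svg = toolbox.to_svg()
print(svg)
```

Because each shape keeps its own identifier, the later `modify` call changes the label without regenerating the rest of the figure, which is the property that a rasterized output cannot offer.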

Related papers