Fugu-MT Paper Translation (Abstract): ChArtist: Generating Pictorial Charts with Unified Spatial and Subject Control

Paper overview: ChArtist: Generating Pictorial Charts with Unified Spatial and Subject Control

arxiv url: http://arxiv.org/abs/2603.14209v1
Date: Sun, 15 Mar 2026 03:55:44 GMT
Status: Translation complete
Last updated in system: 2026-03-17 16:19:35.670525
Title: ChArtist: Generating Pictorial Charts with Unified Spatial and Subject Control
Title (reference translation): ChArtist: Creating Pictorial Charts with Unified Spatial and Subject Control
Authors: Shishi Xiao, Tongyu Zhou, David Laidlaw, Gromit Yeuk-Yin Chan,
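Abstract summary: A pictorial chart is an effective medium for visual storytelling, seamlessly integrating visual elements with data charts. Current methods that extract dense structural cues from natural images are ill-suited as conditioning signals for pictorial chart generation. We propose ChArtist, a domain-specific diffusion model that generates pictorial charts automatically.
Reference score (independently computed attention measure): 9.055386884800525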
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: A pictorial chart is an effective medium for visual storytelling, seamlessly integrating visual elements with data charts. However, creating such images is challenging because the flexibility of visual elements often conflicts with the rigidity of chart structures. This process thus requires a creative deformation that maintains both data faithfulness and visual aesthetics. Current methods that extract dense structural cues from natural images (e.g., edge or depth maps) are ill-suited as conditioning signals for pictorial chart generation. We present ChArtist, a domain-specific diffusion model for generating pictorial charts automatically, offering two distinct types of control: 1) spatial control that aligns well with the chart structure, and 2) subject-driven control that respects the visual characteristics of a reference image. To achieve this, we introduce a skeleton-based spatial control representation. This representation encodes only the data-encoding information of the chart, allowing for the easy incorporation of reference visuals without a rigid outline constraint. We implement our method based on the Diffusion Transformer (DiT) and leverage an adaptive position encoding mechanism to manage these two controls. We further introduce Spatially Gated Attention to modulate the interaction between spatial control and subject control. To support the fine-tuning of pre-trained models for this task, we created a large-scale dataset of 30,000 triplets (skeleton, reference image, pictorial chart). We also propose a unified data accuracy metric to evaluate the data faithfulness of the generated charts. We believe this work demonstrates that current generative models can achieve data-driven visual storytelling by moving beyond general-purpose conditions to task-specific representations. Project page: https://chartist-ai.github.io/.
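Abstract (reference translation): A pictorial chart is an effective medium for visual storytelling, seamlessly integrating visual elements with data charts. However, creating such images is difficult because the flexibility of visual elements often conflicts with the rigidity of chart structures. The process therefore requires a creative deformation that preserves both data faithfulness and visual aesthetics. Current methods that extract dense structural cues from natural images (e.g., edge or depth maps) are ill-suited as conditioning signals for pictorial chart generation. We present ChArtist, a domain-specific diffusion model that generates pictorial charts automatically and offers two types of control: 1) spatial control that aligns with the chart structure, and 2) subject-driven control that respects the visual characteristics of a reference image. To achieve this, we introduce a skeleton-based spatial control representation. This representation encodes only the data-encoding information of the chart, so reference visuals can be incorporated easily without a rigid outline constraint. We implement the method on the Diffusion Transformer (DiT) and use an adaptive position encoding mechanism to manage the two controls. We further introduce Spatially Gated Attention to modulate the interaction between spatial control and subject control. To support fine-tuning of pre-trained models for this task, we created a large-scale dataset of 30,000 triplets (skeleton, reference image, pictorial chart). We also propose a unified data accuracy metric to evaluate the data faithfulness of the generated charts. We believe this work demonstrates that current generative models can achieve data-driven visual storytelling by moving beyond general-purpose conditions to task-specific representations. Project page: https://chartist-ai.github.io/.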

Related papers