Fugu-MT Paper Translation (Abstract): Faithful, Enriched, and Precise: Benchmarking Natural-Science Illustration Generation by T2I models

Paper overview: Faithful, Enriched, and Precise: Benchmarking Natural-Science Illustration Generation by T2I models

arxiv url: http://arxiv.org/abs/2606.05949v1
Date: Thu, 04 Jun 2026 09:49:02 GMT
Status: Translation complete
System update date: 2026-06-05 22:39:44.703392
Title: Faithful, Enriched, and Precise: Benchmarking Natural-Science Illustration Generation by T2I models
Title (reference translation): Faithful, Enriched, and High-Precision: Benchmarking Natural-Science Illustration Generation by T2I Models
Authors: Yifan Chang, Jiaxin Ai, Jianwen Sun, Yuandong Pu, Siqi Luo, Liangliang Zhao, Yuchen Ren, Minghao Liu, Yunfei Yu, Yu Qiao, Kaipeng Zhang, Yihao Liu,
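Abstract summary: FEPBench is a benchmark built from carefully selected high-quality scientific illustrations. The authors evaluate T2I models along three dimensions: instruction faithfulness, reasoning enrichment, and semantic precision. Results show that even state-of-the-art closed-source models, such as GPT Image 2 and Nano Banana Pro, suffer from text-rendering bottlenecks, limited reasoning enrichment, and difficulty balancing generation richness with precision.
Reference score (independently computed attention level): 49.93401412148884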
License: http://creativecommons.org/publicdomain/zero/1.0/
Abstract: Scientific illustrations are essential tools for communicating research findings, especially in natural science, where they visualize complex concepts and processes. As Text-to-Image (T2I) models become increasingly capable, researchers have started to use them for scientific illustration generation. However, existing benchmarks often assess outputs at a holistic level, overlooking fine-grained elements, while scientific reasoning ability and output conciseness remain under-quantified. We introduce FEPBench, a benchmark built from carefully selected high-quality scientific illustrations across multiple disciplines and layout types. With the assistance of multimodal large language models (MLLMs) and human experts, we provide fine-grained atom set annotations and systematically evaluate T2I models along three dimensions: instruction faithfulness, reasoning enrichment, and semantic precision. Our evaluation further decomposes model performance across visual, textual, relation, and layout elements. Results show that even state-of-the-art (SOTA) closed-source models, such as GPT Image 2 and Nano Banana Pro, still suffer from text-rendering bottlenecks, limited reasoning enrichment, and difficulty balancing generation richness with precision. These findings provide practical guidance for improving and deploying T2I models in scientific illustration generation. We will release the benchmark data, atom set annotations, and evaluation code.
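Abstract (reference translation): Scientific illustrations are important tools for communicating research findings, particularly in natural science, where they visualize complex concepts and processes. As Text-to-Image (T2I) models have become increasingly capable, researchers have begun using them to generate scientific illustrations. However, existing benchmarks often evaluate outputs at a holistic level and overlook fine-grained elements, while scientific reasoning ability and output conciseness remain under-quantified. FEPBench is a benchmark built from carefully selected high-quality scientific illustrations spanning multiple disciplines and layout types. With the help of multimodal large language models (MLLMs) and human experts, it provides fine-grained atom set annotations and systematically evaluates T2I models along three dimensions. The evaluation further decomposes model performance across visual, textual, relation, and layout elements. Results show that even state-of-the-art (SOTA) closed-source models, such as GPT Image 2 and Nano Banana Pro, suffer from text-rendering bottlenecks, limited reasoning enrichment, and difficulty balancing generation richness with precision. These findings provide practical guidance for improving and deploying T2I models in scientific illustration generation. The benchmark data, atom set annotations, and evaluation code will be released.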

Related papers