Fugu-MT Paper Translation (Summary): AIBench: Evaluating Visual-Logical Consistency in Academic Illustration Generation

Paper summary: AIBench: Evaluating Visual-Logical Consistency in Academic Illustration Generation

arxiv url: http://arxiv.org/abs/2603.28068v1
Date: Mon, 30 Mar 2026 06:14:40 GMT
Status: translation complete
Last updated in system: 2026-03-31 23:18:45.253178
Title: AIBench: Evaluating Visual-Logical Consistency in Academic Illustration Generation
Authors: Zhaohe Liao, Kaixun Jiang, Zhihang Liu, Yujie Wei, Junqiu Yu, Quanhao Li, Hong-Tao Yu, Pandeng Li, Yuzheng Wang, Zhen Xing, Shiwei Zhang, Chen-Wei Xie, Yun Zheng, Xihui Liu
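Abstract summary: This paper proposes AIBench, the first benchmark that uses VQA to evaluate the logical correctness of academic illustrations and VLMs to assess their aesthetics. The VQA-based approach evaluates visual-logical consistency more accurately and in more detail while relying less on the ability of the judge VLM.
Reference score (attention score computed independently by this site): 50.68300726392683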
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: Although image generation has boosted various applications through its rapid evolution, whether state-of-the-art models can produce ready-to-use academic illustrations for papers is still largely unexplored. Directly comparing or evaluating the illustration with a VLM is the natural approach, but it requires oracle-level multi-modal understanding, which is unreliable for long and complex texts and illustrations. To address this, we propose AIBench, the first benchmark that uses VQA to evaluate the logical correctness of academic illustrations and VLMs to assess their aesthetics. In detail, we design four levels of questions derived from a logic diagram that summarizes the method section of the paper; these questions check whether the generated illustration aligns with the paper at different scales. Our VQA-based approach yields more accurate and detailed evaluations of visual-logical consistency while relying less on the ability of the judge VLM. With our high-quality AIBench, we conduct extensive experiments and conclude that the performance gap between models on this task is significantly larger than on general tasks, reflecting their differing abilities in complex reasoning and high-density generation. Further, logic and aesthetics are hard to optimize simultaneously, as in handcrafted illustrations. Additional experiments show that test-time scaling on both abilities significantly boosts performance on this task.
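The abstract describes scoring an illustration by asking four levels of VQA questions and checking the answers. A minimal sketch of how such answers might be aggregated follows. It is not the authors' implementation: the `VQAItem` record, the 1 to 4 level numbering, exact-match answer comparison, and the macro-average over levels are all assumptions made for illustration.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class VQAItem:
    """One question about a generated illustration (hypothetical record)."""
    level: int       # question level, assumed to be numbered 1..4
    question: str
    expected: str    # reference answer, assumed to come from the logic diagram
    predicted: str   # answer given by a judge VLM looking at the illustration


def normalize(answer: str) -> str:
    """Case- and whitespace-insensitive comparison (an assumed matching rule)."""
    return answer.strip().lower()


def per_level_accuracy(items: list[VQAItem]) -> dict[int, float]:
    """Fraction of correctly answered questions for each level present."""
    correct: dict[int, int] = defaultdict(int)
    total: dict[int, int] = defaultdict(int)
    for item in items:
        total[item.level] += 1
        if normalize(item.predicted) == normalize(item.expected):
            correct[item.level] += 1
    return {level: correct[level] / total[level] for level in sorted(total)}


def overall_score(items: list[VQAItem]) -> float:
    """Unweighted mean of the per-level accuracies (assumed aggregation)."""
    accuracies = per_level_accuracy(items)
    if not accuracies:
        return 0.0
    return sum(accuracies.values()) / len(accuracies)
```

Averaging per level rather than per question keeps a level with many questions from dominating the score; whether the benchmark weights levels this way is not stated in the abstract.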

Related papers