Fugu-MT Paper Translation (Summary): PresentBench: A Fine-Grained Rubric-Based Benchmark for Slide Generation

Paper summary: PresentBench: A Fine-Grained Rubric-Based Benchmark for Slide Generation

arxiv url: http://arxiv.org/abs/2603.07244v1
Date: Sat, 07 Mar 2026 14:54:50 GMT
Status: Translation complete
Last updated in system: 2026-03-10 15:13:14.117548
Title: PresentBench: A Fine-Grained Rubric-Based Benchmark for Slide Generation
Authors: Xin-Sheng Chen, Jiayu Zhu, Pei-lin Li, Hanzheng Wang, Shuojin Yang, Meng-Hao Guo,
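Abstract summary: PresentBench is a fine-grained, rubric-based benchmark for evaluating automated real-world slide generation. It contains 238 evaluation instances, each supplemented with the background materials required for slide creation. The benchmark results show that NotebookLM significantly outperforms other slide generation methods.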
Reference score (attention score computed independently by the site): 9.27005978533552
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: Slides serve as a critical medium for conveying information in presentation-oriented scenarios such as academia, education, and business. Despite their importance, creating high-quality slide decks remains time-consuming and cognitively demanding. Recent advances in generative models, such as Nano Banana Pro, have made automated slide generation increasingly feasible. However, existing evaluations of slide generation are often coarse-grained and rely on holistic judgments, making it difficult to accurately assess model capabilities or track meaningful advances in the field. In practice, the lack of fine-grained, verifiable evaluation criteria poses a critical bottleneck for both research and real-world deployment. In this paper, we propose PresentBench, a fine-grained, rubric-based benchmark for evaluating automated real-world slide generation. It contains 238 evaluation instances, each supplemented with background materials required for slide creation. Moreover, we manually design an average of 54.1 checklist items per instance, each formulated as a binary question, to enable fine-grained, instance-specific evaluation of the generated slide decks. Extensive experiments show that PresentBench provides more reliable evaluation results than existing methods, and exhibits significantly stronger alignment with human preferences. Furthermore, our benchmark reveals that NotebookLM significantly outperforms other slide generation methods, highlighting substantial recent progress in this domain.
