Fugu-MT paper translation (summary): MEF: A Systematic Evaluation Framework for Text-to-Image Models

Paper summary: MEF: A Systematic Evaluation Framework for Text-to-Image Models

arxiv url: http://arxiv.org/abs/2509.17907v1
Date: Mon, 22 Sep 2025 15:32:42 GMT
Status: translation complete
Last updated in system: 2025-09-23 18:58:16.468094
Title: MEF: A Systematic Evaluation Framework for Text-to-Image Models
Authors: Xiaojing Dong, Weilin Huang, Liang Li, Yiying Li, Shu Liu, Tongtong Ou, Shuang Ouyang, Yu Tian, Fengxuan Zhao
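Abstract summary: Current evaluations rely on either ELO for overall ranking or MOS for dimension-specific scoring. The authors introduce the Magic Evaluation Framework (MEF), a systematic and practical approach for evaluating T2I models. They release the evaluation framework and make Magic-Bench-377 fully open-source to advance research in the evaluation of visual generative models.
Reference score (attention measure computed independently by this site): 21.006921005280493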
License: http://creativecommons.org/licenses/by-nc-sa/4.0/
Abstract: Rapid advances in text-to-image (T2I) generation have raised higher requirements for evaluation methodologies. Existing benchmarks center on objective capabilities and dimensions, but lack an application-scenario perspective, limiting external validity. Moreover, current evaluations typically rely on either ELO for overall ranking or MOS for dimension-specific scoring, yet both methods have inherent shortcomings and limited interpretability. Therefore, we introduce the Magic Evaluation Framework (MEF), a systematic and practical approach for evaluating T2I models. First, we propose a structured taxonomy encompassing user scenarios, elements, element compositions, and text expression forms to construct the Magic-Bench-377, which supports label-level assessment and ensures a balanced coverage of both user scenarios and capabilities. On this basis, we combine ELO and dimension-specific MOS to generate model rankings and fine-grained assessments respectively. This joint evaluation method further enables us to quantitatively analyze the contribution of each dimension to user satisfaction using multivariate logistic regression. By applying MEF to current T2I models, we obtain a leaderboard and key characteristics of the leading models. We release our evaluation framework and make Magic-Bench-377 fully open-source to advance research in the evaluation of visual generative models.
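The abstract refers to ELO for overall ranking but does not state the exact variant MEF uses. As a rough illustration only, the sketch below shows the standard Elo update from a single pairwise comparison between two models. The K-factor of 32, the 400-point scale, and the function names are assumptions for illustration, not details taken from the paper.

```python
def expected_score(rating_a: float, rating_b: float) -> float:
    """Probability that A is preferred over B under the standard Elo model."""
    return 1.0 / (1.0 + 10.0 ** ((rating_b - rating_a) / 400.0))


def update_elo(rating_a: float, rating_b: float, score_a: float, k: float = 32.0):
    """Return updated (rating_a, rating_b) after one comparison.

    score_a is 1.0 if A's image was preferred, 0.0 if B's was, 0.5 for a tie.
    k is an assumed K-factor; the paper's setting is not given in the abstract.
    """
    e_a = expected_score(rating_a, rating_b)
    new_a = rating_a + k * (score_a - e_a)
    # B's actual and expected scores are the complements of A's.
    new_b = rating_b + k * ((1.0 - score_a) - (1.0 - e_a))
    return new_a, new_b


if __name__ == "__main__":
    # Two hypothetical models start level; model A wins one comparison.
    print(update_elo(1500.0, 1500.0, 1.0))
```

Because the two updates are equal and opposite, the total rating across both models is unchanged by a comparison, which is why Elo yields a relative ranking rather than an absolute per-dimension score such as MOS.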

Related papers