Fugu-MT Paper Translation (Abstract): Qwen-Image-Bench: From Generation to Creation in Text-to-Image Evaluation

Paper overview: Qwen-Image-Bench: From Generation to Creation in Text-to-Image Evaluation

arxiv url: http://arxiv.org/abs/2605.28091v1
Date: Wed, 27 May 2026 07:46:43 GMT
Status: Translation complete
System update date: 2026-05-28 17:38:55.855172
Title: Qwen-Image-Bench: From Generation to Creation in Text-to-Image Evaluation
Title (reference translation): Qwen-Image-Bench: From Generation to Creation in Text-to-Image Evaluation
Authors: Niantong Li, Guangzheng Hu, Weixu Qiao, Ying Ba, Qichen Hong, Shijun Shen, Jinlin Wang, Fan Zhou, Jianye Kang, Xin Shang, Ziyi He, Wei Wang, Dalin Li, Jiahao Li, Jie Zhang, Kaiyuan Gao, Kun Yan, Lihan Jiang, Ningyuan Tang, Shengming Yin, Tianhe Wu, Xiao Xu, Xiaoyue Chen, Yuxiang Chen, Yan Shu, Yanran Zhang, Yilei Chen, Yixian Xu, Zekai Zhang, Zhendong Wang, Zihao Liu, Zikai Zhou, Hongzhu Shi, Yi Wang, Bing Zhao, Hu Wei, Lin Qu, Chenfei Wu
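Abstract summary: Qwen-Image-Bench is a creator-centric benchmark co-designed with professional artists and grounded in real-world creation scenarios. We organize these five pillars into a top-down hierarchical taxonomy that decomposes into 23 second-level sub-capabilities and 56 third-level rubrics. We train a unified judge model, Q-Judger, based on Qwen3.6-27B, supervised by 80 professional annotators from global art academies under blind labeling and triple-review protocols.
Reference score (independently computed attention score): 52.930280794143876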
License: http://creativecommons.org/licenses/by/4.0/
Abstract: Text-to-Image generation has evolved from basic image synthesis into a frequently used core capability in professional creative workflows, where simple text-image alignment can no longer satisfy users' pressing demands for faithful real-world reconstruction and genuine creative expression. Existing benchmarks, however, remain anchored in these foundational criteria and do not yet capture the nuanced capabilities that matter in authentic artistic practice, making it difficult to reliably distinguish state-of-the-art T2I models. To address the gap, we introduce Qwen-Image-Bench, a creator-centric benchmark co-designed with professional artists and grounded in real-world creation scenarios. Qwen-Image-Bench enriches conventional evaluation with two application-driven dimensions: Real-world Fidelity and Creative Generation. Drawing on the staged reasoning inherent in professional artistic workflows, we organize these five pillars into a top-down hierarchical taxonomy that further decomposes into 23 second-level sub-capabilities and 56 third-level verifiable rubrics. To ensure broad coverage, we curate 1000 stratified prompts with each prompt jointly exercising more than four fine-grained facets across multiple pillars. We train a unified judge model Q-Judger based on Qwen3.6-27B, supervised by 80 professional annotators from global art academies under blind labeling and triple-review protocols, that scores every image across all 56 verifiable facets, producing fine-grained, rubric-grounded, and fully attributable diagnostics rather than a single opaque score. Empirically, Qwen-Image-Bench reliably distinguishes leading T2I models, achieving the greatest separation on the two application-driven dimensions of Real-world Fidelity and Creative Generation where existing benchmarks provide little insight, while also providing a trustworthy optimization signal for production-level T2I development.
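Abstract (reference translation): Text-to-image generation has evolved from basic image synthesis into a core capability used frequently in professional creative workflows. Existing benchmarks, however, remain anchored in foundational criteria and do not yet capture the nuanced capabilities that matter in authentic artistic practice, which makes it difficult to reliably distinguish state-of-the-art T2I models. To address this gap, we introduce Qwen-Image-Bench. Qwen-Image-Bench enriches conventional evaluation with two application-driven dimensions: Real-world Fidelity and Creative Generation. Drawing on the staged reasoning inherent in professional artistic workflows, we organize these five pillars into a top-down hierarchical taxonomy that further decomposes into 23 second-level sub-capabilities and 56 third-level verifiable rubrics. To ensure broad coverage, we curate 1000 stratified prompts, each jointly exercising more than four fine-grained facets across multiple pillars. We train a unified judge model, Q-Judger, based on Qwen3.6-27B and supervised by 80 professional annotators from global art academies under blind labeling and triple-review protocols; it scores every image across all 56 verifiable facets, producing fine-grained, rubric-grounded, and fully attributable diagnostics rather than a single opaque score. Empirically, Qwen-Image-Bench reliably distinguishes leading T2I models, achieving the greatest separation on the two application-driven dimensions of Real-world Fidelity and Creative Generation, where existing benchmarks provide little insight, while also providing a trustworthy optimization signal for production-level T2I development.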


Related papers