Fugu-MT Paper Translation (Abstract): Easier Painting Than Thinking: Can Text-to-Image Models Set the Stage, but Not Direct the Play?

Paper summary: Easier Painting Than Thinking: Can Text-to-Image Models Set the Stage, but Not Direct the Play?

arxiv url: http://arxiv.org/abs/2509.03516v1
Date: Wed, 03 Sep 2025 17:58:12 GMT
Status: Translation complete
System update date: 2025-09-04 21:40:46.621381
Title: Easier Painting Than Thinking: Can Text-to-Image Models Set the Stage, but Not Direct the Play?
Title (reference translation): Can Text-to-Image Models Take the Stage?
Authors: Ouxiang Li, Yuan Wang, Xinting Hu, Huijuan Huang, Rui Chen, Jiarong Ou, Xin Tao, Pengfei Wan, Fuli Feng
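Abstract summary: T2I-CoReBench is a comprehensive and complex benchmark that evaluates both the composition and reasoning capabilities of T2I models. To increase complexity, motivated by the inherent complexity of real-world scenarios, each prompt is curated with high compositional density. The benchmark comprises 1,080 challenging prompts and around 13,500 checklist questions.
Reference score (independently computed attention level): 47.34634464600429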
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: Text-to-image (T2I) generation aims to synthesize images from textual prompts, which jointly specify what must be shown and imply what can be inferred, thereby corresponding to two core capabilities: composition and reasoning. However, with the emerging advances of T2I models in reasoning beyond composition, existing benchmarks reveal clear limitations in providing comprehensive evaluations across and within these capabilities. Meanwhile, these advances also enable models to handle more complex prompts, whereas current benchmarks remain limited to low scene density and simplified one-to-one reasoning. To address these limitations, we propose T2I-CoReBench, a comprehensive and complex benchmark that evaluates both composition and reasoning capabilities of T2I models. To ensure comprehensiveness, we structure composition around scene graph elements (instance, attribute, and relation) and reasoning around the philosophical framework of inference (deductive, inductive, and abductive), formulating a 12-dimensional evaluation taxonomy. To increase complexity, driven by the inherent complexities of real-world scenarios, we curate each prompt with high compositional density for composition and multi-step inference for reasoning. We also pair each prompt with a checklist that specifies individual yes/no questions to assess each intended element independently to facilitate fine-grained and reliable evaluation. In statistics, our benchmark comprises 1,080 challenging prompts and around 13,500 checklist questions. Experiments across 27 current T2I models reveal that their composition capability still remains limited in complex high-density scenarios, while the reasoning capability lags even further behind as a critical bottleneck, with all models struggling to infer implicit elements from prompts. Our project page: https://t2i-corebench.github.io/.
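Abstract (reference translation): Text-to-image (T2I) generation aims to synthesize images from textual prompts, which jointly specify what must be shown and imply what can be inferred, corresponding to two core capabilities: composition and reasoning. However, as T2I models advance in reasoning beyond composition, existing benchmarks show clear limitations in providing comprehensive evaluations across and within these capabilities. Meanwhile, these advances also enable models to handle more complex prompts, whereas current benchmarks remain limited to low scene density and simplified one-to-one reasoning. To address these limitations, we propose T2I-CoReBench, a comprehensive and complex benchmark that evaluates both the composition and reasoning capabilities of T2I models. To ensure comprehensiveness, composition is structured around scene graph elements (instance, attribute, and relation) and reasoning around the philosophical framework of inference (deductive, inductive, and abductive), yielding a 12-dimensional evaluation taxonomy. To increase complexity, motivated by the inherent complexity of real-world scenarios, each prompt is curated with high compositional density for composition and multi-step inference for reasoning. Each prompt is also paired with a checklist of individual yes/no questions that assess each intended element independently, to facilitate fine-grained and reliable evaluation. The benchmark comprises 1,080 challenging prompts and around 13,500 checklist questions. Experiments across 27 current T2I models show that composition capability remains limited in complex high-density scenarios, while reasoning capability lags even further behind as a critical bottleneck, with all models struggling to infer implicit elements from prompts. Project page: https://t2i-corebench.github.io/.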
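The abstract pairs each prompt with a checklist of yes/no questions, one per intended element. With 1,080 prompts and around 13,500 questions, that works out to roughly 12.5 questions per prompt. The sketch below illustrates one way such checklist answers could be turned into scores. It is a minimal illustration, not the authors' implementation: the names `ChecklistItem`, `prompt_score`, and `benchmark_score` are hypothetical, the unweighted averaging is an assumption (the abstract does not state how answers are aggregated), and the answers are assumed to come from some judge that the abstract does not specify.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass
class ChecklistItem:
    """One yes/no question about a single intended element (hypothetical type)."""
    question: str
    answer: bool  # True if the judge answered "yes" for the generated image


def prompt_score(items: Sequence[ChecklistItem]) -> float:
    """Fraction of checklist questions answered "yes" for one prompt."""
    if not items:
        raise ValueError("checklist must contain at least one question")
    return sum(item.answer for item in items) / len(items)


def benchmark_score(checklists: Sequence[Sequence[ChecklistItem]]) -> float:
    """Unweighted mean of per-prompt scores (assumed aggregation)."""
    if not checklists:
        raise ValueError("at least one checklist is required")
    return sum(prompt_score(c) for c in checklists) / len(checklists)


# Illustrative, made-up answers for two prompts.
prompt_a = [
    ChecklistItem("Is there a red bicycle?", True),
    ChecklistItem("Is the bicycle leaning on a wall?", True),
    ChecklistItem("Is there a cat on the saddle?", False),
    ChecklistItem("Is the wall made of brick?", True),
]
prompt_b = [
    ChecklistItem("Is the ice in the glass melted?", False),
    ChecklistItem("Is the glass on a table?", True),
]

print(prompt_score(prompt_a))                 # 0.75
print(prompt_score(prompt_b))                 # 0.5
print(benchmark_score([prompt_a, prompt_b]))  # 0.625
```

Scoring each question independently means a single missed element lowers the score in proportion rather than failing the whole prompt, which is what makes the evaluation fine-grained.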

Related papers

Beyond Words and Pixels: A Benchmark for Implicit World Knowledge Reasoning in Generative Models [15.983959465314749]
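We introduce PicWorld, the first comprehensive benchmark that evaluates T2I models' grasp of implicit world knowledge and physical causal reasoning. The benchmark consists of 1,100 prompts across three core categories. A thorough analysis of 17 mainstream T2I models on PicWorld shows that their capabilities in implicit world knowledge and physical causal reasoning are universally limited.
Paper reference translation (metadata) (2025-11-23T03:44:54Z)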
DeCoT: Decomposing Complex Instructions for Enhanced Text-to-Image Generation with Large Language Models [9.800887055353096]
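We propose DeCoT (Decomposition-CoT), a framework that enhances T2I models' understanding and execution of complex instructions. Extensive experiments on the LongBench-T2I dataset show that DeCoT consistently and substantially improves the performance of leading T2I models.
Paper reference translation (metadata) (2025-08-17T15:15:39Z)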
TIIF-Bench: How Does Your T2I Model Follow Your Instructions? [7.13169573900556]
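We propose TIIF-Bench (Text-to-Image Instruction following Benchmark). TIIF-Bench consists of 5,000 prompts organized along multiple dimensions and categorized into three levels of difficulty and complexity. Two critical attributes, text rendering and style control, are introduced to evaluate the text synthesis accuracy and aesthetic coherence of T2I models.
Paper reference translation (metadata) (2025-06-02T18:44:07Z)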
R2I-Bench: Benchmarking Reasoning-Driven Text-to-Image Generation [26.816674696050413]
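Reasoning is a fundamental capability required for real-world text-to-image (T2I) generation. Recent T2I models have made remarkable progress in generating photorealistic images, but their reasoning capabilities remain underdeveloped. We introduce R2I-Bench, a benchmark that rigorously evaluates reasoning-driven T2I generation.
Paper reference translation (metadata) (2025-05-29T14:43:46Z)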
DetailMaster: Can Your Text-to-Image Model Handle Long Prompts? [46.639370210630936]
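DetailMaster is the first comprehensive benchmark designed to evaluate text-to-image (T2I) models. The benchmark consists of long, detailed prompts averaging 284.89 tokens, validated for high quality by expert annotators. An evaluation of 7 general-purpose T2I models and 5 long-prompt-optimized T2I models reveals significant performance limitations.
Paper reference translation (metadata) (2025-05-22T17:11:27Z)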
Replace in Translation: Boost Concept Alignment in Counterfactual Text-to-Image [53.09546752700792]
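We propose a method for this replacement process, which we call Explicit Logical Narrative Prompt (ELNP). We design a metric that computes how many of the concepts required by the prompt are covered, on average, in the synthesized images. Extensive experiments and qualitative comparisons show that our strategy improves concept alignment in counterfactual T2I.
Paper reference translation (metadata) (2025-05-20T13:27:52Z)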
Who Evaluates the Evaluations? Objectively Scoring Text-to-Image Prompt Coherence Metrics with T2IScoreScore (TS2) [62.44395685571094]
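T2IScoreScore is a curated set of semantic error graphs, each containing a prompt and a set of erroneous images. This makes it possible to rigorously judge whether a given prompt faithfulness metric can correctly order images with respect to their objective error count. State-of-the-art VLM-based metrics do not significantly outperform simple (and supposedly worse) feature-based metrics such as CLIPScore.
Paper reference translation (metadata) (2024-04-05T17:57:16Z)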
A Contrastive Compositional Benchmark for Text-to-Image Synthesis: A Study with Unified Text-to-Image Fidelity Metrics [58.83242220266935]
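We introduce Winoground-T2I, a benchmark for evaluating the compositionality of T2I models. The benchmark contains 11K complex, high-quality contrastive sentence pairs spanning 20 categories. We use Winoground-T2I for two purposes: evaluating the performance of T2I models and evaluating the metrics used to assess them.
Paper reference translation (metadata) (2023-12-04T20:47:48Z)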
T2I-CompBench++: An Enhanced and Comprehensive Benchmark for Compositional Text-to-image Generation [55.16845189272573]
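T2I-CompBench++ is an enhanced benchmark for compositional text-to-image generation. Its 8,000 compositional text prompts are categorized into four groups: attribute binding, object relationships, generative numeracy, and complex compositions.
Paper reference translation (metadata) (2023-07-12T17:59:42Z)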
Training-Free Structured Diffusion Guidance for Compositional Text-to-Image Synthesis [78.28620571530706]
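Large-scale diffusion models have achieved state-of-the-art results on text-to-image (T2I) synthesis tasks. We improve the compositional skills of T2I models, specifically more accurate attribute binding and better image composition.
Paper reference translation (metadata) (2022-12-09T18:30:24Z)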

The related papers list is generated automatically from the titles and abstracts of papers on this site.

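The operator of this site does not guarantee the quality of this site (including all information and translations) and accepts no responsibility for any results arising from the use of this site (including all information and translations).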