Fugu-MT Paper Translation (Summary): Infinity and Beyond: Compositional Alignment in VAR and Diffusion T2I Models

Paper overview: Infinity and Beyond: Compositional Alignment in VAR and Diffusion T2I Models

arxiv url: http://arxiv.org/abs/2512.11542v1
Date: Fri, 12 Dec 2025 13:22:51 GMT
Status: Translation complete
System update date: 2025-12-15 15:48:11.778593
Title: Infinity and Beyond: Compositional Alignment in VAR and Diffusion T2I Models
Title (reference translation): Infinity and Beyond: Compositional Alignment in VAR and Diffusion T2I Models
Authors: Hossein Shahabadi, Niki Sepasian, Arash Marioriyad, Ali Sharifi-Zarchi, Mahdieh Soleymani Baghshah
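Abstract summary: We benchmark six diverse text-to-image (T2I) systems. We evaluate alignment in color and attribute binding, spatial relations, numeracy, and complex multi-object prompts. SDXL and PixArt-$α$ show persistent weaknesses in attribute-sensitive and spatial tasks.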
Reference score (independently calculated attention score): 8.72752668537241
License: http://creativecommons.org/licenses/by-sa/4.0/
Abstract: Achieving compositional alignment between textual descriptions and generated images - covering objects, attributes, and spatial relationships - remains a core challenge for modern text-to-image (T2I) models. Although diffusion-based architectures have been widely studied, the compositional behavior of emerging Visual Autoregressive (VAR) models is still largely unexamined. We benchmark six diverse T2I systems - SDXL, PixArt-$α$, Flux-Dev, Flux-Schnell, Infinity-2B, and Infinity-8B - across the full T2I-CompBench++ and GenEval suites, evaluating alignment in color and attribute binding, spatial relations, numeracy, and complex multi-object prompts. Across both benchmarks, Infinity-8B achieves the strongest overall compositional alignment, while Infinity-2B also matches or exceeds larger diffusion models in several categories, highlighting favorable efficiency-performance trade-offs. In contrast, SDXL and PixArt-$α$ show persistent weaknesses in attribute-sensitive and spatial tasks. These results provide the first systematic comparison of VAR and diffusion approaches to compositional alignment and establish unified baselines for the future development of the T2I model.
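Abstract (reference translation): Achieving compositional alignment between textual descriptions and generated images (including objects, attributes, and spatial relationships) remains an important challenge for modern text-to-image (T2I) models. Diffusion-based architectures have been widely studied, but the compositional behavior of the newer Visual Autoregressive (VAR) models remains largely unexamined. We benchmark six diverse T2I systems (SDXL, PixArt-$α$, Flux-Dev, Flux-Schnell, Infinity-2B, and Infinity-8B) on the T2I-CompBench++ and GenEval suites, evaluating alignment in color and attribute binding, spatial relations, numeracy, and complex multi-object prompts. On both benchmarks, Infinity-8B achieves the strongest overall compositional alignment, while Infinity-2B matches or exceeds larger diffusion models in several categories. In contrast, SDXL and PixArt-$α$ show persistent weaknesses in attribute-sensitive and spatial tasks. These results provide a systematic comparison of VAR and diffusion approaches to compositional alignment and establish unified baselines for future T2I model development.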

Related papers