Fugu-MT Paper Translation (Abstract): MAR-MAER: Metric-Aware and Ambiguity-Adaptive Autoregressive Image Generation

Paper abstract: MAR-MAER: Metric-Aware and Ambiguity-Adaptive Autoregressive Image Generation

arxiv url: http://arxiv.org/abs/2604.01864v1
Date: Thu, 02 Apr 2026 10:19:46 GMT
Status: Translation complete
Last updated in system: 2026-04-03 14:21:10.678287
Title: MAR-MAER: Metric-Aware and Ambiguity-Adaptive Autoregressive Image Generation
Authors: Kai Dong, Tingting Bai
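Abstract summary: This paper introduces MAR-MAER, an innovative hierarchical autoregressive framework. It combines a metric-aware embedding regularization method with a probabilistic latent model for handling ambiguous semantics. The method aligns the model's internal representations with quality metrics such as CLIPScore and HPSv2, and achieves excellent performance in both metric consistency and semantic flexibility.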
Reference score (independently computed attention score): 1.4552327135549117
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: Autoregressive (AR) models have demonstrated significant success in text-to-image generation. However, they usually face two major challenges. First, the generated images may not always meet the quality standards expected by humans. Second, these models have difficulty with ambiguous prompts that could be interpreted in several valid ways. To address these issues, we introduce MAR-MAER, an innovative hierarchical autoregressive framework that combines two main components. The first is a metric-aware embedding regularization method. The second is a probabilistic latent model for handling ambiguous semantics. Our method uses a lightweight projection head trained with an adaptive kernel regression loss function. This aligns the model's internal representations with human-preferred quality metrics, such as CLIPScore and HPSv2, so the learned embedding space more accurately reflects human judgment. We also introduce a conditional variational module, which injects controlled randomness into the hierarchical token generation process. This allows the model to produce a diverse array of coherent images from ambiguous or open-ended prompts. We conducted extensive experiments on COCO and a newly developed Ambiguous-Prompt Benchmark. The results show that MAR-MAER achieves excellent performance in both metric consistency and semantic flexibility. It exceeds the baseline Hi-MAR model, improving CLIPScore by +1.6 and HPSv2 by +5.3. For ambiguous inputs, it produces a notably wider range of outputs. These findings were confirmed through both human evaluation and automated metrics.
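The abstract does not spell out the form of the adaptive kernel regression loss, so the following is only an illustrative sketch of the general idea, not the authors' implementation. It assumes a linear projection head, a Gaussian kernel, a leave-one-out Nadaraya-Watson estimator, and a median-distance bandwidth as the "adaptive" part. All function names (`project`, `adaptive_bandwidth`, `kernel_regression_loss`) are hypothetical, and `scores` stands in for a quality metric such as CLIPScore or HPSv2.

```python
import math


def project(embedding, weights):
    """Hypothetical linear projection head; `weights` is a list of rows."""
    return [sum(w * x for w, x in zip(row, embedding)) for row in weights]


def sq_dist(a, b):
    """Squared Euclidean distance between two vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def adaptive_bandwidth(points):
    """Median pairwise squared distance (an assumed heuristic).

    Requires at least two points.
    """
    dists = sorted(
        sq_dist(points[i], points[j])
        for i in range(len(points))
        for j in range(i + 1, len(points))
    )
    return max(dists[len(dists) // 2], 1e-8)


def kernel_regression_loss(embeddings, scores, weights):
    """Mean squared error of a leave-one-out kernel regression.

    Each sample's score is predicted from the scores of the other samples,
    weighted by a Gaussian kernel on the projected embeddings. The loss is
    small when nearby projected points have similar metric scores.
    """
    z = [project(e, weights) for e in embeddings]
    h = adaptive_bandwidth(z)
    total = 0.0
    for i in range(len(z)):
        num = den = 0.0
        for j in range(len(z)):
            if i == j:
                continue
            k = math.exp(-sq_dist(z[i], z[j]) / h)
            num += k * scores[j]
            den += k
        pred = num / den if den > 0 else 0.0
        total += (pred - scores[i]) ** 2
    return total / len(z)


# Toy data: the score depends only on the first embedding coordinate.
embeddings = [[0.0, 0.0], [0.0, 1.0], [5.0, 0.0], [5.0, 1.0]]
scores = [0.0, 0.0, 1.0, 1.0]
loss_aligned = kernel_regression_loss(embeddings, scores, [[1.0, 0.0]])
loss_misaligned = kernel_regression_loss(embeddings, scores, [[0.0, 1.0]])
```

In this toy setup, the projection that keeps the score-relevant coordinate yields the lower loss, which is the behavior a metric-aware regularizer would reward during training.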


Related papers