Fugu-MT Paper Translation (Abstract): TIQA: Human-Aligned Text Quality Assessment in Generated Images

Paper summary: TIQA: Human-Aligned Text Quality Assessment in Generated Images

arxiv url: http://arxiv.org/abs/2603.07119v1
Date: Sat, 07 Mar 2026 09:11:10 GMT
Status: Translation complete
Last updated in system: 2026-03-10 15:13:13.889492
Title: TIQA: Human-Aligned Text Quality Assessment in Generated Images
Title (reference translation): TIQA: Text Quality Assessment in Generated Images
Authors: Kirill Koltsov, Aleksandr Gushchin, Dmitriy Vatolin, Anastasia Antsiferova
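Abstract summary: Text-in-Image Quality Assessment (TIQA) is a task that predicts a scalar quality score matching human judgments of rendered-text fidelity within cropped text regions. For example, selecting the best-of-5 generations with ANTIQA improves human-rated text quality by +14% on average.
Reference score (independently computed attention score): 42.874268801024066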
License: http://creativecommons.org/licenses/by/4.0/
Abstract: Text rendering remains a persistent failure mode of modern text-to-image models (T2I), yet existing evaluations rely on OCR correctness or VLM-based judging procedures that are poorly aligned with perceptual text artifacts. We introduce Text-in-Image Quality Assessment (TIQA), a task that predicts a scalar quality score that matches human judgments of rendered-text fidelity within cropped text regions. We release two MOS-labeled datasets: TIQA-Crops (10k text crops) and TIQA-Images (1,500 images), spanning 20+ T2I models, including proprietary ones. We also propose ANTIQA, a lightweight method with text-specific biases, and show that it improves correlation with human scores over OCR confidence, VLM judges, and generic NR-IQA metrics by at least $\sim0.05$ on TIQA-Crops and $\sim0.08$ on TIQA-Images, as measured by PLCC. Finally, we show that TIQA models are valuable in downstream tasks: for example, selecting the best-of-5 generations with ANTIQA improves human-rated text quality by $+14\%$ on average, demonstrating practical value for filtering and reranking in generation pipelines.
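Abstract (reference translation): Text rendering remains a persistent failure mode of modern text-to-image (T2I) models, yet existing evaluations rely on OCR correctness or VLM-based judging procedures, which align poorly with perceptual text artifacts. Text-in-Image Quality Assessment (TIQA) is a task of predicting a scalar quality score that matches human judgments of rendered-text fidelity within cropped text regions. Two MOS-labeled datasets are released, TIQA-Crops (10k text crops) and TIQA-Images (1,500 images), spanning more than 20 T2I models, including proprietary ones. The authors also propose ANTIQA, a lightweight method with text-specific biases, and show that it improves correlation with human scores, as measured by PLCC, over OCR confidence, VLM judges, and generic NR-IQA metrics by at least $\sim0.05$ on TIQA-Crops and $\sim0.08$ on TIQA-Images. Finally, they show that TIQA models are useful in downstream tasks: for example, selecting the best-of-5 generations with ANTIQA improves human-rated text quality by $+14\%$ on average, demonstrating practical value for filtering and reranking in generation pipelines.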
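
The two evaluation ideas named in the abstract, PLCC against human scores and best-of-N reranking, can be illustrated with a minimal Python sketch. This is not the authors' code: `plcc` is the standard Pearson linear correlation coefficient, and `best_of_n` with its `score_fn` argument is a hypothetical helper in which any quality scorer (ANTIQA is not reproduced here) would be plugged in.

```python
import math
from typing import Callable, Sequence, TypeVar

T = TypeVar("T")


def plcc(pred: Sequence[float], mos: Sequence[float]) -> float:
    """Pearson linear correlation between predicted scores and human MOS."""
    if len(pred) != len(mos) or len(pred) < 2:
        raise ValueError("need two equal-length sequences with at least 2 items")
    mean_p = sum(pred) / len(pred)
    mean_m = sum(mos) / len(mos)
    cov = sum((p - mean_p) * (m - mean_m) for p, m in zip(pred, mos))
    var_p = sum((p - mean_p) ** 2 for p in pred)
    var_m = sum((m - mean_m) ** 2 for m in mos)
    if var_p == 0 or var_m == 0:
        raise ValueError("PLCC is undefined for constant input")
    return cov / math.sqrt(var_p * var_m)


def best_of_n(candidates: Sequence[T], score_fn: Callable[[T], float]) -> T:
    """Return the candidate with the highest score (hypothetical reranker).

    score_fn is an assumed stand-in for a text-quality model; it is not
    the ANTIQA model described in the paper.
    """
    if not candidates:
        raise ValueError("candidates must be non-empty")
    return max(candidates, key=score_fn)
```

In a generation pipeline, `candidates` would be the five images produced for one prompt and `score_fn` would map each image to a predicted text-quality score; the sketch simply keeps the top-scoring one.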

Related papers