Fugu-MT Paper Translation (Abstract): Interpretable Evaluation of AI-Generated Content with Language-Grounded Sparse Encoders

Paper overview: Interpretable Evaluation of AI-Generated Content with Language-Grounded Sparse Encoders

arxiv url: http://arxiv.org/abs/2508.18236v1
Date: Wed, 20 Aug 2025 06:50:15 GMT
Status: Translation complete
Last updated in system: 2025-08-26 18:43:45.891423
Title: Interpretable Evaluation of AI-Generated Content with Language-Grounded Sparse Encoders
Title (reference translation): Interpretable Evaluation of AI-Generated Content Using Language-Grounded Sparse Encoders
Authors: Yiming Tang, Arash Lagzian, Srinivas Anumasa, Qiran Zou, Trang Nguyen, Ehsan Adeli, Ching-Yu Cheng, Yilun Du, Dianbo Liu
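Abstract summary: Language-Grounded Sparse Encoders (LanSE) is a novel architecture that creates interpretable evaluation metrics. LanSE provides a fine-grained evaluation framework that quantifies four key dimensions of generation quality: prompt match, visual realism, physical plausibility, and content diversity. By bridging interpretability with practical evaluation needs, LanSE offers all users of generative AI models a powerful tool for model selection, quality control of synthetic content, and model improvement.
Reference score (attention score computed independently by this site): 46.53980721417588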
License: http://creativecommons.org/licenses/by/4.0/
Abstract: While the quality of AI-generated content, such as synthetic images, has become remarkably high, current evaluation metrics provide only coarse-grained assessments, failing to identify specific strengths and weaknesses that researchers and practitioners need for model selection and development, further limiting the scientific understanding and commercial deployment of these generative models. To address this, we introduce Language-Grounded Sparse Encoders (LanSE), a novel architecture that creates interpretable evaluation metrics by identifying interpretable visual patterns and automatically describing them in natural language. Through large-scale human evaluation (more than 11,000 annotations) and large multimodal model (LMM) based analysis, LanSE demonstrates reliable capabilities to detect interpretable visual patterns in synthetic images with more than 93% accuracy in natural images. LanSE further provides a fine-grained evaluation framework that quantifies four key dimensions of generation quality: prompt match, visual realism, physical plausibility, and content diversity. LanSE reveals nuanced model differences invisible to existing metrics, for instance, FLUX's superior physical plausibility and SDXL-medium's strong content diversity, while aligning with human judgments. By bridging interpretability with practical evaluation needs, LanSE offers all users of generative AI models a powerful tool for model selection, quality control of synthetic content, and model improvement. These capabilities directly address the need for public confidence and safety in AI-generated content, both critical for the future of generative AI applications.
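Abstract (reference translation): Although the quality of AI-generated content such as synthetic images has risen markedly, current evaluation metrics provide only coarse-grained assessments. They fail to identify the specific strengths and weaknesses that researchers and practitioners need for model selection and development, which limits the scientific understanding and commercial deployment of these generative models. To address this problem, we introduce Language-Grounded Sparse Encoders (LanSE), a novel architecture that creates interpretable evaluation metrics by identifying interpretable visual patterns and automatically describing them in natural language. Through large-scale human evaluation (more than 11,000 annotations) and large multimodal model (LMM) based analysis, LanSE demonstrates that it can reliably detect interpretable visual patterns in synthetic images, with more than 93% accuracy on natural images. LanSE further provides a fine-grained evaluation framework that quantifies four key dimensions of generation quality: prompt match, visual realism, physical plausibility, and content diversity. LanSE reveals nuanced model differences that are invisible to existing metrics, such as FLUX's superior physical plausibility and SDXL-medium's strong content diversity. By bridging interpretability with practical evaluation needs, LanSE offers all users of generative AI models a powerful tool for model selection, quality control of synthetic content, and model improvement. These capabilities directly address the need for public confidence and safety in AI-generated content.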

List of related papers