Fugu-MT paper translation (abstract): When Style Similarity Scores Fail: Diagnosing Raw CSD Cosine in Artist-Style Evaluation

Paper abstract: When Style Similarity Scores Fail: Diagnosing Raw CSD Cosine in Artist-Style Evaluation

arxiv url: http://arxiv.org/abs/2605.09030v1
Date: Sat, 09 May 2026 16:15:13 GMT
Status: translation complete
Last updated in system: 2026-05-12 23:28:50.030906
Title: When Style Similarity Scores Fail: Diagnosing Raw CSD Cosine in Artist-Style Evaluation
Title (reference translation): Diagnosing CSD Cosine in Artist-Style Evaluation
Authors: Jörg Frochte
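Abstract summary: This paper introduces the discrimination gap, a corpus-internal, prototype-free and threshold-free diagnostic. On a 1799-artwork, 91-artist public-domain corpus, raw CSD cosine yields negative point-estimate gaps for 23/91 artists at the pairwise level. A cross-backbone check on CLIP-ViT-L/14, SigLIP-large and DINOv2-Large reproduces the same shared-tradition failure pattern.
Reference score (independently computed attention measure): 0.1923695645342299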
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: Raw cosine in the 768-dimensional output space of the Contrastive Style Descriptor (CSD) is now widely read as an absolute, calibrated style-fidelity score for text-to-image and style-imitation evaluation. We introduce the discrimination gap, a corpus-internal, prototype-free and threshold-free diagnostic that tests whether contrastive style cosines admit an absolute same-versus-different interpretation on a candidate artist corpus. On a 1799-artwork, 91-artist public-domain corpus, raw CSD cosine yields negative point-estimate gaps for $23/91$ artists at the pairwise level ($2/91$ robust under bootstrap) and for $15/91$ in the aggregated-pool scoring regime style-fidelity evaluations typically use. CSLS readout on the frozen backbone reduces the aggregated negative-gap count to $4/91$; combined with positional-embedding interpolation to $336$ pixels it raises unsupervised pair-verification AUC from $0.883$ to $0.905$ across $25$ artist-disjoint splits. We refer to this diagnostic-driven readout protocol on the frozen backbone (CSLS as default, pos-interp $336$ as the stronger optional setting) as CSD+, not a new encoder. A cross-backbone check on CLIP-ViT-L/14, SigLIP-large and DINOv2-Large reproduces the same shared-tradition failure pattern, providing evidence that the residual reflects a shared limitation of the four backbones we tested rather than a CSD-specific artefact. Practical implication: before reporting CSD cosine as an absolute style-fidelity score, run the diagnostic on the candidate corpus; CSLS is the minimal correction when it fails.
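Abstract (reference translation): Raw cosine in the 768-dimensional output space of the Contrastive Style Descriptor (CSD) is widely read as an absolute, calibrated style-fidelity score for text-to-image and style-imitation evaluation. We introduce the discrimination gap, a corpus-internal, prototype-free and threshold-free diagnostic that tests whether contrastive style cosines admit an absolute same-versus-different interpretation. On a 1799-artwork, 91-artist public-domain corpus, raw CSD cosine yields negative point-estimate gaps for 23/91 artists at the pairwise level (2/91 robust under bootstrap) and for 15/91 in the aggregated-pool scoring regime used by style-fidelity evaluations. CSLS readout on the frozen backbone reduces the aggregated negative-gap count to 4/91; combined with positional-embedding interpolation to 336 pixels, it raises unsupervised pair-verification AUC from 0.883 to 0.905 across 25 splits. CSD+ is not a new encoder. A cross-backbone check on CLIP-ViT-L/14, SigLIP-large and DINOv2-Large reproduces the same shared-tradition failure pattern, indicating that the residual reflects a shared limitation of the four backbones tested rather than a CSD-specific artefact. Practical implication: before reporting CSD cosine as an absolute style-fidelity score, run the diagnostic on the candidate corpus.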
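
The abstract names two ingredients, a per-artist discrimination gap and a CSLS readout, without giving formulas. The sketch below is only an illustration under stated assumptions: the gap is taken to be the mean same-artist score minus the mean different-artist score (the paper's exact definition may differ), CSLS uses the standard cross-domain similarity local scaling form 2*cos(x, y) - r(x) - r(y) with r the mean cosine to the k nearest neighbours, and the 2-D vectors, labels and k=2 are hypothetical stand-ins for real 768-dimensional CSD embeddings.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def similarity_matrix(embeddings):
    """All-pairs raw cosine matrix."""
    n = len(embeddings)
    return [[cosine(embeddings[i], embeddings[j]) for j in range(n)]
            for i in range(n)]


def csls_matrix(sim, k=2):
    """CSLS(i, j) = 2*sim[i][j] - r[i] - r[j], where r[i] is the mean
    similarity of item i to its k most similar other items."""
    n = len(sim)
    r = []
    for i in range(n):
        others = sorted((sim[i][j] for j in range(n) if j != i), reverse=True)
        top = others[:k]
        r.append(sum(top) / len(top))
    return [[2 * sim[i][j] - r[i] - r[j] for j in range(n)] for i in range(n)]


def discrimination_gaps(scores, labels):
    """ASSUMED definition: per artist, mean same-artist score minus mean
    different-artist score. A negative gap means the artist's works score
    higher against other artists than against each other."""
    gaps = {}
    n = len(labels)
    for artist in sorted(set(labels)):
        idx = [i for i in range(n) if labels[i] == artist]
        same = [scores[i][j] for i in idx for j in idx if i != j]
        diff = [scores[i][j] for i in idx for j in range(n)
                if labels[j] != artist]
        if same and diff:
            gaps[artist] = sum(same) / len(same) - sum(diff) / len(diff)
    return gaps


# Hypothetical toy corpus: artist "A" has two dissimilar works, one of which
# sits close to both works of artist "B".
embeddings = [(1.0, 0.0), (0.0, 1.0), (1.0, 0.2), (1.0, 0.3)]
labels = ["A", "A", "B", "B"]

raw = similarity_matrix(embeddings)
csls = csls_matrix(raw, k=2)

raw_gaps = discrimination_gaps(raw, labels)
csls_gaps = discrimination_gaps(csls, labels)

for artist in raw_gaps:
    print(artist, "raw gap:", round(raw_gaps[artist], 3),
          "CSLS gap:", round(csls_gaps[artist], 3))
```

On this toy data the raw-cosine gap for artist "A" is negative, which is the kind of failure the diagnostic is meant to flag. Whether CSLS repairs a given negative gap depends on the corpus; the toy example makes no claim about that.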

Related papers