Fugu-MT Paper Translation (Abstract): Dimensional Coactivation for Representational Consistency in Frozen Vision Foundation Models

Paper overview: Dimensional Coactivation for Representational Consistency in Frozen Vision Foundation Models

arxiv url: http://arxiv.org/abs/2605.08249v1
Date: Thu, 07 May 2026 15:32:47 GMT
Status: Translation complete
Last updated in system: 2026-05-12 23:28:49.497403
Title: Dimensional Coactivation for Representational Consistency in Frozen Vision Foundation Models
Title (reference translation): Dimensional Coactivation for Representational Consistency in Frozen Vision Foundation Models
Authors: Izaldein Al-Zyoud, Abdulmotaleb El Saddik
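Abstract summary: This work examines whether a frozen foundation model represents a single sample coherently across its semantic subregions. It introduces Dimensional Coactivation (DCA), a per-dimension instrument for measuring this coherence. DCA depends on a stable per-dimension coordinate system, not on region extraction alone.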
Reference score (independently computed attention score): 0.0
License: http://creativecommons.org/licenses/by/4.0/
Abstract: Frozen vision foundation models do not merely extract features; they organize images through a learned coordinate system. We ask whether that coordinate system remains internally coherent within a single input. This leads to Representational Consistency: the study of whether a frozen foundation model represents one sample coherently across its semantic subregions. We introduce Dimensional Coactivation (DCA), a per-dimension instrument for measuring this coherence. DCA compares semantic regions by asking whether the same feature dimensions coactivate across them. Unlike classical similarity measures, it deliberately avoids centering, L2 normalization, and full Gram coupling. These operations are useful when comparing different models or distributions, but they are mismatched to the intra-sample setting, where the coordinate system is fixed and raw magnitude carries signal. Deepfake detection provides a natural validation task. Synthetic faces may reproduce plausible eyes, noses, and mouths while breaking the representational structure that links those regions in real faces. Using frozen DINOv3 features, DCA exposes this break: an eyes-mouth-nose fingerprint achieves 0.9106 AUC on CelebDF-v2 and 0.9289 on DFD under FF++ c23 cross-dataset transfer. The design is also sharply validated by ablation: reintroducing centering collapses CelebDF-v2 AUC to 0.459, L2 normalization reduces it to 0.862, and cross-dimension coupling reduces it to 0.478. Finally, replacing DINOv3 with FaRL collapses CelebDF-v2 AUC to 0.582. DCA therefore depends on a stable per-dimension coordinate system, not on region extraction alone. These results position DCA as an instrument for measuring intra-sample representational coherence in frozen foundation models, with deepfake detection as the first validation task.
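The abstract does not give the DCA formula, so the following is only a minimal sketch of one reading of "the same feature dimensions coactivate across regions". It assumes each region is summarized by mean-pooling its patch features and that coactivation is the elementwise product of two raw region vectors. The function names (`pool_region`, `coactivation`, `fingerprint`) and the toy 4-dimensional vectors are hypothetical and are not taken from the paper. The cosine function is included only to show what L2 normalization discards.

```python
from math import sqrt


def pool_region(patch_features):
    """Mean-pool a list of patch feature vectors into one region vector (assumed pooling)."""
    count = len(patch_features)
    return [sum(column) / count for column in zip(*patch_features)]


def coactivation(region_a, region_b):
    """Elementwise product of raw activations: no centering, no L2
    normalization, and no cross-dimension (Gram) terms. Assumed form."""
    return [a * b for a, b in zip(region_a, region_b)]


def cosine(region_a, region_b):
    """Classical cosine similarity, shown only for contrast."""
    dot = sum(a * b for a, b in zip(region_a, region_b))
    norm_a = sqrt(sum(a * a for a in region_a))
    norm_b = sqrt(sum(b * b for b in region_b))
    return dot / (norm_a * norm_b)


def fingerprint(regions):
    """Concatenate coactivation vectors over all unordered region pairs."""
    names = sorted(regions)
    values = []
    for index, first in enumerate(names):
        for second in names[index + 1:]:
            values.extend(coactivation(regions[first], regions[second]))
    return values


# Toy region vectors (hypothetical values, not real DINOv3 features).
eyes = [2.0, 0.0, 1.0, 0.5]
mouth_strong = [4.0, 0.0, 2.0, 1.0]   # same direction as eyes, larger magnitude
mouth_weak = [1.0, 0.0, 0.5, 0.25]    # same direction as eyes, smaller magnitude
nose = [1.0, 1.0, 0.0, 2.0]

print(coactivation(eyes, mouth_strong))
print(coactivation(eyes, mouth_weak))
print(cosine(eyes, mouth_strong), cosine(eyes, mouth_weak))
print(len(fingerprint({"eyes": eyes, "mouth": mouth_strong, "nose": nose})))
```

In this toy setup the two mouth vectors have the same cosine similarity to the eyes vector, while their per-dimension products differ. That is the sense in which raw magnitude can carry signal once the coordinate system is fixed. How the paper turns such a fingerprint into a detection score is not stated in the abstract and is not modeled here.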
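Abstract (reference translation): Frozen vision foundation models do more than extract features: they organize images through a learned coordinate system. The paper asks whether that coordinate system stays internally coherent within a single input. This is Representational Consistency, the study of whether a frozen foundation model represents one sample coherently across its semantic subregions. The paper introduces Dimensional Coactivation (DCA), a per-dimension instrument for measuring this coherence. DCA compares semantic regions by asking whether the same feature dimensions coactivate across them. Unlike classical similarity measures, it deliberately avoids centering, L2 normalization, and full Gram coupling. These operations are useful when comparing different models or distributions, but they do not fit the intra-sample setting, where the coordinate system is fixed and raw magnitude carries signal. Deepfake detection provides a natural validation task. Synthetic faces may reproduce plausible eyes, noses, and mouths while breaking the representational structure that links those regions in real faces. With frozen DINOv3 features, an eyes-mouth-nose fingerprint achieves 0.9106 AUC on CelebDF-v2 and 0.9289 AUC on DFD under FF++ c23 cross-dataset transfer. In ablation, reintroducing centering collapses CelebDF-v2 AUC to 0.459, L2 normalization reduces it to 0.862, and cross-dimension coupling reduces it to 0.478. Finally, replacing DINOv3 with FaRL collapses CelebDF-v2 AUC to 0.582. DCA therefore depends on a stable per-dimension coordinate system, not on region extraction alone. These results position DCA as an instrument for measuring intra-sample representational coherence in frozen foundation models, with deepfake detection as the first validation task.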

Related papers