Fugu-MT paper translation (abstract): Frozen Foundation-Model Embeddings Discard Small-Lesion Signal in Chest Radiography: Implications for Pre-Deployment Evaluation

Paper summary: Frozen Foundation-Model Embeddings Discard Small-Lesion Signal in Chest Radiography: Implications for Pre-Deployment Evaluation

arxiv url: http://arxiv.org/abs/2606.11606v1
Date: Wed, 10 Jun 2026 03:06:32 GMT
Status: translation complete
Last updated in system: 2026-06-11 16:42:38.262421
Title: Frozen Foundation-Model Embeddings Discard Small-Lesion Signal in Chest Radiography: Implications for Pre-Deployment Evaluation
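Title (reference translation): Frozen Foundation-Model Embeddings Discard Small-Lesion Signal in Chest Radiography: Implications for Pre-Deployment Evaluation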
Authors: Raajitha Muthyala, Zhenan Yin, Alekhya Jilla, Frank Li, Theo Dapamede, Bardia Khosravi, Mohammadreza Chavoshi, Judy Gichoya, Saptarshi Purkayastha,
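Abstract summary: We probed five frozen ViTs and a ResNet-50 architectural control across three large CXR cohorts. Each model was evaluated with a small-scale-perturbation panel and a region-aware bounding-box-stratified probe on real lesions. On the panel, CLS embeddings sat at the chance floor (area under the ROC curve [AUC] 0.500-0.524), and patch-mean was indistinguishable from CLS on iso-blur and reticular-fine cells. On ChestX-Det10, image-level CLS classification showed within-class small-versus-large stratum gaps.
Reference score (independently computed attention score): 4.181595462856913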
License: http://creativecommons.org/licenses/by-sa/4.0/
Abstract: Frozen vision-transformer (ViT) foundation-model embeddings increasingly serve as the substrate for downstream chest-radiography (CXR) pipelines, yet where small-scale, low-contrast signal is retained or lost in the frozen forward pass has not been systematically quantified across architectures, pretraining domains, and objectives. We probed five frozen ViTs (RAD-DINO, DINOv2-B/14, DINOv3 ViT-7B, BiomedCLIP, MedSigLIP) and a frozen DINO-pretrained ResNet-50 architectural control across three large CXR cohorts (NIH-CXR14, MIMIC-CXR, Emory-CXR; aggregate pool n=492,724) and ChestX-Det10 (n=3,543; 1,462 small-lesion bounding boxes across Calcification, Nodule, Mass). Each model was evaluated with a small-scale-perturbation panel and a region-aware bounding-box-stratified probe on real lesions, comparing three pooling modes from the same forward pass: classification token (CLS), patch-mean (mean over all final-layer patch tokens), and bounding-box-restricted patch-local. On the perturbation panel, CLS embeddings sat at the chance floor (area under the ROC curve [AUC] 0.500-0.524); patch-mean was indistinguishable from CLS on iso-blur and reticular-fine cells but rose with CLS on larger directional-blur footprints, while disease AUC on globally decided tasks ranged 0.642-0.913. Patch-local probes recovered AUC ~1.0 from the same forward pass (per-model mean improvement +0.412 to +0.488); the ResNet-50 control reproduced the chance floor. On ChestX-Det10, image-level CLS classification showed within-class small-versus-large stratum gaps up to +0.243 AUC; bounding-box-level patch-local pooling on the same forward pass recovered AUC >= 0.899 on every (model x class) cell. Frozen ViT embeddings silently suppress small-scale signal at the global-aggregation step; the signal is recoverable from patch tokens conditional on a region of interest.
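Abstract (reference translation): Frozen vision-transformer (ViT) foundation-model embeddings increasingly serve as the substrate for downstream chest-radiography (CXR) pipelines, but where small-scale, low-contrast signal is retained or lost in the frozen forward pass has not been systematically quantified across architectures, pretraining domains, and objectives. We probed five frozen ViTs (RAD-DINO, DINOv2-B/14, DINOv3 ViT-7B, BiomedCLIP, MedSigLIP) and a frozen DINO-pretrained ResNet-50 architectural control across three large CXR cohorts (NIH-CXR14, MIMIC-CXR, Emory-CXR; aggregate pool n=492,724) and ChestX-Det10 (n=3,543; 1,462 small-lesion bounding boxes). Each model was evaluated with a small-scale-perturbation panel and a region-aware bounding-box-stratified probe on real lesions, comparing three pooling modes from the same forward pass: classification token (CLS), patch-mean (the mean over all final-layer patch tokens), and bounding-box-restricted patch-local. On the perturbation panel, CLS embeddings sat at the chance floor (area under the ROC curve [AUC] 0.500-0.524); patch-mean was indistinguishable from CLS on iso-blur and reticular-fine cells but rose with CLS on larger directional-blur footprints, while disease AUC on globally decided tasks ranged 0.642-0.913. Patch-local probes recovered AUC ~1.0 from the same forward pass (per-model mean improvement +0.412 to +0.488), and the ResNet-50 control reproduced the chance floor. On ChestX-Det10, image-level CLS classification showed within-class small-versus-large stratum gaps, while bounding-box-level patch-local pooling on the same forward pass recovered AUC >= 0.899 on every (model x class) cell. Frozen ViT embeddings silently suppress small-scale signal at the global-aggregation step; the signal is recoverable from patch tokens conditional on a region of interest.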
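
The three pooling modes compared in the abstract (CLS, patch-mean, and bounding-box-restricted patch-local) can be illustrated with a toy sketch. The following is a minimal NumPy example, not the authors' code: the token layout (CLS token first, then patch tokens in row-major grid order), the 16x16 grid of 14-pixel patches, the helper names, and the synthetic signal are all assumptions made for illustration. It shows only the arithmetic of dilution under global averaging and does not reproduce any result from the paper.

```python
import numpy as np

def pool_cls(tokens):
    """Return the CLS token (assumed to be stored at index 0)."""
    return tokens[0]

def pool_patch_mean(tokens):
    """Average over all patch tokens (everything after the CLS token)."""
    return tokens[1:].mean(axis=0)

def pool_patch_local(tokens, bbox, grid_hw, patch_px):
    """Average only the patch tokens whose grid cells overlap the bounding box.

    bbox is (x0, y0, x1, y1) in pixels, with x1 and y1 exclusive.
    grid_hw is (rows, cols) of the patch grid; patch_px is the patch side length.
    """
    rows, cols = grid_hw
    x0, y0, x1, y1 = bbox
    # Convert pixel coordinates to inclusive patch-grid indices, clamped to the grid.
    c0 = max(0, int(x0 // patch_px))
    c1 = min(cols - 1, int((x1 - 1) // patch_px))
    r0 = max(0, int(y0 // patch_px))
    r1 = min(rows - 1, int((y1 - 1) // patch_px))
    grid = tokens[1:].reshape(rows, cols, -1)
    region = grid[r0:r1 + 1, c0:c1 + 1]
    return region.reshape(-1, tokens.shape[1]).mean(axis=0)

# Hypothetical toy input: a 16x16 patch grid with 4-dimensional tokens.
dim = 4
patches = np.zeros((16, 16, dim))
# Synthetic "small lesion": a signal of 1.0 on channel 0 in a 2x2 block of patches.
patches[3:5, 2:4, 0] = 1.0
# Placeholder CLS token of zeros (a real CLS token is produced by the model).
tokens = np.concatenate([np.zeros((1, dim)), patches.reshape(-1, dim)])

# Pixel-space box covering exactly that 2x2 block for 14-pixel patches.
bbox = (28, 42, 56, 70)

print("CLS        :", pool_cls(tokens)[0])
print("patch-mean :", pool_patch_mean(tokens)[0])
print("patch-local:", pool_patch_local(tokens, bbox, (16, 16), 14)[0])
```

In this toy setup only 4 of the 256 patches carry signal, so patch-mean retains 4/256 of it, while pooling restricted to the bounding box returns it undiluted. The all-zero CLS token is a placeholder; how much small-scale signal a real CLS token carries depends on the model, which is what the paper measures.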

Related papers