Fugu-MT paper translation (abstract): Translating Inference-Time Control to Radiology Vision-Language Models: Activation Steering for Pneumonia Classification on Chest X-rays

Paper overview: Translating Inference-Time Control to Radiology Vision-Language Models: Activation Steering for Pneumonia Classification on Chest X-rays

arxiv url: http://arxiv.org/abs/2606.20852v1
Date: Thu, 18 Jun 2026 18:36:13 GMT
Status: translation complete
Last updated in system: 2026-06-26 12:33:36.008094
Title: Translating Inference-Time Control to Radiology Vision-Language Models: Activation Steering for Pneumonia Classification on Chest X-rays
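Title (reference translation): Inference-Time Control for Radiology Vision-Language Models: Activation Steering for Pneumonia Classification on Chest X-rays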
Authors: Eduardo Moreno Judice de Mattos Farina, Mateus A. Esmeraldo, Felipe Akio Matsuoka, Paulo Eduardo de Aguiar Kuriki, Felipe Campos Kitamura
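Abstract summary: The study evaluates whether Contrastive Activation Addition (CAA) can improve pneumonia classification in chest radiograph VLMs without updating model weights. Three frozen chest radiograph VLMs were evaluated on the public Kermany pneumonia test set. CAA substantially altered prediction score distributions and operating characteristics without fine-tuning.
Reference score (independently computed attention measure): 0.0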
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: Inference-time engineering can alter model behavior without fine-tuning. However, its utility for improving diagnostic performance in medical vision-language models (VLMs) remains unclear. We aim to evaluate whether Contrastive Activation Addition (CAA) can improve pneumonia classification in chest radiograph VLMs without updating model weights. Three frozen chest radiograph VLMs (MedGemma-4B-IT, NV-Reason-CXR-3B, and CheXOne-3B) were evaluated on the public Kermany pneumonia test set. Classification was based on the logits of the tokens Yes and No under a binary prompt. Steering vectors included a 30-pair answer-bias control, a 30-pair pneumonia text contrast, and an image-conditioned contrast derived from 30 pneumonia and 30 normal development images. A deterministic 200-image development set was used for layer and scale selection (100 images) and threshold calibration (100 images). Performance was assessed using ROC-AUC, PR-AUC, F1 score, threshold analyses, reverse-vector controls, random-vector controls, and conditional bootstrap confidence intervals. Fixed-threshold F1 improvements were frequently observed but did not consistently indicate improved diagnostic performance. For MedGemma-4B-IT. NV-Reason-CXR-3B showed the strongest benefit: calibrated F1 improved from 0.7692 in the zero-shot setting to 0.8619 with pneumonia-text steering and to 0.8727 with image-conditioned steering. For CheXOne-3B, pneumonia-text steering increased calibrated F1 from 0.8528 to 0.8666, although the confidence interval crossed zero. On this public pneumonia benchmark, CAA substantially altered prediction score distributions and operating characteristics without fine-tuning. Meaningful performance gains were observed in one of three evaluated VLMs, suggesting that activation steering may serve as a lightweight approach for adapting medical VLM behavior.
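The abstract does not spell out how the steering vectors are built or applied. The following is a minimal sketch of the general CAA idea, assuming the vector is the mean difference between activations of contrasting examples at one layer and that it is added to a hidden state with a scalar scale. The function names, the toy 3-dimensional activations, and the scale value are all hypothetical and are not taken from the paper.

```python
from statistics import fmean


def steering_vector(pos_acts, neg_acts):
    """Mean difference between two sets of activations at one layer.

    pos_acts / neg_acts: lists of equal-length activation vectors
    (e.g. pneumonia vs. normal examples). Assumed form, not the paper's code.
    """
    dim = len(pos_acts[0])
    return [
        fmean(a[i] for a in pos_acts) - fmean(a[i] for a in neg_acts)
        for i in range(dim)
    ]


def steer(hidden, vector, scale):
    """Add the scaled steering vector to a single hidden-state vector."""
    return [h + scale * v for h, v in zip(hidden, vector)]


# Toy activations, made up purely for illustration.
pneumonia_acts = [[1.0, 0.0, 2.0], [3.0, 0.0, 2.0]]
normal_acts = [[0.0, 1.0, 2.0], [0.0, 3.0, 2.0]]

v = steering_vector(pneumonia_acts, normal_acts)

hidden = [0.5, 0.5, 0.5]
steered = steer(hidden, v, scale=2.0)
# A reverse-vector control applies the same vector with the opposite sign.
reverse_control = steer(hidden, v, scale=-2.0)
```

In a real model the addition would happen inside the forward pass at a chosen layer, with the layer and scale picked on development data as the abstract describes; this sketch only shows the arithmetic.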
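Abstract (reference translation): Inference-time engineering can change model behavior without fine-tuning. However, its usefulness for improving diagnostic performance in medical vision-language models (VLMs) remains unclear. This study aims to evaluate whether Contrastive Activation Addition (CAA) can improve pneumonia classification in chest radiograph VLMs without updating model weights. Three frozen chest radiograph VLMs (MedGemma-4B-IT, NV-Reason-CXR-3B, CheXOne-3B) were evaluated on the public Kermany pneumonia test set. Classification was based on the logits of the Yes and No tokens under a binary prompt. The steering vectors comprised a 30-pair answer-bias control, a 30-pair pneumonia text contrast, and an image-conditioned contrast derived from 30 pneumonia and 30 normal development images. A deterministic 200-image development set was used for layer and scale selection (100 images) and threshold calibration (100 images). Performance was assessed with ROC-AUC, PR-AUC, F1 score, threshold analyses, reverse-vector controls, random-vector controls, and conditional bootstrap confidence intervals. Fixed-threshold F1 improvements were observed frequently but did not consistently indicate improved diagnostic performance. For MedGemma-4B-IT. NV-Reason-CXR-3B showed the strongest benefit: calibrated F1 improved from 0.7692 in the zero-shot setting to 0.8619 with pneumonia-text steering and to 0.8727 with image-conditioned steering. For CheXOne-3B, pneumonia-text steering increased calibrated F1 from 0.8528 to 0.8666, although the confidence interval crossed zero. On this public pneumonia benchmark, CAA substantially altered prediction score distributions and operating characteristics without fine-tuning. Meaningful performance gains were observed in one of the three evaluated VLMs, suggesting that activation steering may serve as a lightweight approach for adapting the behavior of medical VLMs.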
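The difference between fixed-threshold F1 and calibrated F1 mentioned in the abstract can be illustrated with a small sketch. It assumes the classification score is the Yes logit minus the No logit and that calibration picks the threshold that maximizes F1 on a held-out calibration split. Both are assumptions; the abstract does not give the exact scoring or calibration rule. All scores and labels below are invented toy values.

```python
def yes_no_score(yes_logit, no_logit):
    """Scalar score from the two answer-token logits (assumed form)."""
    return yes_logit - no_logit


def f1_at(scores, labels, threshold):
    """F1 when every score >= threshold is predicted positive (label 1)."""
    tp = sum(1 for s, y in zip(scores, labels) if s >= threshold and y == 1)
    fp = sum(1 for s, y in zip(scores, labels) if s >= threshold and y == 0)
    fn = sum(1 for s, y in zip(scores, labels) if s < threshold and y == 1)
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


def calibrate_threshold(scores, labels):
    """Pick the candidate threshold with the highest F1 on calibration data.

    Ties resolve to the lowest such threshold, because max() keeps the
    first maximum of the sorted candidates.
    """
    candidates = sorted(set(scores))
    return max(candidates, key=lambda t: f1_at(scores, labels, t))


# Toy calibration split: (yes_logit, no_logit) pairs and labels, made up.
calib_logits = [(0.0, 1.0), (1.5, 1.0), (2.0, 1.0), (3.0, 1.0)]
calib_labels = [0, 0, 1, 1]
calib_scores = [yes_no_score(y, n) for y, n in calib_logits]

threshold = calibrate_threshold(calib_scores, calib_labels)

# Toy test split scored the same way.
test_scores = [0.2, 1.2, 3.0, -0.4]
test_labels = [0, 1, 1, 0]

fixed_f1 = f1_at(test_scores, test_labels, 0.0)  # fixed threshold at zero
calibrated_f1 = f1_at(test_scores, test_labels, threshold)
```

Because steering shifts the whole score distribution, F1 at a fixed threshold can change even when the ranking of cases does not, which is one reason to report calibrated F1 alongside threshold-free metrics such as ROC-AUC.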

Related papers