Fugu-MT Paper Translation (Summary): A Utility-preserving De-identification Pipeline for Cross-hospital Radiology Data Sharing

Paper summary: A Utility-preserving De-identification Pipeline for Cross-hospital Radiology Data Sharing

arxiv url: http://arxiv.org/abs/2604.07128v1
Date: Wed, 08 Apr 2026 14:21:29 GMT
Status: Translation complete
Last updated in system: 2026-04-09 17:30:51.579843
Title: A Utility-preserving De-identification Pipeline for Cross-hospital Radiology Data Sharing
Authors: Chenhao Liu, Zelin Wen, Yan Tong, Junjie Zhu, Xinyu Tian, Yuchi Liu, Ashu Gupta, Syed M. S. Islam, Tom Gedeon, Yue Yao
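Abstract summary: This paper proposes a utility-preserving de-identification pipeline for cross-hospital radiology data sharing. For radiology images, it uses a generative filtering mechanism that synthesizes privacy-filtered, pathology-preserving counterparts of the original images. Experiments on chest X-ray benchmarks suggest that the method effectively removes privacy-sensitive information while preserving diagnostically relevant pathology.
Reference score (site-computed attention score): 26.834444576458623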
License: http://creativecommons.org/licenses/by/4.0/
Abstract: Large-scale radiology data are critical for developing robust medical AI systems. However, sharing such data across hospitals remains heavily constrained by privacy concerns. Existing de-identification research in radiology mainly focuses on removing identifiable information to enable compliant data release. Yet whether de-identified radiology data can still preserve sufficient utility for large-scale vision-language model training and cross-hospital transfer remains underexplored. In this paper, we introduce a utility-preserving de-identification pipeline (UPDP) for cross-hospital radiology data sharing. Specifically, we compile a blacklist of privacy-sensitive terms and a whitelist of pathology-related terms. For radiology images, we use a generative filtering mechanism that synthesizes privacy-filtered, pathology-preserving counterparts of the original images. These synthetic image counterparts, together with ID-filtered reports, can then be securely shared across hospitals for downstream model development and evaluation. Experiments on public chest X-ray benchmarks demonstrate that our method effectively removes privacy-sensitive information while preserving diagnostically relevant pathology cues. Models trained on the de-identified data maintain competitive diagnostic accuracy compared with those trained on the original data, while exhibiting a marked decline in identity-related accuracy, confirming effective privacy protection. In the cross-hospital setting, we further show that de-identified data can be combined with local data to yield better performance.
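The abstract mentions a blacklist of privacy-sensitive terms and a whitelist of pathology-related terms but does not say how they are applied. The following is only a minimal sketch of one plausible reading for the report side: blacklisted terms are masked, whitelisted terms are never masked, and the whitelisted terms that survive are reported as a simple utility check. The term lists, the `[REDACTED]` mask, and the function name `filter_report` are all hypothetical and are not taken from the paper.

```python
import re

# Hypothetical term lists; the paper's actual lists are not given in the abstract.
BLACKLIST_TERMS = {"john doe", "riverside hospital"}
WHITELIST_TERMS = {"cardiomegaly", "pleural effusion", "pneumonia"}
MASK = "[REDACTED]"  # assumed placeholder token


def _term_pattern(term):
    """Build a case-insensitive pattern matching `term` as a whole word or phrase."""
    return re.compile(r"(?<!\w)" + re.escape(term) + r"(?!\w)", re.IGNORECASE)


def filter_report(report):
    """Mask blacklisted terms; return the filtered text and surviving whitelist terms."""
    filtered = report
    # Process longer terms first so a phrase is masked before any shorter term inside it.
    for term in sorted(BLACKLIST_TERMS, key=len, reverse=True):
        if term in WHITELIST_TERMS:
            continue  # assumption: pathology terms are never masked
        filtered = _term_pattern(term).sub(MASK, filtered)
    # Record which pathology terms are still present after masking.
    kept = sorted(t for t in WHITELIST_TERMS if _term_pattern(t).search(filtered))
    return filtered, kept


if __name__ == "__main__":
    text, kept = filter_report(
        "John Doe was seen at Riverside Hospital. "
        "Mild cardiomegaly with small pleural effusion."
    )
    print(text)
    print(kept)
```

Literal term matching of this kind would miss numeric identifiers such as record numbers or dates, which would need additional patterns. The image side of the pipeline (the generative filtering mechanism) is not covered by this sketch.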

Related papers