Fugu-MT Paper Translation (Abstract): Vision Language Model Helps Private Information De-Identification in Vision Data

Paper overview: Vision Language Model Helps Private Information De-Identification in Vision Data

arxiv url: http://arxiv.org/abs/2606.09132v1
Date: Mon, 08 Jun 2026 07:30:20 GMT
Status: Translation complete
Last updated in system: 2026-06-09 14:42:06.800597
Title: Vision Language Model Helps Private Information De-Identification in Vision Data
Title (reference translation): Vision Language Model supports the identification of personal information in visual data
Authors: Tiejin Chen, Pingzhi Li, Kaixiong Zhou, Tianlong Chen, Hua Wei
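Abstract summary: VisShield is an end-to-end framework designed to enhance the privacy awareness of vision language models (VLMs). The framework consists of two key components: the instruction-tuning dataset OPTIC and a tailored training methodology. The approach ensures that VLMs recognize privacy-sensitive text and output precise bounding boxes for the detected entities.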
Reference score (independently computed attention score): 55.425628316813174
License: http://creativecommons.org/licenses/by/4.0/
Abstract: Visual Language Models (VLMs) have gained significant popularity due to their remarkable capabilities. While various methods exist to enhance privacy in text-based applications, privacy risks associated with visual inputs, such as Protected Health Information (PHI) in medical images, remain largely overlooked. Tackling this problem requires two key tasks: accurately localizing sensitive text, and processing it to ensure privacy protection. To this end, we introduce VisShield (Vision Privacy Shield), an end-to-end framework designed to enhance the privacy awareness of VLMs. Our framework consists of two key components: a specialized instruction-tuning dataset, OPTIC (Optical Privacy Text Instruction Collection), and a tailored training methodology. The dataset provides diverse privacy-oriented prompts that guide VLMs to perform targeted Optical Character Recognition (OCR) for precise localization of sensitive text, while the training strategy ensures effective adaptation of VLMs to privacy-preserving tasks. Specifically, our approach ensures that VLMs recognize privacy-sensitive text and output precise bounding boxes for detected entities, allowing for effective masking of sensitive information. Extensive experiments demonstrate that our framework significantly outperforms existing approaches in handling private information, paving the way for privacy-preserving applications in vision-language models. Our dataset and code can be found here.
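Abstract (reference translation): Vision language models (VLMs) have gained great popularity thanks to their remarkable capabilities. Although various methods exist for strengthening privacy in text-based applications, privacy risks associated with visual inputs, such as Protected Health Information (PHI) in medical images, are largely overlooked. Addressing this problem involves two key tasks: accurately localizing sensitive text, and processing it to protect privacy. To this end, we introduce VisShield (Vision Privacy Shield), an end-to-end framework designed to raise the privacy awareness of VLMs. The framework consists of two key components: the instruction-tuning dataset OPTIC (Optical Privacy Text Instruction Collection) and a customized training methodology. The dataset provides diverse privacy-oriented prompts that guide VLMs to perform targeted optical character recognition (OCR) for accurate localization of sensitive text, while the training strategy ensures effective adaptation of VLMs to privacy-preserving tasks. Specifically, VLMs recognize privacy-sensitive text and output accurate bounding boxes for the detected entities, enabling effective masking of sensitive information. Large-scale experiments show that our framework significantly outperforms existing approaches to handling private information, paving the way for privacy-preserving applications in vision-language models. The dataset and code are available here.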
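
The masking step mentioned in the abstract can be illustrated with a minimal sketch. It is not the paper's implementation: it assumes the model returns each sensitive entity as a pixel-coordinate box (x0, y0, x1, y1) with exclusive upper bounds, and the function name `mask_boxes`, the box format, and the toy grayscale image are all hypothetical.

```python
from typing import List, Tuple

# Assumed box format (hypothetical): (x0, y0, x1, y1) in pixels, x1 and y1 exclusive.
Box = Tuple[int, int, int, int]


def mask_boxes(image: List[List[int]], boxes: List[Box], fill: int = 0) -> List[List[int]]:
    """Return a copy of a grayscale image with every box region overwritten by `fill`."""
    height = len(image)
    width = len(image[0]) if height else 0
    masked = [row[:] for row in image]  # copy so the input stays untouched
    for x0, y0, x1, y1 in boxes:
        # Clamp to the image bounds so a slightly-off box cannot raise IndexError.
        x0, x1 = max(0, x0), min(width, x1)
        y0, y1 = max(0, y0), min(height, y1)
        for y in range(y0, y1):
            for x in range(x0, x1):
                masked[y][x] = fill
    return masked


if __name__ == "__main__":
    toy = [[255] * 6 for _ in range(4)]      # 6x4 all-white toy image
    for row in mask_boxes(toy, [(1, 1, 4, 3)]):
        print(row)
```

Overwriting pixels, rather than blurring them, is chosen here because the original text cannot be recovered from the masked region; whether the paper masks in this way is not stated in the abstract.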

Related papers