Fugu-MT Paper Translation (Abstract): MedKLIP: Medical Knowledge Enhanced Language-Image Pre-Training

Paper summary: MedKLIP: Medical Knowledge Enhanced Language-Image Pre-Training

arxiv url: http://arxiv.org/abs/2301.02228v1
Date: Thu, 5 Jan 2023 18:55:09 GMT
Status: Translation complete
Last updated in system: 2023-01-06 13:31:51.356959
Title: MedKLIP: Medical Knowledge Enhanced Language-Image Pre-Training
Authors: Chaoyi Wu, Xiaoman Zhang, Ya Zhang, Yanfeng Wang, Weidi Xie
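Abstract summary: We consider the problem of self-supervised visual-language pre-training enhanced with medical knowledge. First, unlike existing works that directly process the raw reports, we adopt a novel report filter to extract medical entities. Second, we propose a novel entity embedding module that queries an external knowledge description base. Third, we propose a Transformer-based fusion model that spatially aligns entity descriptions with visual signals.
Reference score (attention score computed by this site): 40.52487429030841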
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: In this paper, we consider the problem of enhancing self-supervised visual-language pre-training (VLP) with medical-specific knowledge, by exploiting the paired image-text reports from the radiological daily practice. In particular, we make the following contributions: First, unlike existing works that directly process the raw reports, we adopt a novel report filter to extract the medical entities, avoiding unnecessary complexity from language grammar and enhancing the supervision signals; Second, we propose a novel entity embedding module by querying an external knowledge description base, to exploit the rich context of additional information that the medical domain affords, and implicitly build relationships between entities in the language embedding space; Third, we propose a novel Transformer-based fusion model for spatially aligning the entity description with visual signals at the image patch level only with self-supervised learning, thus enabling the ability for spatial grounding; Fourth, we conduct thorough experiments to validate the effectiveness of our proposed architecture, and benchmark on numerous public benchmarks e.g., ChestX-ray14, RSNA Pneumonia, SIIM-ACR Pneumothorax, COVIDx CXR-2, COVID Rural, and EdemaSeverity. In both zero-shot and fine-tuning settings, our model has demonstrated strong performance compared with the former methods on disease classification and grounding.
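The spatial alignment step described in the abstract (entity descriptions attending to image patches) can be illustrated with a minimal sketch. This is not the authors' implementation: it assumes a single-head, scaled dot-product attention with no learned projections, and the entity names, query vectors, and patch features below are made-up toy values. The function name `entity_patch_attention` is hypothetical.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def dot(a, b):
    """Dot product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def entity_patch_attention(entity_queries, patch_features):
    """Hypothetical helper: for each entity query, return a pair
    (attention weights over patches, attention-pooled patch feature).
    The weights can be read as a coarse grounding map over patches."""
    dim = len(patch_features[0])
    scale = 1.0 / math.sqrt(dim)
    results = []
    for q in entity_queries:
        weights = softmax([dot(q, p) * scale for p in patch_features])
        pooled = [
            sum(w * p[i] for w, p in zip(weights, patch_features))
            for i in range(dim)
        ]
        results.append((weights, pooled))
    return results


# Toy data (assumption): a 2x2 patch grid flattened to 4 patches, 2-d features.
patch_features = [[1.0, 0.0], [0.0, 1.0], [0.0, 1.0], [1.0, 0.0]]
# Toy entity embeddings (assumption); a real system would encode entity
# descriptions with a text encoder.
entity_queries = {"pneumothorax": [4.0, 0.0], "edema": [0.0, 4.0]}

names = list(entity_queries)
outputs = entity_patch_attention([entity_queries[n] for n in names], patch_features)
for name, (weights, pooled) in zip(names, outputs):
    top_patch = max(range(len(weights)), key=weights.__getitem__)
    print(name, [round(w, 3) for w in weights], "top patch:", top_patch)
```

In this toy setup each entity's attention weights form a distribution over patches, so the patches with the largest weights are the ones most similar to that entity's embedding. A trained model would learn the query, key, and value projections rather than use raw vectors.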

Related papers

Reason Like a Radiologist: Chain-of-Thought and Reinforcement Learning for Verifiable Report Generation [13.580272788409092]
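BoxMed-RL is a groundbreaking unified training framework for generating spatially verifiable and explainable radiology reports. Built on a large vision-language model, BoxMed-RL revolutionizes report generation through two integrated phases. Compared with state-of-the-art methods, BoxMed-RL achieves an average improvement of 7% in both METEOR and ROUGE-L.
Paper reference translation (metadata) (2025-04-25T16:05:06Z)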
Abn-BLIP: Abnormality-aligned Bootstrapping Language-Image Pre-training for Pulmonary Embolism Diagnosis and Report Generation from CTPA [3.1001390303501153]
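Abn-BLIP is an advanced diagnostic model designed to align abnormal findings in order to produce accurate and comprehensive radiology reports. The results show that Abn-BLIP outperforms state-of-the-art medical vision-language models and 3D report generation methods in both accuracy and clinical relevance.
Paper reference translation (metadata) (2025-03-03T20:13:39Z)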
RadBARTsum: Domain Specific Adaption of Denoising Sequence-to-Sequence Models for Abstractive Radiology Report Summarization [1.8450534779202723]
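This study proposes RadBARTsum, a domain-specific and facilitated adaptation of the BART model for abstractive radiology report summarization. The approach involves two stages: 1) re-training the BART model on a large corpus of radiology reports using a novel entity masking strategy to improve learning of biomedical domain knowledge, and 2) fine-tuning the model for the summarization task, using the Findings and Background sections to predict the Impression section.
Paper reference translation (metadata) (2024-06-05T08:43:11Z)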
Grounded Knowledge-Enhanced Medical VLP for Chest X-Ray [12.239249676716247]
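Medical vision-language pre-training has emerged as a promising approach for learning domain-general representations of medical images and text. We propose a knowledge-enhanced medical vision-language pre-training framework for chest X-rays. The results show the benefit of incorporating a grounding mechanism to improve the alignment between chest X-ray images and their reports.
Paper reference translation (metadata) (2024-04-23T05:16:24Z)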
MedRG: Medical Report Grounding with Multi-modal Large Language Model [42.04042642085121]
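Medical Report Grounding (MedRG) is an end-to-end solution that uses a multi-modal large language model to predict key phrases. Experiments demonstrate the effectiveness of MedRG, which surpasses the performance of existing medical phrase grounding methods.
Paper reference translation (metadata) (2024-04-10T07:41:35Z)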
DeViDe: Faceted medical knowledge for improved medical vision-language pre-training [1.6567372257085946]
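Vision-language pre-training for chest X-rays has made significant progress, primarily by leveraging paired radiographs and radiology reports. We propose DeViDe, a Transformer-based method that leverages radiographic descriptions from the open web. DeViDe incorporates three key features of knowledge-augmented vision-language alignment. In zero-shot settings, DeViDe performs comparably to fully supervised models on external datasets and achieves state-of-the-art results on three large-scale datasets.
Paper reference translation (metadata) (2024-04-04T17:40:06Z)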
Customizing General-Purpose Foundation Models for Medical Report Generation [64.31265734687182]
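The scarcity of labeled medical image-report pairs presents a major challenge in the development of deep and large-scale neural networks. This paper proposes customizing off-the-shelf general-purpose large-scale pre-trained models, i.e., foundation models (FMs), in computer vision and natural language processing.
Paper reference translation (metadata) (2023-06-09T03:02:36Z)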
Vision-Language Modelling For Radiological Imaging and Reports In The Low Data Regime [70.04389979779195]
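This paper explores medical vision-language models (VLMs), which embed visual and language inputs into a common space. It examines several candidate methods for improving performance in low-data regimes, including adapting generic pre-trained models to new image and text domains. Using text-to-image retrieval as a benchmark, the performance of these methods is evaluated with variable-sized training datasets of paired chest X-rays and radiological reports.
Paper reference translation (metadata) (2023-03-30T18:20:00Z)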
Medical Image Captioning via Generative Pretrained Transformers [57.308920993032274]
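We combine two language models, Show-Attend-Tell and GPT-3, to generate comprehensive and descriptive radiology records. The proposed model is tested on two medical datasets, Open-I and MIMIC-CXR, and on the general-purpose MS-COCO.
Paper reference translation (metadata) (2022-09-28T10:27:10Z)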
Cross-modal Clinical Graph Transformer for Ophthalmic Report Generation [116.87918100031153]
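We propose a Cross-modal clinical Graph Transformer (CGT) for ophthalmic report generation (ORG). CGT injects clinical relations into the visual features as prior knowledge to drive the decoding procedure. Experiments on the large-scale FFA-IR benchmark show that the proposed CGT outperforms previous benchmark methods.
Paper reference translation (metadata) (2022-06-04T13:16:30Z)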
Radiology Report Generation with a Learned Knowledge Base and Multi-modal Alignment [27.111857943935725]
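We propose an automatic multi-modal approach for report generation from chest X-rays. The approach features two distinct modules: a learned knowledge base and multi-modal alignment. With the aid of both modules, our approach clearly outperforms state-of-the-art methods.
Paper reference translation (metadata) (2021-12-30T10:43:56Z)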
Auxiliary Signal-Guided Knowledge Encoder-Decoder for Medical Report Generation [107.3538598876467]
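We propose an Auxiliary Signal-Guided Knowledge Encoder-Decoder (ASGK) that mimics radiologists' working patterns. ASGK integrates internal feature fusion and external medical linguistic information to guide medical knowledge transfer and learning.
Paper reference translation (metadata) (2020-06-06T01:00:15Z)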

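The related papers list is generated automatically from the titles and abstracts of papers on this site.

This page shows information on the specified paper.
The operator of this site does not guarantee the quality of this site (including all information and translations) and accepts no responsibility for any consequences arising from its use.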