Fugu-MT 論文翻訳(概要): MedKLIP: Medical Knowledge Enhanced Language-Image Pre-Training in Radiology

論文の概要: MedKLIP: Medical Knowledge Enhanced Language-Image Pre-Training in Radiology

arxiv url: http://arxiv.org/abs/2301.02228v3
Date: Mon, 3 Apr 2023 09:57:51 GMT
ステータス: 翻訳完了
システム内更新日: 2023-04-04 23:22:51.326559
Title: MedKLIP: Medical Knowledge Enhanced Language-Image Pre-Training in Radiology
Title（参考訳）: MedKLIP: 医学的知識を活かした言語画像による放射線診断
Authors: Chaoyi Wu, Xiaoman Zhang, Ya Zhang, Yanfeng Wang, Weidi Xie
Abstract要約: 医用医用視覚言語事前訓練を専門知識と組み合わせて行うことを検討する。まず, 生の報告を直接処理する既存の作業とは異なり, 医療関連情報を抽出するために, 新規な三重項抽出モジュールを採用する。第2に,医療分野における豊富な知識を活用するために,知識ベースを問合せすることで,エンティティ翻訳を伴う新しい三重項符号化モジュールを提案する。第3に、トランスフォーマーを用いた融合モデルを用いて、画像パッチレベルでの実体記述と視覚信号との空間的整合を図り、診断を可能にすることを提案する。
参考スコア（独自算出の注目度）: 40.52487429030841
License: http://creativecommons.org/licenses/by/4.0/
Abstract: In this paper, we consider enhancing medical visual-language pre-training (VLP) with domain-specific knowledge, by exploiting the paired image-text reports from the radiological daily practice. In particular, we make the following contributions: First, unlike existing works that directly process the raw reports, we adopt a novel triplet extraction module to extract the medical-related information, avoiding unnecessary complexity from language grammar and enhancing the supervision signals; Second, we propose a novel triplet encoding module with entity translation by querying a knowledge base, to exploit the rich domain knowledge in medical field, and implicitly build relationships between medical entities in the language embedding space; Third, we propose to use a Transformer-based fusion model for spatially aligning the entity description with visual signals at the image patch level, enabling the ability for medical diagnosis; Fourth, we conduct thorough experiments to validate the effectiveness of our architecture, and benchmark on numerous public benchmarks, e.g., ChestX-ray14, RSNA Pneumonia, SIIM-ACR Pneumothorax, COVIDx CXR-2, COVID Rural, and EdemaSeverity. In both zero-shot and fine-tuning settings, our model has demonstrated strong performance compared with the former methods on disease classification and grounding.
Abstract（参考訳）: 本稿では,放射線学的日々の実践から画像テキストのペアレポートを活用し,ドメイン固有知識を用いた医学的視覚言語前訓練(vlp)の強化を検討する。 In particular, we make the following contributions: First, unlike existing works that directly process the raw reports, we adopt a novel triplet extraction module to extract the medical-related information, avoiding unnecessary complexity from language grammar and enhancing the supervision signals; Second, we propose a novel triplet encoding module with entity translation by querying a knowledge base, to exploit the rich domain knowledge in medical field, and implicitly build relationships between medical entities in the language embedding space; Third, we propose to use a Transformer-based fusion model for spatially aligning the entity description with visual signals at the image patch level, enabling the ability for medical diagnosis; Fourth, we conduct thorough experiments to validate the effectiveness of our architecture, and benchmark on numerous public benchmarks, e.g., ChestX-ray14, RSNA Pneumonia, SIIM-ACR Pneumothorax, COVIDx CXR-2, COVID Rural, and EdemaSeverity. ゼロショットと微調整の両方において,従来の疾患分類法や接地法と比較して高い性能を示した。

関連論文リスト

Reason Like a Radiologist: Chain-of-Thought and Reinforcement Learning for Verifiable Report Generation [13.580272788409092]
BoxMed-RLは、空間的に検証可能な説明可能な放射線学レポートを生成するための、画期的な統合トレーニングフレームワークである。大きなビジョン言語モデルに基づいて構築されたBoxMed-RLは、2つの統合フェーズを通じてレポート生成に革命をもたらす。 BoxMed-RLは、最先端の手法と比較して、METEORとROUGE-Lの両方で平均7%改善されている。
論文参考訳（メタデータ） (2025-04-25T16:05:06Z)
Abn-BLIP: Abnormality-aligned Bootstrapping Language-Image Pre-training for Pulmonary Embolism Diagnosis and Report Generation from CTPA [3.1001390303501153]
Abn-BLIPは放射線診断の精度と包括性を生成するために異常所見の整合を図った高度な診断モデルである。以上の結果から,Abn-BLIPは最先端の医療ビジョン言語モデルおよび3Dレポート生成手法よりも精度および臨床関連性が高いことがわかった。
論文参考訳（メタデータ） (2025-03-03T20:13:39Z)
RadBARTsum: Domain Specific Adaption of Denoising Sequence-to-Sequence Models for Abstractive Radiology Report Summarization [1.8450534779202723]
本研究では,抽象的放射線学レポート要約のためのドメイン固有かつ容易なBARTモデルの適応であるRadBARTsumを提案する。本手法は,1)生物医学領域の知識学習を改善するための新しい実体マスキング戦略を用いて,放射線学報告の大規模コーパス上でBARTモデルを再学習すること,2)印象区間を予測するためにFindersとバックグラウンドセクションを用いて要約タスクのモデルを微調整すること,の2つの段階を含む。
論文参考訳（メタデータ） (2024-06-05T08:43:11Z)
Grounded Knowledge-Enhanced Medical VLP for Chest X-Ray [12.239249676716247]
医用視覚言語プレトレーニングは、医用画像とテキストのドメイン汎用表現を学習するための有望なアプローチとして現れてきた。胸部X線に対する知識強化型医療ビジョン言語事前学習フレームワークを提案する。以上の結果から,胸部X線像とX線像との整合性を改善するために接地機構を組み込むことの利点が示唆された。
論文参考訳（メタデータ） (2024-04-23T05:16:24Z)
MedRG: Medical Report Grounding with Multi-modal Large Language Model [42.04042642085121]
Medical Report Grounding (MedRG)は、キーフレーズを予測するためにマルチモーダルな大規模言語モデルを利用するエンドツーエンドのソリューションである。 MedRGの有効性を実証し,既存の医療用語の接頭法の性能を上回り,その効果を検証した。
論文参考訳（メタデータ） (2024-04-10T07:41:35Z)
DeViDe: Faceted medical knowledge for improved medical vision-language pre-training [1.6567372257085946]
胸部X線に対する視覚言語による事前訓練は、主にペアのX線写真とラジオグラフィーレポートを活用することで大きな進歩を遂げた。オープンウェブからの無線画像記述を利用するトランスフォーマーベースのDeViDeを提案する。 DeViDeは知識強化された視覚言語アライメントの3つの重要な特徴を取り入れている。ゼロショット設定では、DeViDeは外部データセットの完全な教師付きモデルと互換性があり、3つの大規模データセットの最先端結果を達成する。
論文参考訳（メタデータ） (2024-04-04T17:40:06Z)
Customizing General-Purpose Foundation Models for Medical Report Generation [64.31265734687182]
ラベル付き医用画像-レポートペアの不足は、ディープニューラルネットワークや大規模ニューラルネットワークの開発において大きな課題となっている。本稿では,コンピュータビジョンと自然言語処理の基盤モデル (FM) として,市販の汎用大規模事前学習モデルのカスタマイズを提案する。
論文参考訳（メタデータ） (2023-06-09T03:02:36Z)
Vision-Language Modelling For Radiological Imaging and Reports In The Low Data Regime [70.04389979779195]
本稿では,視覚および言語入力を共通空間に埋め込んだ医用視覚言語モデル(VLM)について検討する。本稿では,新しい画像領域やテキスト領域への汎用事前学習モデルの適用など,低データ性能向上のためのいくつかの候補手法について検討する。テキスト・ツー・イメージ検索をベンチマークとして,2つの胸部X線および放射線学的報告を用いた可変サイズのトレーニングデータセットを用いて,これらの手法の性能評価を行った。
論文参考訳（メタデータ） (2023-03-30T18:20:00Z)
Medical Image Captioning via Generative Pretrained Transformers [57.308920993032274]
我々は、Show-Attend-Tell と GPT-3 という2つの言語モデルを組み合わせて、包括的で記述的な放射線学記録を生成する。提案モデルは、Open-I、MIMIC-CXR、汎用MS-COCOの2つの医療データセットで検証される。
論文参考訳（メタデータ） (2022-09-28T10:27:10Z)
Cross-modal Clinical Graph Transformer for Ophthalmic Report Generation [116.87918100031153]
眼科報告生成(ORG)のためのクロスモーダルな臨床グラフ変換器(CGT)を提案する。 CGTは、デコード手順を駆動する事前知識として、臨床関係を視覚特徴に注入する。大規模FFA-IRベンチマークの実験は、提案したCGTが従来のベンチマーク手法より優れていることを示した。
論文参考訳（メタデータ） (2022-06-04T13:16:30Z)
Radiology Report Generation with a Learned Knowledge Base and Multi-modal Alignment [27.111857943935725]
胸部X線からのレポート生成のための自動マルチモーダルアプローチを提案する。本手法は,学習知識ベースとマルチモーダルアライメントの2つの異なるモジュールを特徴とする。両モジュールの助けを借りて、我々のアプローチは明らかに最先端の手法よりも優れている。
論文参考訳（メタデータ） (2021-12-30T10:43:56Z)
Auxiliary Signal-Guided Knowledge Encoder-Decoder for Medical Report Generation [107.3538598876467]
放射線技師の動作パターンを模倣する補助信号誘導知識デコーダ(ASGK)を提案する。 ASGKは、内的特徴融合と外部医療言語情報を統合して、医療知識の伝達と学習をガイドする。
論文参考訳（メタデータ） (2020-06-06T01:00:15Z)

関連論文リストは本サイト内にある論文のタイトル・アブストラクトから自動的に作成しています。

指定された論文の情報です。
本サイトの運営者は本サイト（すべての情報・翻訳含む）の品質を保証せず、本サイト（すべての情報・翻訳含む）を使用して発生したあらゆる結果について一切の責任を負いません。