Fugu-MT Paper Translation (Summary): Align, Reason and Learn: Enhancing Medical Vision-and-Language Pre-training with Knowledge

Paper summary: Align, Reason and Learn: Enhancing Medical Vision-and-Language Pre-training with Knowledge

arxiv url: http://arxiv.org/abs/2209.07118v1
Date: Thu, 15 Sep 2022 08:00:01 GMT
Status: Translation complete
Last updated in system: 2022-09-16 11:59:55.919501
Title: Align, Reason and Learn: Enhancing Medical Vision-and-Language Pre-training with Knowledge
Title (reference translation): Align, Reason, and Learn: Enhancing Medical Vision-and-Language Pre-training with Knowledge
Authors: Zhihong Chen, Guanbin Li, Xiang Wan
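Abstract summary: This paper proposes a systematic and effective approach to enhancing Med-VLP with structured medical knowledge from three perspectives. First, the representations of the vision encoder and the language encoder are aligned through knowledge. Second, knowledge is injected into the multi-modal fusion model so that the model can reason with knowledge as a supplement to the input image and text. Third, knowledge-induced pretext tasks are designed to guide the model to emphasize the most critical information in images and texts.
Reference score (attention score computed independently by this site): 68.90835997085557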
License: http://creativecommons.org/licenses/by/4.0/
Abstract: Medical vision-and-language pre-training (Med-VLP) has received considerable attention owing to its applicability to extracting generic vision-and-language representations from medical images and texts. Most existing methods mainly contain three elements: uni-modal encoders (i.e., a vision encoder and a language encoder), a multi-modal fusion module, and pretext tasks, with few studies considering the importance of medical domain expert knowledge and explicitly exploiting such knowledge to facilitate Med-VLP. Although there exist knowledge-enhanced vision-and-language pre-training (VLP) methods in the general domain, most require off-the-shelf toolkits (e.g., object detectors and scene graph parsers), which are unavailable in the medical domain. In this paper, we propose a systematic and effective approach to enhance Med-VLP by structured medical knowledge from three perspectives. First, considering knowledge can be regarded as the intermediate medium between vision and language, we align the representations of the vision encoder and the language encoder through knowledge. Second, we inject knowledge into the multi-modal fusion model to enable the model to perform reasoning using knowledge as the supplementation of the input image and text. Third, we guide the model to put emphasis on the most critical information in images and texts by designing knowledge-induced pretext tasks. To perform a comprehensive evaluation and facilitate further research, we construct a medical vision-and-language benchmark including three tasks. Experimental results illustrate the effectiveness of our approach, where state-of-the-art performance is achieved on all downstream tasks. Further analyses explore the effects of different components of our approach and various settings of pre-training.
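Abstract (reference translation): Medical vision-and-language pre-training (Med-VLP) has attracted considerable attention because it can extract generic vision-and-language representations from medical images and texts. Most existing methods contain three elements: uni-modal encoders (a vision encoder and a language encoder), a multi-modal fusion module, and pretext tasks. Few studies consider the importance of medical domain expert knowledge or explicitly exploit such knowledge to facilitate Med-VLP. Knowledge-enhanced vision-and-language pre-training (VLP) methods exist in the general domain, but most require off-the-shelf toolkits (e.g., object detectors and scene graph parsers) that are unavailable in the medical domain. This paper proposes a systematic and effective approach to enhancing Med-VLP with structured medical knowledge from three perspectives. First, regarding knowledge as the intermediate medium between vision and language, we align the representations of the vision encoder and the language encoder through knowledge. Second, we inject knowledge into the multi-modal fusion model so that the model can reason with knowledge as a supplement to the input image and text. Third, we design knowledge-induced pretext tasks that guide the model to emphasize the most critical information in images and texts. To perform a comprehensive evaluation and facilitate further research, we construct a medical vision-and-language benchmark comprising three tasks. Experimental results demonstrate the effectiveness of the approach, which achieves state-of-the-art performance on all downstream tasks. Further analyses explore the effects of the different components of the approach and of various pre-training settings.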
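The first perspective above (aligning the vision and language encoders through knowledge) can be illustrated with a small sketch. The abstract does not state which loss the authors use, so the InfoNCE-style contrastive objective below is an assumption made purely for illustration, and the names `cosine`, `info_nce`, and `knowledge_alignment_loss` are hypothetical. The idea shown is only that image and text embeddings are each pulled toward the embedding of their associated knowledge entry, so that knowledge acts as a shared anchor between the two modalities.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def info_nce(anchors, targets, temperature=0.1):
    """Mean cross-entropy where anchors[i] should match targets[i].

    Assumption: a generic InfoNCE-style loss, not the paper's stated objective.
    """
    total = 0.0
    for i, anchor in enumerate(anchors):
        logits = [cosine(anchor, t) / temperature for t in targets]
        # Numerically stable log-sum-exp over all candidate targets.
        peak = max(logits)
        log_denominator = peak + math.log(sum(math.exp(x - peak) for x in logits))
        total += log_denominator - logits[i]
    return total / len(anchors)


def knowledge_alignment_loss(image_emb, text_emb, knowledge_emb, temperature=0.1):
    """Hypothetical alignment loss: both modalities are pulled toward knowledge."""
    return (info_nce(image_emb, knowledge_emb, temperature)
            + info_nce(text_emb, knowledge_emb, temperature))


# Toy 2-D embeddings (made-up values): sample i is associated with knowledge entry i.
knowledge = [[1.0, 0.0], [0.0, 1.0]]
images = [[0.9, 0.1], [0.1, 0.9]]
texts = [[0.8, 0.2], [0.2, 0.8]]

aligned = knowledge_alignment_loss(images, texts, knowledge)
shuffled = knowledge_alignment_loss(images[::-1], texts[::-1], knowledge)
print(aligned < shuffled)
```

In this toy setup the correctly paired embeddings give a lower loss than the reversed pairing, which is the behaviour an alignment objective of this kind is meant to reward.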

Related papers

Lingshu: A Generalist Foundation Model for Unified Multimodal Medical Understanding and Reasoning [57.873833577058]
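We construct a multimodal dataset rich in medical knowledge. We then introduce Lingshu, an MLLM specialized for medicine. Lingshu undergoes multi-stage training to embed medical expertise and improve its task-solving capabilities.
Paper reference translation (metadata) (2025-06-08T08:47:30Z)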
MedUnifier: Unifying Vision-and-Language Pre-training on Medical Data with Vision Generation Task using Discrete Visual Representations [13.991376926757036]
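We propose MedUnifier, a unified vision-language pre-training framework tailored to medical data. MedUnifier seamlessly integrates text-grounded image generation capabilities with multi-modal learning strategies. The method uses visual vector quantization both to achieve a more cohesive learning strategy for cross-modal understanding and to improve multi-modal generation quality.
Paper reference translation (metadata) (2025-03-02T21:09:32Z)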
A Survey of Medical Vision-and-Language Applications and Their Techniques [48.268198631277315]
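Medical vision-and-language models (MVLMs) have attracted substantial interest because they can provide a natural language interface for interpreting complex medical data. This paper gives an overview of MVLMs and the various medical tasks to which they have been applied. It also examines the datasets used for these tasks and compares the performance of different models on standardized evaluation metrics.
Paper reference translation (metadata) (2024-11-19T03:27:05Z)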
MLIP: Enhancing Medical Visual Representation with Divergence Encoder and Knowledge-guided Contrastive Learning [48.97640824497327]
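This paper proposes a novel framework that leverages domain-specific medical knowledge as guiding signals to integrate language information into the visual domain through image-text contrastive learning. The model includes global contrastive learning with the designed divergence encoder, local token-knowledge-patch alignment contrastive learning, and knowledge-guided category-level contrastive learning with expert knowledge. Notably, MLIP surpasses state-of-the-art methods even with limited annotated data, highlighting the potential of multimodal pre-training for advancing medical representation learning.
Paper reference translation (metadata) (2024-02-03T05:48:50Z)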
Knowledge Boosting: Rethinking Medical Contrastive Vision-Language Pre-Training [6.582001681307021]
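We propose the Knowledge-Boosting Contrastive Vision-Language Pre-training framework (KoBo). KoBo integrates clinical knowledge into the learning of vision-language semantic consistency. Experiments validate the framework's effect on eight tasks, including classification, segmentation, retrieval, and semantic relatedness.
Paper reference translation (metadata) (2023-07-14T09:38:22Z)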
LLaVA-Med: Training a Large Language-and-Vision Assistant for Biomedicine in One Day [85.19963303642427]
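This paper proposes a cost-efficient approach for training a vision-language conversational assistant that can answer open-ended research questions about biomedical images. The model first learns to align biomedical vocabulary using figure-caption pairs and then learns open-ended conversational semantics. This makes it possible to train a large language-and-vision assistant for biomedicine in less than 15 hours (with eight A100s).
Paper reference translation (metadata) (2023-06-01T16:50:07Z)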
Medical Image Understanding with Pretrained Vision Language Models: A Comprehensive Study [8.547751745702156]
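We show that well-designed medical prompts are the key to eliciting knowledge from pre-trained vision language models (VLMs). We develop three approaches for automatically generating medical prompts, which inject expert-level medical knowledge and image-specific information into fine-grained grounding prompts.
Paper reference translation (metadata) (2022-09-30T15:06:13Z)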
Multi-Modal Masked Autoencoders for Medical Vision-and-Language Pre-Training [62.215025958347105]
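We propose a self-supervised learning paradigm based on multi-modal masked autoencoders. Cross-modal domain knowledge is learned by reconstructing the missing pixels and tokens from randomly masked images and texts.
Paper reference translation (metadata) (2022-09-15T07:26:43Z)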
Leveraging Visual Knowledge in Language Tasks: An Empirical Study on Intermediate Pre-training for Cross-modal Knowledge Transfer [61.34424171458634]
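We investigate whether integrating visual knowledge into language models can fill the gap. Experiments show that visual knowledge transfer can improve performance in both low-resource and fully supervised settings.
Paper reference translation (metadata) (2022-03-14T22:02:40Z)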

The related papers list is generated automatically from the titles and abstracts of the papers on this site.

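This page shows information about the specified paper.
The operator of this site does not guarantee the quality of this site (including all information and translations) and accepts no responsibility for any results arising from the use of this site (including all information and translations).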