Fugu-MT Paper Translation (Abstract): AlignTransformer: Hierarchical Alignment of Visual Regions and Disease Tags for Medical Report Generation

Paper overview: AlignTransformer: Hierarchical Alignment of Visual Regions and Disease Tags for Medical Report Generation

arxiv url: http://arxiv.org/abs/2203.10095v1
Date: Fri, 18 Mar 2022 13:43:53 GMT
Status: Translation complete
Last updated in system: 2022-03-27 05:07:28.423842
Title: AlignTransformer: Hierarchical Alignment of Visual Regions and Disease Tags for Medical Report Generation
Title (reference translation): AlignTransformer: Hierarchical Alignment of Visual Regions and Disease Tags for Medical Report Generation
Authors: Di You, Fenglin Liu, Shen Ge, Xiaoxia Xie, Jing Zhang, Xian Wu
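Abstract summary: This paper proposes the AlignTransformer framework, which includes the Align Hierarchical Attention (AHA) and Multi-Grained Transformer (MGT) modules. Experiments on the public IU-Xray and MIMIC-CXR datasets show that AlignTransformer achieves results competitive with state-of-the-art methods on both datasets.
Reference score (attention score computed by this site): 50.21065317817769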
License: http://creativecommons.org/licenses/by/4.0/
Abstract: Recently, medical report generation, which aims to automatically generate a long and coherent descriptive paragraph of a given medical image, has received growing research interests. Different from the general image captioning tasks, medical report generation is more challenging for data-driven neural models. This is mainly due to 1) the serious data bias: the normal visual regions dominate the dataset over the abnormal visual regions, and 2) the very long sequence. To alleviate above two problems, we propose an AlignTransformer framework, which includes the Align Hierarchical Attention (AHA) and the Multi-Grained Transformer (MGT) modules: 1) AHA module first predicts the disease tags from the input image and then learns the multi-grained visual features by hierarchically aligning the visual regions and disease tags. The acquired disease-grounded visual features can better represent the abnormal regions of the input image, which could alleviate data bias problem; 2) MGT module effectively uses the multi-grained features and Transformer framework to generate the long medical report. The experiments on the public IU-Xray and MIMIC-CXR datasets show that the AlignTransformer can achieve results competitive with state-of-the-art methods on the two datasets. Moreover, the human evaluation conducted by professional radiologists further proves the effectiveness of our approach.
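Abstract (reference translation): Medical report generation, which aims to automatically generate a long, coherent descriptive paragraph for a given medical image, has recently attracted growing research interest. Unlike general image captioning, medical report generation is more challenging for data-driven neural models. This is mainly due to 1) serious data bias: normal visual regions dominate the dataset over abnormal visual regions, and 2) the very long output sequence. To alleviate these two problems, we propose the AlignTransformer framework, which includes the Align Hierarchical Attention (AHA) and Multi-Grained Transformer (MGT) modules. 1) The AHA module first predicts disease tags from the input image and then learns multi-grained visual features by hierarchically aligning the visual regions and the disease tags. The resulting disease-grounded visual features better represent the abnormal regions of the input image, which can alleviate the data bias problem. 2) The MGT module uses the multi-grained features and the Transformer framework to generate the long medical report. Experiments on the public IU-Xray and MIMIC-CXR datasets show that AlignTransformer achieves results competitive with state-of-the-art methods on both datasets. In addition, a human evaluation conducted by professional radiologists further supports the effectiveness of the approach.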
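The abstract describes aligning visual regions with predicted disease tags over several levels. The following is a minimal sketch of that general idea using plain scaled dot-product cross-attention in NumPy. It is not the authors' implementation: the function names (`cross_attention`, `hierarchical_align`), the number of levels, the feature sizes, and the order of the two attention steps are all assumptions made for illustration, and the learned projections, tag predictor, and report decoder are omitted.

```python
import numpy as np


def softmax(x, axis=-1):
    # Numerically stable softmax along the given axis.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)


def cross_attention(queries, keys_values):
    # Scaled dot-product attention without learned projections:
    # each query row becomes a weighted average of the keys_values rows.
    d = queries.shape[-1]
    weights = softmax(queries @ keys_values.T / np.sqrt(d), axis=-1)
    return weights @ keys_values


def hierarchical_align(regions, tags, num_levels=3):
    # Hypothetical alignment loop (an assumption, not the paper's exact design):
    # at each level the tags attend to the regions, then the regions attend to
    # the updated tags. One region feature map is kept per level, giving a
    # stack of features at different granularities.
    levels = []
    for _ in range(num_levels):
        tags = cross_attention(tags, regions)
        regions = cross_attention(regions, tags)
        levels.append(regions)
    return levels


rng = np.random.default_rng(0)
regions = rng.normal(size=(49, 64))  # assumed: 7x7 grid of 64-d region features
tags = rng.normal(size=(10, 64))     # assumed: embeddings of 10 predicted tags
levels = hierarchical_align(regions, tags)
print([level.shape for level in levels])
```

In a real model, each attention step would use learned query, key, and value projections, and a decoder would attend to the per-level features when generating the report.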

Related papers

TagGAN: A Generative Model for Data Tagging [1.820857020024539]
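This paper proposes TagGAN, a novel framework based on Generative Adversarial Networks (GANs). TagGAN is tailored to generate weakly supervised, fine-grained disease maps from purely image-level labeled data. The method aims to produce detailed disease maps that visualize lesions in a weakly supervised manner, without requiring pixel-level annotations.
Paper reference translation (metadata) (2025-02-25T04:29:18Z)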
Reducing Hallucinations of Medical Multimodal Large Language Models with Visual Retrieval-Augmented Generation [15.468023420115431]
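We show how MLLMs can be extended to support Visual RAG, a retrieval-augmented generation framework. On the MIMIC-CXR chest X-ray report generation dataset and the Multicare medical image caption generation dataset, we show that Visual RAG improves the accuracy of entity probing.
Paper reference translation (metadata) (2025-02-20T20:55:34Z)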
Advancing Medical Image Segmentation: Morphology-Driven Learning with Diffusion Transformer [4.672688418357066]
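This paper proposes a diffusion Transformer (DTS) model for robust segmentation in the presence of noise. The model, which analyzes the morphological representation of images, achieves better results than previous models across a variety of medical imaging modalities.
Paper reference translation (metadata) (2024-08-01T07:35:54Z)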
Potential of Multimodal Large Language Models for Data Mining of Medical Images and Free-text Reports [51.45762396192655]
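In particular, Gemini-Vision-Series (Gemini) and GPT-4-Series (GPT-4) represent a paradigm shift in artificial intelligence for computer vision. This study evaluates the performance of Gemini, GPT-4, and four other popular large models on 14 medical imaging datasets.
Paper reference translation (metadata) (2024-07-08T09:08:42Z)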
RadGenome-Chest CT: A Grounded Vision-Language Dataset for Chest CT Analysis [56.57177181778517]
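RadGenome-Chest CT is a large-scale 3D chest CT interpretation dataset based on CT-RATE. We leverage the latest powerful universal segmentation models and large language models to extend the original dataset.
Paper reference translation (metadata) (2024-04-25T17:11:37Z)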
TiBiX: Leveraging Temporal Information for Bidirectional X-ray and Report Generation [0.7381551917607596]
We propose TiBiX, which leverages temporal information for bidirectional X-ray and report generation.
Paper reference translation (metadata) (2024-03-20T07:00:03Z)
Eye-gaze Guided Multi-modal Alignment for Medical Representation Learning [65.54680361074882]
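The Eye-gaze Guided Multi-modal Alignment (EGMA) framework uses eye-gaze data to improve the alignment of medical visual and textual features. We evaluate it on the downstream tasks of image classification and image-text retrieval across four medical datasets.
Paper reference translation (metadata) (2024-03-19T03:59:14Z)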
GAN-GA: A Generative Model based on Genetic Algorithm for Medical Image Generation [0.0]
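Generative models offer a promising solution to the problem of medical image scarcity. This paper proposes GAN-GA, a generative model that incorporates a genetic algorithm. The proposed model improves image fidelity and diversity while preserving distinctive features.
Paper reference translation (metadata) (2023-12-30T20:16:45Z)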
C^2M-DoT: Cross-modal consistent multi-view medical report generation with domain transfer network [67.97926983664676]
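We propose cross-modal consistent multi-view medical report generation with a domain transfer network (C2M-DoT). C2M-DoT substantially outperforms state-of-the-art baselines on all metrics.
Paper reference translation (metadata) (2023-10-09T02:31:36Z)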
Dynamic Multi-Domain Knowledge Networks for Chest X-ray Report Generation [0.5939858158928474]
We propose a Dynamic Multi-Domain Knowledge (DMDK) network for radiology diagnostic report generation. The DMDK network consists of four modules: Chest Feature Extractor (CFE), Dynamic Knowledge Extractor (DKE), Specific Knowledge Extractor (SKE), and Multi-knowledge Integrator (MKI). We conduct extensive experiments on two widely used datasets, IU X-Ray and MIMIC-CXR.
Paper reference translation (metadata) (2023-10-08T11:20:02Z)
IIHT: Medical Report Generation with Image-to-Indicator Hierarchical Transformer [4.376565880192482]
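This paper proposes an Image-to-Indicator Hierarchical Transformer (IIHT) framework for medical report generation. The proposed IIHT method allows disease indicators to be modified in real-world settings.
Paper reference translation (metadata) (2023-08-10T15:22:11Z)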
Auxiliary Signal-Guided Knowledge Encoder-Decoder for Medical Report Generation [107.3538598876467]
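We propose an Auxiliary Signal-Guided Knowledge Encoder-Decoder (ASGK) that mimics the working patterns of radiologists. ASGK integrates internal feature fusion and external medical linguistic information to guide the transfer and learning of medical knowledge.
Paper reference translation (metadata) (2020-06-06T01:00:15Z)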

The related papers list is generated automatically from the titles and abstracts of papers on this site.
