Fugu-MT Paper Translation (Summary): MedFormer: Hierarchical Medical Vision Transformer with Content-Aware Dual Sparse Selection Attention


arxiv url: http://arxiv.org/abs/2507.02488v1
Date: Thu, 03 Jul 2025 09:51:45 GMT
Status: Translation complete
Last updated in system: 2025-07-04 15:37:16.08009
Title: MedFormer: Hierarchical Medical Vision Transformer with Content-Aware Dual Sparse Selection Attention
Authors: Zunhui Xia, Hongxing Li, Libin Lan
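Abstract summary: MedFormer is an efficient medical vision transformer built on two key ideas. First, it uses a pyramid scaling structure as a versatile backbone for a variety of medical image recognition tasks. Second, it introduces a novel Dual Sparse Selection Attention (DSSA) that uses content awareness to improve computational efficiency.
Reference score (site-computed attention score): 1.474723404975345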
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: Medical image recognition serves as a key way to aid in clinical diagnosis, enabling more accurate and timely identification of diseases and abnormalities. Vision transformer-based approaches have proven effective in handling various medical recognition tasks. However, these methods encounter two primary challenges. First, they are often task-specific and architecture-tailored, limiting their general applicability. Second, they usually either adopt full attention to model long-range dependencies, resulting in high computational costs, or rely on handcrafted sparse attention, potentially leading to suboptimal performance. To tackle these issues, we present MedFormer, an efficient medical vision transformer with two key ideas. First, it employs a pyramid scaling structure as a versatile backbone for various medical image recognition tasks, including image classification and dense prediction tasks such as semantic segmentation and lesion detection. This structure facilitates hierarchical feature representation while reducing the computation load of feature maps, highly beneficial for boosting performance. Second, it introduces a novel Dual Sparse Selection Attention (DSSA) with content awareness to improve computational efficiency and robustness against noise while maintaining high performance. As the core building technique of MedFormer, DSSA is explicitly designed to attend to the most relevant content. In addition, a detailed theoretical analysis has been conducted, demonstrating that MedFormer has superior generality and efficiency in comparison to existing medical vision transformers. Extensive experiments on a variety of imaging modality datasets consistently show that MedFormer is highly effective in enhancing performance across all three above-mentioned medical image recognition tasks. The code is available at https://github.com/XiaZunhui/MedFormer.
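To make the computation-load argument for the pyramid structure concrete, the following is a minimal Python sketch of how a hierarchical backbone shrinks its feature maps stage by stage. It assumes a generic four-stage layout with a stride-4 patch embedding, 2x downsampling between stages, and channel widths of 64/128/256/512; these numbers are common in hierarchical vision transformers but are illustrative assumptions, not MedFormer's published configuration, and `pyramid_stages` is a hypothetical helper.

```python
# Hypothetical 4-stage pyramid: stride-4 patch embedding, then 2x downsampling
# per stage. The channel widths are illustrative assumptions, not MedFormer's
# published configuration.
def pyramid_stages(height, width, channels=(64, 128, 256, 512), first_stride=4):
    """Return (stride, h, w, tokens, channels, qk_cost) for each stage."""
    stages = []
    stride = first_stride
    for c in channels:
        h, w = height // stride, width // stride
        n = h * w  # number of tokens in this stage's feature map
        # Dense self-attention computes an n x n score matrix (Q K^T) over
        # c-dim features, i.e. roughly n * n * c multiply-accumulates.
        stages.append((stride, h, w, n, c, n * n * c))
        stride *= 2
    return stages


if __name__ == "__main__":
    for stride, h, w, n, c, cost in pyramid_stages(224, 224):
        print(f"stride {stride:2d}: {h}x{w} map, {n} tokens, C={c}, Q K^T ~{cost:.2e} MACs")
```

For a 224x224 input this yields 3136, 784, 196, and 49 tokens per stage. The quadratic Q K^T term of dense attention is roughly eight times larger in the first stage than in the second, so the high-resolution stages are where a sparse attention mechanism would save the most.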
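The abstract describes DSSA only at a high level: a content-aware attention designed to attend to the most relevant content. The following is a minimal pure-Python sketch of one generic way to realize a two-stage ("dual") sparse selection for a single query. It first keeps the regions whose mean key best matches the query, then keeps the best-matching tokens inside those regions, and finally runs softmax attention over those tokens alone. The contiguous region partitioning, the mean-key region summaries, the parameters `region_size`, `k_regions` and `k_tokens`, and the function name `dual_topk_attention` are all illustrative assumptions. The authors' DSSA may select regions and tokens differently; the repository linked in the abstract is the reference for the actual method.

```python
import math


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def mean_vector(vectors):
    # Column-wise mean of a list of equal-length vectors.
    return [sum(col) / len(vectors) for col in zip(*vectors)]


def dual_topk_attention(query, keys, values, region_size, k_regions, k_tokens):
    """Hypothetical single-query attention over a content-selected token subset.

    Stage 1 (region selection): split the keys into contiguous regions of
    `region_size` tokens, summarise each region by its mean key, and keep the
    `k_regions` regions whose summary scores highest against the query.
    Stage 2 (token selection): among the tokens of the kept regions, keep the
    `k_tokens` highest-scoring keys and apply softmax attention to them only.
    Returns (output_vector, sorted indices of the attended tokens).
    Requires k_regions >= 1 and k_tokens >= 1.
    """
    scale = 1.0 / math.sqrt(len(query))
    regions = [list(range(i, min(i + region_size, len(keys))))
               for i in range(0, len(keys), region_size)]
    region_scores = [dot(query, mean_vector([keys[j] for j in r])) for r in regions]
    top_regions = sorted(range(len(regions)), key=lambda r: region_scores[r],
                         reverse=True)[:k_regions]

    candidates = [j for r in top_regions for j in regions[r]]
    token_scores = {j: dot(query, keys[j]) * scale for j in candidates}
    selected = sorted(candidates, key=lambda j: token_scores[j], reverse=True)[:k_tokens]

    # Numerically stable softmax restricted to the selected tokens.
    peak = max(token_scores[j] for j in selected)
    weights = [math.exp(token_scores[j] - peak) for j in selected]
    total = sum(weights)
    out = [0.0] * len(values[0])
    for w, j in zip(weights, selected):
        for c, v in enumerate(values[j]):
            out[c] += (w / total) * v
    return out, sorted(selected)


if __name__ == "__main__":
    # Toy 2-D keys; each value is its token index so the output is easy to read.
    keys = [[0, 1], [0, 1], [1, 0], [0.9, 0.1],
            [-1, 0], [-1, 0], [0.5, 0.5], [0.2, 0.8]]
    values = [[float(i)] for i in range(len(keys))]
    out, attended = dual_topk_attention([1.0, 0.0], keys, values,
                                        region_size=2, k_regions=2, k_tokens=2)
    print(attended, out)  # attends to tokens [2, 3]; output lies between 2 and 3
```

Because selection depends on query-key similarity rather than on a fixed window or stride, the attended set changes with the input content, which is the sense in which such a scheme is content-aware. With `k_regions` and `k_tokens` large enough to cover every token, the sketch reduces to ordinary dense softmax attention.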

Related papers

Multimodal Causal-Driven Representation Learning for Generalizable Medical Image Segmentation [56.52520416420957]
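We propose MCDRL (Multimodal Causal-Driven Representation Learning) to address domain generalization in medical image segmentation. MCDRL consistently outperforms competing methods, achieving superior segmentation accuracy and demonstrating robust generalizability.
Paper reference translation (metadata) (2025-08-07T03:41:41Z)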
MedGemma Technical Report [75.88152277443179]
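MedGemma is a collection of medical vision-language foundation models based on Gemma 3 4B and 27B. MedGemma demonstrates advanced medical understanding and reasoning over images and text. We also introduce MedSigLIP, a medical vision encoder derived from SigLIP.
Paper reference translation (metadata) (2025-07-07T17:01:44Z)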
Lingshu: A Generalist Foundation Model for Unified Multimodal Medical Understanding and Reasoning [57.873833577058]
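We construct a multimodal dataset rich in medical knowledge and then introduce Lingshu, an MLLM specialized for medicine. Lingshu undergoes multi-stage training to embed medical expertise and improve its task-solving ability.
Paper reference translation (metadata) (2025-06-08T08:47:30Z)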
DualPrompt-MedCap: A Dual-Prompt Enhanced Approach for Medical Image Captioning [5.456249017636404]
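We introduce DualPrompt-MedCap, a novel dual-prompt enhancement framework that extends Large Vision-Language Models (LVLMs). It combines a modality-aware prompt, based on a semi-supervised classification model pretrained on medical question-answer pairs, with a question-guided prompt that uses biomedical language model embeddings. The method enables the generation of clinically accurate reports that can serve as prior knowledge for medical experts and as automatic annotations for downstream vision-language tasks.
Paper reference translation (metadata) (2025-04-13T14:31:55Z)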
Hi-End-MAE: Hierarchical encoder-driven masked autoencoders are stronger vision learners for medical image segmentation [21.183229457060634]
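We pretrain Hi-End-MAE on a large-scale dataset of 10K CT scans and evaluate its performance on seven public medical image segmentation benchmarks. Hi-End-MAE achieves excellent transfer learning across a range of downstream tasks, revealing the potential of ViTs in medical imaging applications.
Paper reference translation (metadata) (2025-02-12T12:14:02Z)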
Efficient MedSAMs: Segment Anything in Medical Images on Laptop [69.28565867103542]
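We organized the first international competition dedicated to promptable medical image segmentation. The top teams developed lightweight segmentation foundation models and implemented efficient inference pipelines. The best-performing algorithms have been incorporated into open-source software with a user-friendly interface to facilitate clinical adoption.
Paper reference translation (metadata) (2024-12-20T17:33:35Z)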
Unified Medical Image Pre-training in Language-Guided Common Semantic Space [39.61770813855078]
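We propose the Unified Medical Image Pre-Training framework (UniMedI). UniMedI uses diagnostic reports as a common semantic space to create unified representations across diverse medical imaging modalities. We evaluate its performance on 2D and 3D images across 10 different datasets.
Paper reference translation (metadata) (2023-11-24T22:01:12Z)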
Customizing General-Purpose Foundation Models for Medical Report Generation [64.31265734687182]
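The scarcity of labeled medical image-report pairs poses a major challenge for developing deep and large-scale neural networks. This paper proposes customizing off-the-shelf, general-purpose large-scale pretrained models, i.e., foundation models (FMs) from computer vision and natural language processing.
Paper reference translation (metadata) (2023-06-09T03:02:36Z)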
Training Like a Medical Resident: Context-Prior Learning Toward Universal Medical Image Segmentation [38.61227663176952]
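We propose a shift toward universal medical image segmentation, a paradigm aimed at building foundation models for medical image understanding. We develop Hermes, a novel context-prior learning approach that addresses data heterogeneity and annotation differences in medical image segmentation.
Paper reference translation (metadata) (2023-06-04T17:39:08Z)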
MedSegDiff-V2: Diffusion based Medical Image Segmentation with Transformer [53.575573940055335]
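We propose a Transformer-based diffusion framework called MedSegDiff-V2 and verify its effectiveness on 20 medical image segmentation tasks spanning different imaging modalities.
Paper reference translation (metadata) (2023-01-19T03:42:36Z)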
Robust and Efficient Medical Imaging with Self-Supervision [80.62711706785834]
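We propose REMEDIS, a unified representation learning strategy for improving the robustness and data efficiency of medical imaging AI. We study a variety of medical imaging tasks and simulate three realistic application scenarios using retrospective data.
Paper reference translation (metadata) (2022-05-19T17:34:18Z)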
AlignTransformer: Hierarchical Alignment of Visual Regions and Disease Tags for Medical Report Generation [50.21065317817769]
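This paper proposes the AlignTransformer framework, which includes Align Hierarchical Attention (AHA) and Multi-Grained Transformer (MGT) modules. Experiments on the public IU-Xray and MIMIC-CXR datasets show that AlignTransformer achieves results competitive with state-of-the-art methods on both datasets.
Paper reference translation (metadata) (2022-03-18T13:43:53Z)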
Medical Transformer: Gated Axial-Attention for Medical Image Segmentation [73.98974074534497]
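We explore the feasibility of using Transformer-based network architectures for medical image segmentation tasks. We propose a Gated Axial-Attention model that extends existing architectures by introducing an additional control mechanism in the self-attention module. To train the model effectively on medical images, we also propose a Local-Global training strategy (LoGo) that further improves performance.
Paper reference translation (metadata) (2021-02-21T18:35:14Z)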
Unified Representation Learning for Efficient Medical Image Analysis [0.623075162128532]
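We propose a multi-task training approach for medical image analysis using a unified modality-specific feature representation (UMS-Rep). The proposed approach reduces the overall demand for computational resources while improving task generalization and performance.
Paper reference translation (metadata) (2020-06-19T16:52:16Z)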

The related papers list is generated automatically from the titles and abstracts of papers on this site.

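This is the information for the specified paper.
The operators of this site do not guarantee the quality of this site (including all information and translations) and accept no responsibility for any consequences arising from its use (including all information and translations).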