Fugu-MT Paper Translation (Summary): 3D Masked Autoencoders are Robust Learners of Volumetric and Multimodal Cellular Representations for Microscopy

Paper Summary: 3D Masked Autoencoders are Robust Learners of Volumetric and Multimodal Cellular Representations for Microscopy

arxiv url: http://arxiv.org/abs/2606.23964v1
Date: Mon, 22 Jun 2026 21:45:15 GMT
Status: Translation complete
Last updated (in system): 2026-06-24 22:16:48.695335
Title: 3D Masked Autoencoders are Robust Learners of Volumetric and Multimodal Cellular Representations for Microscopy
Title (reference translation): 3D masked autoencoders are robust learners of volumetric and multimodal cell representations for microscopy
Authors: Amirhossein Kardoost, Lion Gleiter, Tingying Peng, Carsten Marr
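Abstract summary: Self-supervised learning in fluorescence microscopy often relies on 2D projections. We show that MAE-3D consistently outperforms 2D max-projection and slice-based variants on downstream single-cell tasks.
Reference score (in-house attention score): 3.2257138792902125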
License: http://creativecommons.org/licenses/by/4.0/
Abstract: Self-supervised learning in fluorescence microscopy often relies on 2D projections, despite the inherently three-dimensional nature of cells. We present a systematic comparison of 2D and 3D masked autoencoders (MAE-2D vs. MAE-3D) on volumetric microscopy data. Under matched architectures and training protocols, MAE-3D consistently outperforms 2D max-projection and slice-based variants on downstream single-cell tasks. We further align visual representations with a pretrained protein language model (ESM2) and show that cross-modal supervision yields larger gains for volumetric models. Channel cross-attention and frequency-domain regularization are critical for leveraging 3D spatial context. On a protein-protein interaction task, MAE-3D achieves a ROC-AUC of 0.865, outperforming prior methods by up to +0.025. For protein localization, our best 3D model attains state-of-the-art AUC$_{\text{micro}}$ (0.952) and F1$_{\text{micro}}$ (0.742), improving over previous approaches by +0.003 and +0.010 absolute, respectively. Overall, these results demonstrate the advantages of native 3D modeling and multimodal alignment for representation learning in single-cell microscopy.
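The central contrast in the abstract is between a 2D model that sees a max projection of the volume and a 3D model that tokenizes the volume itself before random masking. The sketch below illustrates that difference in plain Python. It assumes a single-channel (Z, Y, X) volume, cubic 4x4x4 patches and a 0.75 mask ratio (the common MAE default). The paper's actual patch size, mask ratio and channel handling are not given in this summary, and the function names are hypothetical.

```python
import random

def max_projection(volume):
    """Collapse a (Z, Y, X) volume to a (Y, X) image by taking the max over Z,
    which is what a 2D model trained on max projections would see."""
    depth, height, width = len(volume), len(volume[0]), len(volume[0][0])
    return [[max(volume[z][y][x] for z in range(depth)) for x in range(width)]
            for y in range(height)]

def patchify_3d(volume, pz, py, px):
    """Split a (Z, Y, X) volume into non-overlapping (pz, py, px) patches and
    flatten each patch into one token (a list of voxel values)."""
    depth, height, width = len(volume), len(volume[0]), len(volume[0][0])
    if depth % pz or height % py or width % px:
        raise ValueError("volume shape must be divisible by the patch size")
    tokens = []
    for z0 in range(0, depth, pz):
        for y0 in range(0, height, py):
            for x0 in range(0, width, px):
                tokens.append([volume[z][y][x]
                               for z in range(z0, z0 + pz)
                               for y in range(y0, y0 + py)
                               for x in range(x0, x0 + px)])
    return tokens

def random_mask(num_tokens, mask_ratio, rng):
    """MAE-style random masking: shuffle token indices, keep the first
    (1 - mask_ratio) fraction as visible encoder input and mask the rest."""
    num_keep = int(num_tokens * (1 - mask_ratio))
    order = list(range(num_tokens))
    rng.shuffle(order)
    return sorted(order[:num_keep]), sorted(order[num_keep:])

# Toy single-channel 8x8x8 volume with one bright voxel at depth z=5.
vol = [[[0.0] * 8 for _ in range(8)] for _ in range(8)]
vol[5][2][3] = 1.0

proj = max_projection(vol)            # 8x8 image: the voxel survives, its depth does not
tokens = patchify_3d(vol, 4, 4, 4)    # 2*2*2 = 8 tokens of 4*4*4 = 64 voxels each
visible, masked = random_mask(len(tokens), 0.75, random.Random(0))
print(len(tokens), len(visible), len(masked))  # 8 2 6
```

The projection keeps the bright voxel but discards its z-position. In contrast, the 3D tokens keep depth inside each patch and in the token ordering, so the reconstruction objective can exploit it.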
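Abstract (reference translation): Self-supervised learning in fluorescence microscopy often relies on 2D projections, despite the inherently three-dimensional nature of cells. In this study, we systematically compare 2D and 3D masked autoencoders (MAE-2D vs. MAE-3D) on volumetric microscopy data. Under matched architectures and training protocols, MAE-3D consistently outperformed 2D max-projection and slice-based variants on downstream single-cell tasks. We further align visual representations with a pretrained protein language model (ESM2) and show that cross-modal supervision brings larger gains to volumetric models. Channel cross-attention and frequency-domain regularization are essential for exploiting 3D spatial context. On a protein-protein interaction task, MAE-3D achieves a ROC-AUC of 0.865, outperforming prior methods by up to +0.025. For protein localization, the best 3D model achieves state-of-the-art AUC$_{\text{micro}}$ (0.952) and F1$_{\text{micro}}$ (0.742), improvements of +0.003 and +0.010 absolute over previous approaches, respectively. These results show the advantages of native 3D modeling and multimodal alignment for representation learning in single-cell microscopy.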
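The localization results are reported as micro-averaged AUC and F1, and the interaction result as ROC-AUC. The sketch below shows how these metrics are commonly computed for a multilabel problem, in plain Python. The toy labels, scores and 0.5 threshold are made-up illustrations rather than data from the paper, and this summary does not describe the paper's own evaluation code. "Micro" averaging here means pooling every (cell, label) pair into one binary problem.

```python
def micro_f1(y_true, y_pred):
    """Micro F1: pool TP/FP/FN over all labels, then F1 = 2TP / (2TP + FP + FN)."""
    tp = fp = fn = 0
    for true_row, pred_row in zip(y_true, y_pred):
        for t, p in zip(true_row, pred_row):
            if t and p:
                tp += 1
            elif p:
                fp += 1
            elif t:
                fn += 1
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0

def roc_auc(labels, scores):
    """ROC-AUC as the probability that a random positive outscores a random
    negative (ties count 1/2); O(P*N), fine for illustration only."""
    pos = [s for lab, s in zip(labels, scores) if lab]
    neg = [s for lab, s in zip(labels, scores) if not lab]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative")
    wins = 0.0
    for p in pos:
        for n in neg:
            wins += 1.0 if p > n else 0.5 if p == n else 0.0
    return wins / (len(pos) * len(neg))

def micro_auc(y_true, y_score):
    """Micro-averaged AUC: flatten all (cell, label) pairs into one binary problem."""
    labels = [t for row in y_true for t in row]
    scores = [s for row in y_score for s in row]
    return roc_auc(labels, scores)

# Toy example: 3 cells, 3 candidate localizations (multilabel).
y_true = [[1, 0, 0], [0, 1, 1], [1, 0, 1]]
y_score = [[0.9, 0.2, 0.1], [0.3, 0.8, 0.4], [0.6, 0.5, 0.7]]
y_pred = [[int(s >= 0.5) for s in row] for row in y_score]

f1 = micro_f1(y_true, y_pred)
auc = micro_auc(y_true, y_score)
print(f1, auc)  # 0.8 0.95
```

For real evaluations, scikit-learn's `f1_score(..., average="micro")` and `roc_auc_score(..., average="micro")` compute the same quantities more efficiently.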
