Fugu-MT Paper Translation (Summary): Probabilistic Hyper-Graphs using Multiple Randomly Masked Autoencoders for Semi-supervised Multi-modal Multi-task Learning

Paper summary: Probabilistic Hyper-Graphs using Multiple Randomly Masked Autoencoders for Semi-supervised Multi-modal Multi-task Learning

arxiv url: http://arxiv.org/abs/2510.10068v1
Date: Sat, 11 Oct 2025 07:05:34 GMT
Status: Translation complete
Last updated in system: 2025-10-14 18:06:29.763634
Title: Probabilistic Hyper-Graphs using Multiple Randomly Masked Autoencoders for Semi-supervised Multi-modal Multi-task Learning
Authors: Pîrvu Mihai-Cristian, Leordeanu Marius
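Abstract summary: We introduce Probabilistic Hyper-Graphs using Masked Autoencoders (PHG-MAE), a novel model that unifies classical work on neural graphs with masked autoencoders. We show that knowledge distillation can be applied on top of the resulting ensembles with little loss in performance.
Reference score (attention score computed by the site): 0.0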
License: http://creativecommons.org/licenses/by/4.0/
Abstract: The computer vision domain has greatly benefited from an abundance of data across many modalities to improve on various visual tasks. Recently, there has been a lot of focus on self-supervised pre-training methods through Masked Autoencoders (MAE) \cite{he2022masked,bachmann2022multimae}, usually used as a first step before optimizing for a downstream task, such as classification or regression. This is very useful as it doesn't require any manually labeled data. In this work, we introduce Probabilistic Hyper-Graphs using Masked Autoencoders (PHG-MAE): a novel model that unifies the classical work on neural graphs \cite{leordeanu2021semi} with the modern approach of masked autoencoders under a common theoretical framework. Through random masking of entire modalities, not just patches, the model samples from the distribution of hyper-edges on each forward pass. Additionally, the model adapts the standard MAE algorithm by combining pre-training and fine-tuning into a single training loop. Moreover, our approach enables the creation of inference-time ensembles which, through aggregation, boost the final prediction performance and consistency. Lastly, we show that we can apply knowledge distillation on top of the ensembles with little loss in performance, even with models that have fewer than 1M parameters. While our work mostly focuses on outdoor UAV scenes that contain multiple world interpretations and modalities, the same steps can be followed in other similar domains, such as autonomous driving or indoor robotics. In order to streamline the process of integrating external pre-trained experts for computer vision multi-modal multi-task learning (MTL) scenarios, we developed a data-pipeline software. Using this tool, we have created and released a fully-automated extension of the Dronescapes dataset. All the technical details, code and reproduction steps are publicly released.
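The abstract describes masking entire modalities (not patches) so that each forward pass samples one hyper-edge, and folding MAE pre-training and fine-tuning into a single training loop. The sketch below illustrates one possible reading in plain Python. It assumes each modality is a flat list of floats, that every modality is kept independently with a hypothetical probability `keep_prob`, and that a per-modality MSE serves as the shared objective. The modality names and the functions `sample_hyperedge`, `mask_modalities` and `combined_loss` are hypothetical illustrations, not the authors' released code.

```python
import random
from typing import Dict, List, Optional, Sequence

# Hypothetical modality names for an outdoor UAV scene; illustrative only.
MODALITIES = ["rgb", "depth", "normals", "semantic"]


def sample_hyperedge(modalities: Sequence[str], keep_prob: float,
                     rng: random.Random) -> List[str]:
    """Pick which whole modalities stay visible on one forward pass.

    Each modality is kept independently with probability keep_prob (an
    assumed sampling scheme). If nothing is kept, one modality is chosen
    uniformly so the model always gets some input. The kept set stands in
    for one sampled hyper-edge.
    """
    kept = [m for m in modalities if rng.random() < keep_prob]
    if not kept:
        kept = [rng.choice(list(modalities))]
    return kept


def mask_modalities(sample: Dict[str, List[float]],
                    kept: Sequence[str]) -> Dict[str, List[float]]:
    """Zero out every modality not in `kept` (whole modalities, not patches)."""
    return {name: (list(values) if name in kept else [0.0] * len(values))
            for name, values in sample.items()}


def combined_loss(pred: Dict[str, List[float]],
                  target: Dict[str, Optional[List[float]]]) -> float:
    """Average per-modality MSE over every modality that has a target.

    Assumption: reconstructing input modalities (self-supervised) and
    predicting labelled task outputs share this one objective. A target of
    None (no label for that task on this sample) is skipped, so labelled and
    unlabelled samples can be mixed in the same training loop.
    """
    per_modality = []
    for name, tgt in target.items():
        if tgt is None:
            continue
        p = pred[name]
        per_modality.append(sum((a - b) ** 2 for a, b in zip(p, tgt)) / len(tgt))
    return sum(per_modality) / len(per_modality) if per_modality else 0.0


rng = random.Random(0)
sample = {"rgb": [0.2, 0.4], "depth": [1.0, 2.0],
          "normals": [0.0, 1.0], "semantic": [1.0, 0.0]}
kept = sample_hyperedge(MODALITIES, keep_prob=0.5, rng=rng)
masked = mask_modalities(sample, kept)
```

Because a new subset of modalities is drawn on every call, repeated forward passes over the same sample see different input combinations, which is the sense in which the model samples from a distribution of hyper-edges.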
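For the inference-time ensembles and the distillation step, one possible reading is this: run the trained model several times on the same input with different modality masks, aggregate the outputs element-wise, then use the aggregate as a soft target for a smaller student. The sketch below assumes flat float-list predictions, mean or median aggregation, and an MSE distillation loss. `ensemble_predict`, `distillation_loss` and `toy_model` are hypothetical names, and the abstract does not specify the actual aggregation rule or distillation loss.

```python
from statistics import mean, median
from typing import Callable, Dict, List, Sequence

# Hypothetical types: a modality (or a prediction) is a flat list of floats.
Prediction = List[float]
Model = Callable[[Dict[str, Prediction]], Prediction]


def ensemble_predict(model: Model, sample: Dict[str, Prediction],
                     masks: Sequence[Sequence[str]], agg: str = "mean") -> Prediction:
    """Run the model once per modality mask and aggregate element-wise.

    Each mask lists the modalities kept visible; the others are zeroed.
    The aggregation rule (mean or median) is an assumption for illustration.
    """
    if agg not in ("mean", "median"):
        raise ValueError(f"unknown aggregation: {agg}")
    reducer = mean if agg == "mean" else median
    preds = []
    for kept in masks:
        masked = {name: (values if name in kept else [0.0] * len(values))
                  for name, values in sample.items()}
        preds.append(model(masked))
    # zip(*preds) groups the i-th output of every ensemble member together.
    return [reducer(column) for column in zip(*preds)]


def distillation_loss(student_pred: Prediction, teacher_pred: Prediction) -> float:
    """MSE between a small student's output and the ensemble output (soft target)."""
    return sum((s - t) ** 2 for s, t in zip(student_pred, teacher_pred)) / len(teacher_pred)


def toy_model(inputs: Dict[str, Prediction]) -> Prediction:
    """Toy stand-in model: element-wise sum of all (possibly zeroed) modalities."""
    return [sum(column) for column in zip(*inputs.values())]


sample = {"rgb": [1.0, 2.0], "depth": [3.0, 4.0]}
masks = [["rgb"], ["depth"], ["rgb", "depth"]]
teacher = ensemble_predict(toy_model, sample, masks)
loss = distillation_loss([2.0, 4.0], teacher)
```

Median aggregation is less sensitive to a single poorly performing mask than the mean; the masks can be fixed, as here, or drawn at random at inference time.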
