Fugu-MT Paper Translation (Abstract): MTPano: Multi-Task Panoramic Scene Understanding via Label-Free Integration of Dense Prediction Priors

Paper abstract: MTPano: Multi-Task Panoramic Scene Understanding via Label-Free Integration of Dense Prediction Priors

arxiv url: http://arxiv.org/abs/2602.05330v1
Date: Thu, 05 Feb 2026 05:51:28 GMT
Status: Translation complete
System update date: 2026-02-06 18:49:08.775785
Title: MTPano: Multi-Task Panoramic Scene Understanding via Label-Free Integration of Dense Prediction Priors
Title (reference translation): MTPano: Multi-Task Panoramic Scene Understanding via Label-Free Integration
Authors: Jingdong Zhang, Xiaohang Zhan, Lingzhi Zhang, Yizhou Wang, Zhengming Yu, Jionghao Wang, Wenping Wang, Xin Li
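Abstract summary: MTPano is a robust panoramic foundation model established by a label-free training pipeline. The authors project panoramic images into perspective patches to generate accurate, domain-gap-free pseudo-labels. To address interference between task types, they categorize tasks into rotation-invariant and rotation-variant groups.
Reference score (independently computed attention metric): 42.124623200906626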
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: Comprehensive panoramic scene understanding is critical for immersive applications, yet it remains challenging due to the scarcity of high-resolution, multi-task annotations. While perspective foundation models have achieved success through data scaling, directly adapting them to the panoramic domain often fails due to severe geometric distortions and coordinate system discrepancies. Furthermore, the underlying relations between diverse dense prediction tasks in spherical spaces are underexplored. To address these challenges, we propose MTPano, a robust multi-task panoramic foundation model established by a label-free training pipeline. First, to circumvent data scarcity, we leverage powerful perspective dense priors. We project panoramic images into perspective patches to generate accurate, domain-gap-free pseudo-labels using off-the-shelf foundation models, which are then re-projected to serve as patch-wise supervision. Second, to tackle the interference between task types, we categorize tasks into rotation-invariant (e.g., depth, segmentation) and rotation-variant (e.g., surface normals) groups. We introduce the Panoramic Dual BridgeNet, which disentangles these feature streams via geometry-aware modulation layers that inject absolute position and ray direction priors. To handle the distortion from equirectangular projections (ERP), we incorporate ERP token mixers followed by a dual-branch BridgeNet for interactions with gradient truncation, facilitating beneficial cross-task information sharing while blocking conflicting gradients from incompatible task attributes. Additionally, we introduce auxiliary tasks (image gradient, point map, etc.) to fertilize the cross-task learning process. Extensive experiments demonstrate that MTPano achieves state-of-the-art performance on multiple benchmarks and delivers competitive results against task-specific panoramic specialist foundation models.
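Abstract (reference translation): Comprehensive panoramic scene understanding is essential for immersive applications, but it remains difficult because high-resolution multi-task annotations are scarce. Perspective foundation models have succeeded through data scaling, yet adapting them directly to the panoramic domain often fails because of severe geometric distortion and differences in coordinate systems. In addition, the underlying relations among diverse dense prediction tasks in spherical space remain underexplored. To address these challenges, we propose MTPano, a robust multi-task panoramic foundation model established by a label-free training pipeline. First, to get around the lack of data, we exploit powerful perspective dense priors. We project panoramic images into perspective patches and use off-the-shelf foundation models to generate accurate, domain-gap-free pseudo-labels. Second, to address interference between task types, we divide tasks into a rotation-invariant group (e.g., depth, segmentation) and a rotation-variant group (e.g., surface normals). The Panoramic Dual BridgeNet disentangles these feature streams through geometry-aware modulation layers that inject absolute position and ray direction priors. To handle distortion from the equirectangular projection (ERP), we combine ERP token mixers with a dual-branch BridgeNet, which eases beneficial cross-task information sharing while blocking conflicting gradients from incompatible task attributes. We further introduce auxiliary tasks (image gradient, point map, etc.) to enrich the cross-task learning process. Extensive experiments show that MTPano achieves state-of-the-art performance on multiple benchmarks and delivers results competitive with task-specific panoramic specialist foundation models.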
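
The abstract describes projecting panoramic images into perspective patches and re-projecting patch-level pseudo-labels back onto the panorama. The following is a minimal sketch of the coordinate mapping such a step needs, not the authors' implementation. All function names, the axis convention (+x right, +y up, +z forward), the half-pixel offsets, and the square pinhole patch are assumptions made here for illustration.

```python
import math


def erp_pixel_to_ray(u, v, width, height):
    """Unit ray direction for the centre of ERP pixel (u, v).

    Assumed convention: longitude grows left to right over [-pi, pi),
    latitude grows bottom to top over [-pi/2, pi/2].
    """
    lon = ((u + 0.5) / width - 0.5) * 2.0 * math.pi
    lat = (0.5 - (v + 0.5) / height) * math.pi
    return (math.cos(lat) * math.sin(lon),
            math.sin(lat),
            math.cos(lat) * math.cos(lon))


def ray_to_erp_pixel(ray, width, height):
    """Inverse of erp_pixel_to_ray: continuous ERP pixel coordinates."""
    x, y, z = ray
    norm = math.sqrt(x * x + y * y + z * z)
    lon = math.atan2(x, z)
    lat = math.asin(max(-1.0, min(1.0, y / norm)))
    u = (lon / (2.0 * math.pi) + 0.5) * width - 0.5
    v = (0.5 - lat / math.pi) * height - 0.5
    return (u, v)


def perspective_pixel_to_ray(i, j, size, fov_deg, yaw_deg=0.0, pitch_deg=0.0):
    """Unit ray through pixel (column i, row j) of a square pinhole patch.

    The patch camera is pitched (positive looks up), then yawed
    (positive looks right), relative to the panorama frame.
    """
    focal = 0.5 * size / math.tan(math.radians(fov_deg) / 2.0)
    x = (i + 0.5) - size / 2.0
    y = size / 2.0 - (j + 0.5)
    z = focal
    p = math.radians(pitch_deg)
    y, z = y * math.cos(p) + z * math.sin(p), -y * math.sin(p) + z * math.cos(p)
    t = math.radians(yaw_deg)
    x, z = x * math.cos(t) + z * math.sin(t), -x * math.sin(t) + z * math.cos(t)
    norm = math.sqrt(x * x + y * y + z * z)
    return (x / norm, y / norm, z / norm)


def patch_sampling_grid(size, fov_deg, yaw_deg, pitch_deg, width, height):
    """For every patch pixel, the ERP coordinates to sample from."""
    return [[ray_to_erp_pixel(
                 perspective_pixel_to_ray(i, j, size, fov_deg, yaw_deg, pitch_deg),
                 width, height)
             for i in range(size)]
            for j in range(size)]


# Example: a 5x5 patch with a 90-degree field of view, looking straight ahead.
grid = patch_sampling_grid(5, 90.0, 0.0, 0.0, 64, 32)
```

The same grid can serve in both directions: sampling the panorama at these coordinates yields the perspective patch, and a label predicted on the patch can be written back to the corresponding ERP locations. Under this sketch's assumptions, a rotation-variant quantity such as a surface normal predicted in the patch camera frame would additionally need to be rotated by the patch's yaw and pitch before being compared in a common frame, whereas a rotation-invariant quantity such as a segmentation label would not.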

Related papers