Fugu-MT Paper Translation (Abstract): FUSER: Feed-Forward MUltiview 3D Registration Transformer and SE(3)$^N$ Diffusion Refinement

Paper overview: FUSER: Feed-Forward MUltiview 3D Registration Transformer and SE(3)$^N$ Diffusion Refinement

arxiv url: http://arxiv.org/abs/2512.09373v1
Date: Wed, 10 Dec 2025 07:11:22 GMT
Status: Translation complete
Last updated in system: 2025-12-11 15:14:53.426998
Title: FUSER: Feed-Forward MUltiview 3D Registration Transformer and SE(3)$^N$ Diffusion Refinement
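Title (reference translation): FUSER: Feed-Forward MUltiview 3D Registration Transformer and SE(3)$^N$ Diffusion Refinement
Authors: Haobo Jiang, Jin Xie, Jian Yang, Liang Yu, Jianmin Zheng
Abstract summary: FUSER is the first feed-forward multiview registration transformer that processes all scans in a unified, compact latent space. FUSER predicts global poses without pairwise estimation. Experiments on 3DMatch, ScanNet, and ArkitScenes show that the method achieves superior registration accuracy and computational efficiency.
Reference score (independently computed attention score): 39.19949818461193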
License: http://creativecommons.org/licenses/by/4.0/
Abstract: Registration of multiview point clouds conventionally relies on extensive pairwise matching to build a pose graph for global synchronization, which is computationally expensive and inherently ill-posed without holistic geometric constraints. This paper proposes FUSER, the first feed-forward multiview registration transformer that jointly processes all scans in a unified, compact latent space to directly predict global poses without any pairwise estimation. To maintain tractability, FUSER encodes each scan into low-resolution superpoint features via a sparse 3D CNN that preserves absolute translation cues, and performs efficient intra- and inter-scan reasoning through a Geometric Alternating Attention module. In particular, we transfer 2D attention priors from off-the-shelf foundation models to enhance 3D feature interaction and geometric consistency. Building upon FUSER, we further introduce FUSER-DF, an SE(3)$^N$ diffusion refinement framework that corrects FUSER's estimates via denoising in the joint SE(3)$^N$ space. FUSER acts as a surrogate multiview registration model to construct the denoiser, and a prior-conditioned SE(3)$^N$ variational lower bound is derived for denoising supervision. Extensive experiments on 3DMatch, ScanNet and ArkitScenes demonstrate that our approach achieves superior registration accuracy and outstanding computational efficiency.
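Abstract (reference translation): Conventionally, registration of multiview point clouds has relied on extensive pairwise matching to build a pose graph for global synchronization. This paper proposes FUSER, a feed-forward multiview registration transformer that jointly processes all scans in a compact latent space and directly predicts global poses without pairwise estimation. To maintain tractability, FUSER encodes each scan into low-resolution superpoint features via a sparse 3D CNN that preserves absolute translation cues, and performs efficient intra- and inter-scan reasoning through a Geometric Alternating Attention module. In particular, 2D attention priors are transferred from off-the-shelf foundation models to enhance 3D feature interaction and geometric consistency. Building on FUSER, the paper further introduces FUSER-DF, an SE(3)$^N$ diffusion refinement framework that corrects FUSER's estimates via denoising in the joint SE(3)$^N$ space. FUSER acts as a surrogate multiview registration model, and a prior-conditioned SE(3)$^N$ variational lower bound is derived for denoising supervision. Extensive experiments on 3DMatch, ScanNet, and ArkitScenes show that the method achieves superior registration accuracy and computational efficiency.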
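
The intra- and inter-scan reasoning mentioned in the abstract can be illustrated with a minimal NumPy sketch. This is not the authors' Geometric Alternating Attention module: it assumes plain single-head self-attention with identity projections, no learned weights, and no geometric encoding, and the names `self_attention` and `alternating_attention` are hypothetical. It only shows the alternation pattern: attention restricted to each scan's own superpoints, followed by attention across the superpoints of all scans.

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def self_attention(tokens):
    # Single-head self-attention with identity Q/K/V projections
    # (an illustrative simplification; a real model learns these).
    d = tokens.shape[-1]
    scores = tokens @ np.swapaxes(tokens, -1, -2) / np.sqrt(d)
    return softmax(scores, axis=-1) @ tokens

def alternating_attention(feats, num_blocks=2):
    # feats: (N scans, M superpoints per scan, D channels).
    n, m, d = feats.shape
    x = feats
    for _ in range(num_blocks):
        # Intra-scan: batched over scans, so tokens only see their own scan.
        x = x + self_attention(x)
        # Inter-scan: flatten so every superpoint attends across all scans.
        flat = x.reshape(1, n * m, d)
        x = x + self_attention(flat).reshape(n, m, d)
    return x

rng = np.random.default_rng(0)
feats = rng.normal(size=(3, 4, 8))  # 3 scans, 4 superpoints, 8 channels
out = alternating_attention(feats)
print(out.shape)  # (3, 4, 8)
```

Because the intra-scan step is batched over the scan axis, its cost grows with M² per scan, while only the inter-scan step pays the (N·M)² cost; keeping M small through low-resolution superpoints is what makes the joint step affordable in a design of this kind.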


Related papers