Fugu-MT Paper Translation (Abstract): UniCorrn: Unified Correspondence Transformer Across 2D and 3D

Paper abstract: UniCorrn: Unified Correspondence Transformer Across 2D and 3D

arxiv url: http://arxiv.org/abs/2605.04044v1
Date: Tue, 05 May 2026 17:58:53 GMT
Status: Translation complete
System update date: 2026-05-06 19:35:44.077424
Title: UniCorrn: Unified Correspondence Transformer Across 2D and 3D
Title (reference translation): UniCorrn: A Unified Correspondence Transformer Spanning 2D and 3D
Authors: Prajnan Goswami, Tianye Ding, Feng Liu, Huaizu Jiang
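Abstract summary: We propose UniCorrn, the first correspondence model with shared weights. We propose a dual-stream decoder that keeps the appearance and positional feature streams separate. UniCorrn achieves competitive performance on 2D-2D matching and, in registration recall, surpasses the prior state of the art by 8% on 7Scenes (2D-3D) and 10% on 3DLoMatch (3D-3D).
Reference score (independently computed attention score): 13.22498093287154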
License: http://creativecommons.org/licenses/by-nc-sa/4.0/
Abstract: Visual correspondence across image-to-image (2D-2D), image-to-point cloud (2D-3D), and point cloud-to-point cloud (3D-3D) geometric matching forms the foundation for numerous 3D vision tasks. Despite sharing a similar problem structure, current methods use task-specific designs with separate models for each modality combination. We present UniCorrn, the first correspondence model with shared weights that unifies geometric matching across all three tasks. Our key insight is that Transformer attention naturally captures cross-modal feature similarity. We propose a dual-stream decoder that maintains separate appearance and positional feature streams. This design enables end-to-end learning through stack-able layers while supporting flexible query-based correspondence estimation across heterogeneous modalities. Our architecture employs modality-specific backbones followed by shared encoder and decoder components, trained jointly on diverse data combining pseudo point clouds from depth maps with real 3D correspondence annotations. UniCorrn achieves competitive performance on 2D-2D matching and surpasses prior state-of-the-art by 8% on 7Scenes (2D-3D) and 10% on 3DLoMatch (3D-3D) in registration recall. Project website: https://neu-vi.github.io/UniCorrn
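Abstract (reference translation): Visual correspondence across image-to-image (2D-2D), image-to-point-cloud (2D-3D), and point-cloud-to-point-cloud (3D-3D) matching is the foundation of many 3D vision tasks. Although these settings share a similar problem structure, current methods use task-specific designs with a separate model for each modality combination. We propose UniCorrn, the first correspondence model with shared weights that unifies geometric matching across all three tasks. The key insight is that Transformer attention naturally captures cross-modal feature similarity. We propose a dual-stream decoder that keeps the appearance and positional feature streams separate. This design enables end-to-end learning through stackable layers and supports flexible query-based correspondence estimation across heterogeneous modalities. The architecture uses modality-specific backbones followed by shared encoder and decoder components, trained jointly on diverse data that combines pseudo point clouds from depth maps with real 3D correspondence annotations. UniCorrn achieves competitive performance on 2D-2D matching and, in registration recall, surpasses the prior state of the art by 8% on 7Scenes (2D-3D) and 10% on 3DLoMatch (3D-3D). Project website: https://neu-vi.github.io/UniCorrn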
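
Illustrative note: the abstract states that Transformer attention captures cross-modal feature similarity and that correspondences are estimated per query. The sketch below is not the authors' implementation. It is a generic soft-argmax over attention weights, written under the assumption that each target element (a pixel or a 3D point) has a feature vector and a coordinate. The function name `soft_correspondence`, the `temperature` parameter, and the toy data are all hypothetical.

```python
import math

def soft_correspondence(query, keys, positions, temperature=0.1):
    """Attention-weighted position estimate for one query feature.

    query:     feature vector of the query element (list of floats)
    keys:      feature vectors of the target elements (pixels or 3D points)
    positions: coordinates of the target elements (2D or 3D)
    All names here are illustrative assumptions, not taken from the paper.
    """
    # Scaled dot-product similarity between the query and every target feature.
    scores = [sum(q * k for q, k in zip(query, key)) / temperature for key in keys]
    # Numerically stable softmax over the target elements.
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    weights = [e / total for e in exps]
    # Expected position under the attention weights (a soft argmax).
    dim = len(positions[0])
    estimate = [sum(w * p[d] for w, p in zip(weights, positions)) for d in range(dim)]
    return estimate, weights

# Toy example: three target 3D points, each with a 2-dimensional feature.
keys = [[1.0, 0.0], [0.0, 1.0], [-1.0, 0.0]]
positions = [[0.0, 0.0, 0.0], [1.0, 2.0, 3.0], [4.0, 4.0, 4.0]]
estimate, weights = soft_correspondence([0.0, 1.0], keys, positions)
```

Because the query feature matches the second target feature, nearly all of the attention weight falls on that element and the estimate lands close to its coordinate. The same function works unchanged for 2D pixel coordinates, which is the sense in which a similarity-based readout is agnostic to the target modality.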


Related papers