Fugu-MT Paper Translation (Abstract): URoPE: Universal Relative Position Embedding across Geometric Spaces

Paper abstract: URoPE: Universal Relative Position Embedding across Geometric Spaces

arxiv url: http://arxiv.org/abs/2604.18747v1
Date: Mon, 20 Apr 2026 18:52:03 GMT
Status: Translation complete
System update date: 2026-04-22 22:41:49.435268
Title: URoPE: Universal Relative Position Embedding across Geometric Spaces
Title (reference translation): URoPE: Universal Relative Position Embedding across Geometric Spaces
Authors: Yichen Xie, Depu Meng, Chensheng Peng, Yihan Hu, Quentin Herau, Masayoshi Tomizuka, Wei Zhan
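Abstract summary: URoPE is a universal extension of Rotary Position Embedding (RoPE). For each key/value image patch, URoPE samples 3D points along the corresponding camera ray at predefined depth anchors and projects them into the query image plane. Standard 2D RoPE can then be applied using the projected pixel coordinates.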
Reference score (independently computed attention score): 55.651792747815854
License: http://creativecommons.org/licenses/by/4.0/
Abstract: Relative position embedding has become a standard mechanism for encoding positional information in Transformers. However, existing formulations are typically limited to a fixed geometric space, namely 1D sequences or regular 2D/3D grids, which restricts their applicability to many computer vision tasks that require geometric reasoning across camera views or between 2D and 3D spaces. To address this limitation, we propose URoPE, a universal extension of Rotary Position Embedding (RoPE) to cross-view or cross-dimensional geometric spaces. For each key/value image patch, URoPE samples 3D points along the corresponding camera ray at predefined depth anchors and projects them into the query image plane. Standard 2D RoPE can then be applied using the projected pixel coordinates. URoPE is a parameter-free and intrinsics-aware relative position embedding that is invariant to the choice of global coordinate systems, while remaining fully compatible with existing RoPE-optimized attention kernels. We evaluate URoPE as a plug-in positional encoding for transformer architectures across a diverse set of tasks, including novel view synthesis, 3D object detection, object tracking, and depth estimation, covering 2D-2D, 2D-3D, and temporal scenarios. Experiments show that URoPE consistently improves the performance of transformer-based models across all tasks, demonstrating its effectiveness and generality for geometric reasoning. Our project website is: https://urope-pe.github.io/.
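Abstract (reference translation): Relative position embedding has become a standard mechanism for encoding positional information in Transformers. However, existing formulations are limited to a fixed geometric space, such as 1D sequences or regular 2D/3D grids, which limits their applicability to the many computer vision tasks that require geometric reasoning across camera views or between 2D and 3D spaces. To address this limitation, we propose URoPE, which extends Rotary Position Embedding (RoPE) to cross-view or cross-dimensional geometric spaces. For each key/value image patch, URoPE samples 3D points along the corresponding camera ray at predefined depth anchors and projects them into the query image plane. Standard 2D RoPE can then be applied using the projected pixel coordinates. URoPE is a parameter-free, intrinsics-aware relative position embedding that is invariant to the choice of global coordinate system while remaining fully compatible with existing RoPE-optimized attention kernels. We evaluate URoPE as a plug-in positional encoding for transformer architectures on a diverse set of tasks, including novel view synthesis, 3D object detection, object tracking, and depth estimation, covering 2D-2D, 2D-3D, and temporal scenarios. Experiments show that URoPE consistently improves the performance of transformer-based models across all tasks, demonstrating its effectiveness and generality for geometric reasoning. The project website is: https://urope-pe.github.io/.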
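
The sampling-and-projection step described in the abstract can be sketched roughly as follows. This is an illustrative NumPy sketch, not the authors' implementation: the function names (project_key_patch, rope_2d), the pinhole camera model, the 4x4 key-to-query transform, the frequency base, and the axial channel split used for 2D RoPE are all assumptions made for the example. The abstract does not say how the rotations from the different depth anchors are combined across feature channels, so the sketch stops at producing one projected coordinate per anchor.

```python
import numpy as np


def project_key_patch(uv_key, K_key, K_query, T_key_to_query, depths):
    """Sample 3D points on a key pixel's camera ray and project them
    into the query image (assumed pinhole model, hypothetical helper).

    uv_key:          (u, v) pixel coordinates of the key patch center.
    K_key, K_query:  (3, 3) camera intrinsics.
    T_key_to_query:  (4, 4) rigid transform, key-camera frame -> query-camera frame.
    depths:          (D,) depth anchors measured along the key camera's z-axis.
    Returns a (D, 2) array of pixel coordinates in the query image.
    Points that land behind the query camera (z <= 0) are not handled here.
    """
    depths = np.asarray(depths, dtype=np.float64)
    pixel_h = np.array([uv_key[0], uv_key[1], 1.0])
    ray = np.linalg.inv(K_key) @ pixel_h           # ray direction with z = 1
    pts_key = depths[:, None] * ray[None, :]       # (D, 3) points in key frame
    R, t = T_key_to_query[:3, :3], T_key_to_query[:3, 3]
    pts_query = pts_key @ R.T + t                  # (D, 3) points in query frame
    proj = pts_query @ K_query.T                   # homogeneous pixel coordinates
    return proj[:, :2] / proj[:, 2:3]


def rope_2d(x, uv, base=100.0):
    """Axial 2D RoPE (one common variant, assumed here): the first half of
    the channels is rotated by the u coordinate, the second half by v.
    The channel count must be divisible by 4."""
    half = x.shape[-1] // 2

    def rotate(vec, pos):
        n = vec.shape[-1] // 2
        freqs = base ** (-np.arange(n) / n)        # one frequency per channel pair
        cos, sin = np.cos(pos * freqs), np.sin(pos * freqs)
        a, b = vec[..., :n], vec[..., n:]
        return np.concatenate([a * cos - b * sin, a * sin + b * cos], axis=-1)

    return np.concatenate(
        [rotate(x[..., :half], uv[0]), rotate(x[..., half:], uv[1])], axis=-1
    )


# Example: the query camera frame is offset by one unit along x from the key camera.
K = np.array([[100.0, 0.0, 50.0], [0.0, 100.0, 50.0], [0.0, 0.0, 1.0]])
T = np.eye(4)
T[0, 3] = 1.0
anchors_uv = project_key_patch((50.0, 50.0), K, K, T, depths=[1.0, 2.0, 4.0])
key_feat = np.arange(8, dtype=np.float64)
# One rotated copy of the key feature per depth anchor (combination left open).
rotated = np.stack([rope_2d(key_feat, uv) for uv in anchors_uv])
```

In this example the projected coordinates move along a single image row as the depth anchor changes, which is the epipolar line of the key pixel in the query view. With an identity transform and shared intrinsics the projection returns the original pixel for every depth, so the sketch reduces to ordinary single-image 2D RoPE.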

Related papers