Fugu-MT Paper Translation (Abstract): Self-Supervised Cross-Modal Learning for Image-to-Point Cloud Registration

Paper overview: Self-Supervised Cross-Modal Learning for Image-to-Point Cloud Registration

arxiv url: http://arxiv.org/abs/2509.15882v1
Date: Fri, 19 Sep 2025 11:29:22 GMT
Status: Translation complete
Last updated in system: 2025-09-22 18:18:11.140439
Title: Self-Supervised Cross-Modal Learning for Image-to-Point Cloud Registration
Authors: Xingmei Wang, Xiaoyu Hu, Chengkai Huang, Ziyan Zeng, Guohao Nie, Quan Z. Sheng, Lina Yao
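Abstract summary: CrossI2P is a self-supervised framework that unifies cross-modal learning and two-stage registration in a single end-to-end pipeline. We show that CrossI2P outperforms state-of-the-art methods by 23.7% on the KITTI Odometry benchmark and by 37.9% on nuScenes.
Reference score (independently computed attention measure): 22.360139236823155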
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: Bridging 2D and 3D sensor modalities is critical for robust perception in autonomous systems. However, image-to-point cloud (I2P) registration remains challenging due to the semantic-geometric gap between texture-rich but depth-ambiguous images and sparse yet metrically precise point clouds, as well as the tendency of existing methods to converge to local optima. To overcome these limitations, we introduce CrossI2P, a self-supervised framework that unifies cross-modal learning and two-stage registration in a single end-to-end pipeline. First, we learn a geometric-semantic fused embedding space via dual-path contrastive learning, enabling annotation-free, bidirectional alignment of 2D textures and 3D structures. Second, we adopt a coarse-to-fine registration paradigm: a global stage establishes superpoint-superpixel correspondences through joint intra-modal context and cross-modal interaction modeling, followed by a geometry-constrained point-level refinement for precise registration. Third, we employ a dynamic training mechanism with gradient normalization to balance losses for feature alignment, correspondence refinement, and pose estimation. Extensive experiments demonstrate that CrossI2P outperforms state-of-the-art methods by 23.7% on the KITTI Odometry benchmark and by 37.9% on nuScenes, significantly improving both accuracy and robustness.
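The abstract describes "dual-path contrastive learning" that aligns 2D and 3D features in both directions but does not give the loss itself. As a rough illustration only, the sketch below uses a generic symmetric InfoNCE loss, which is one common way to express bidirectional cross-modal alignment. This is an assumption and not necessarily the objective used in CrossI2P. The function names, the temperature value of 0.07, and the pairing convention (row i of each input describes the same region) are all hypothetical.

```python
import math


def l2_normalize(vec):
    """Scale a vector to unit length (a zero vector is returned unchanged)."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec] if norm > 0 else list(vec)


def cross_entropy_on_diagonal(logits):
    """Mean cross-entropy where the correct class for row i is column i."""
    total = 0.0
    for i, row in enumerate(logits):
        row_max = max(row)  # subtract the max for numerical stability
        log_sum_exp = row_max + math.log(sum(math.exp(v - row_max) for v in row))
        total += log_sum_exp - row[i]
    return total / len(logits)


def symmetric_info_nce(image_feats, point_feats, temperature=0.07):
    """Symmetric InfoNCE over N paired image/point embeddings.

    Assumption (not from the paper): image_feats[i] and point_feats[i]
    describe the same region, and every other pairing is a negative.
    """
    img = [l2_normalize(v) for v in image_feats]
    pts = [l2_normalize(v) for v in point_feats]
    # Cosine-similarity logits: rows index images, columns index points.
    logits = [
        [sum(a * b for a, b in zip(u, v)) / temperature for v in pts]
        for u in img
    ]
    transposed = [list(col) for col in zip(*logits)]
    loss_i2p = cross_entropy_on_diagonal(logits)      # image -> point direction
    loss_p2i = cross_entropy_on_diagonal(transposed)  # point -> image direction
    return 0.5 * (loss_i2p + loss_p2i)
```

With correctly paired embeddings the loss should be close to zero, and it should grow when the pairing is shuffled. This is the property that lets such a loss pull matching 2D and 3D features together without annotations.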
