Fugu-MT Paper Translation (Abstract): RoboEye: Enhancing 2D Robotic Object Identification with Selective 3D Geometric Keypoint Matching

Paper overview: RoboEye: Enhancing 2D Robotic Object Identification with Selective 3D Geometric Keypoint Matching

arxiv url: http://arxiv.org/abs/2509.14966v1
Date: Thu, 18 Sep 2025 13:59:24 GMT
Status: Translation complete
Last updated in system: 2025-09-19 17:26:53.25153
Title: RoboEye: Enhancing 2D Robotic Object Identification with Selective 3D Geometric Keypoint Matching
Title (reference translation): RoboEye: Enhancing 2D Robotic Object Identification with Selective 3D Geometric Keypoint Matching
Authors: Xingwu Zhang, Guanxuan Li, Zhuocheng Zhang, Zijun Long
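Abstract summary: RoboEye is a framework that augments 2D semantic features with domain-adapted 3D reasoning and lightweight adapters. Experiments show that RoboEye improves Recall@1 by 7.1% over the prior state of the art (RoboLLM).
Reference score (independently computed attention score): 5.240139281459202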
License: http://creativecommons.org/licenses/by-nc-sa/4.0/
Abstract: The rapidly growing number of product categories in large-scale e-commerce makes accurate object identification for automated packing in warehouses substantially more difficult. As the catalog grows, intra-class variability and a long tail of rare or visually similar items increase; combined with diverse packaging, cluttered containers, frequent occlusion, and large viewpoint changes, these factors amplify discrepancies between query and reference images, causing sharp performance drops for methods that rely solely on 2D appearance features. Thus, we propose RoboEye, a two-stage identification framework that dynamically augments 2D semantic features with domain-adapted 3D reasoning and lightweight adapters to bridge training-deployment gaps. In the first stage, we train a large vision model to extract 2D features for generating candidate rankings. A lightweight 3D-feature-awareness module then estimates 3D feature quality and predicts whether 3D re-ranking is necessary, preventing performance degradation and avoiding unnecessary computation. When invoked, the second stage uses our robot 3D retrieval transformer, comprising a 3D feature extractor that produces geometry-aware dense features and a keypoint-based matcher that computes keypoint-correspondence confidences between query and reference images instead of conventional cosine-similarity scoring. Experiments show that RoboEye improves Recall@1 by 7.1% over the prior state of the art (RoboLLM). Moreover, RoboEye operates using only RGB images, avoiding reliance on explicit 3D inputs and reducing deployment costs. The code used in this paper is publicly available at: https://github.com/longkukuhi/RoboEye.
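Abstract (reference translation): The rapid growth in the number of product categories in large-scale e-commerce makes accurate object identification for automated warehouse packing considerably harder. As the catalog grows, intra-class variability and the long tail of rare or visually similar items increase, and together with diverse packaging, cluttered containers, frequent occlusion, and large viewpoint changes, this widens the discrepancy between query and reference images, so the performance of methods that rely only on 2D appearance features drops sharply. This work therefore proposes RoboEye, a two-stage identification framework that dynamically augments 2D semantic features with domain-adapted 3D reasoning and lightweight adapters to bridge the gap between training and deployment. In the first stage, a large vision model is trained to extract the 2D features used to generate candidate rankings. A lightweight 3D-feature-awareness module then estimates 3D feature quality and predicts whether 3D re-ranking is needed, which prevents performance degradation and avoids unnecessary computation. When invoked, the second stage uses a robot 3D retrieval transformer consisting of a 3D feature extractor that produces geometry-aware dense features and a keypoint-based matcher that computes keypoint-correspondence confidences between query and reference images instead of conventional cosine-similarity scores. According to the experiments, RoboEye improves Recall@1 by 7.1% over the previous state of the art (RoboLLM). In addition, RoboEye uses only RGB images, which avoids dependence on explicit 3D inputs and reduces deployment costs. The code used in the paper is available at https://github.com/longkukuhi/RoboEye.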
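
The control flow described in the abstract (2D candidate ranking, a gate that decides whether 3D re-ranking is worthwhile, and re-ranking by keypoint-correspondence confidence) can be illustrated with a minimal Python sketch. This is not the authors' implementation: the function names, the `needs_3d` gate, the `keypoint_conf` scorer, and the `top_k` cutoff are all hypothetical stand-ins for the learned modules, and the feature vectors are toy values.

```python
import math
from typing import Callable, List, Sequence

Vector = Sequence[float]


def cosine(a: Vector, b: Vector) -> float:
    """Cosine similarity between two feature vectors (0.0 if either is zero)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def rank_2d(query: Vector, refs: Sequence[Vector]) -> List[int]:
    """Stage 1: order reference indices by 2D feature similarity, best first."""
    scores = [cosine(query, r) for r in refs]
    return sorted(range(len(refs)), key=lambda i: scores[i], reverse=True)


def identify(
    query: Vector,
    refs: Sequence[Vector],
    needs_3d: Callable[[Vector], bool],    # hypothetical gate (assumption)
    keypoint_conf: Callable[[int], float],  # hypothetical matcher score per reference
    top_k: int = 3,                         # hypothetical re-ranking depth
) -> List[int]:
    """Return a ranking of reference indices for one query."""
    ranking = rank_2d(query, refs)
    if not needs_3d(query):
        # Gate says 3D features are not worth using: keep the 2D ranking.
        return ranking
    # Stage 2: re-order only the top candidates by keypoint confidence.
    head = sorted(ranking[:top_k], key=keypoint_conf, reverse=True)
    return head + ranking[top_k:]


def recall_at_1(rankings: Sequence[Sequence[int]], labels: Sequence[int]) -> float:
    """Fraction of queries whose top-ranked reference is the correct one."""
    hits = sum(1 for r, y in zip(rankings, labels) if r[0] == y)
    return hits / len(labels)
```

In this sketch the gate and the keypoint scorer are passed in as plain callables, so the selective behaviour (skip stage 2 when the gate returns False) is visible without any model code; in the paper both are learned components operating on RGB images.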
