Fugu-MT Paper Translation (Summary): Optimizing Grasping in Legged Robots: A Deep Learning Approach to Loco-Manipulation

Paper summary: Optimizing Grasping in Legged Robots: A Deep Learning Approach to Loco-Manipulation

arxiv url: http://arxiv.org/abs/2508.17466v1
Date: Sun, 24 Aug 2025 17:47:56 GMT
Status: Translation complete
System update date: 2025-08-26 18:43:45.539912
Title: Optimizing Grasping in Legged Robots: A Deep Learning Approach to Loco-Manipulation
Title (reference translation): Optimizing Grasping in Legged Robots: A Deep Learning Approach to Loco-Manipulation
Authors: Dilermando Almeida, Guilherme Lazzarini, Juliano Negri, Thiago H. Segreto, Ricardo V. Godoy, Marcelo Becker
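Abstract summary: This paper proposes a deep learning framework to enhance the grasping capabilities of quadrupeds equipped with arms. We build a pipeline within the Genesis simulation environment to generate a synthetic dataset of grasp attempts on common objects. This dataset was used to train a custom CNN with a U-Net-like architecture that processes multi-modal input from onboard RGB and depth cameras.
Reference score (independently computed attention score): 0.6533458718563319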
License: http://creativecommons.org/licenses/by/4.0/
Abstract: Quadruped robots have emerged as highly efficient and versatile platforms, excelling in navigating complex and unstructured terrains where traditional wheeled robots might fail. Equipping these robots with manipulator arms unlocks the advanced capability of loco-manipulation to perform complex physical interaction tasks in areas ranging from industrial automation to search-and-rescue missions. However, achieving precise and adaptable grasping in such dynamic scenarios remains a significant challenge, often hindered by the need for extensive real-world calibration and pre-programmed grasp configurations. This paper introduces a deep learning framework designed to enhance the grasping capabilities of quadrupeds equipped with arms, focusing on improved precision and adaptability. Our approach centers on a sim-to-real methodology that minimizes reliance on physical data collection. We developed a pipeline within the Genesis simulation environment to generate a synthetic dataset of grasp attempts on common objects. By simulating thousands of interactions from various perspectives, we created pixel-wise annotated grasp-quality maps to serve as the ground truth for our model. This dataset was used to train a custom CNN with a U-Net-like architecture that processes multi-modal input from onboard RGB and depth cameras, including RGB images, depth maps, segmentation masks, and surface normal maps. The trained model outputs a grasp-quality heatmap to identify the optimal grasp point. We validated the complete framework on a four-legged robot. The system successfully executed a full loco-manipulation task: autonomously navigating to a target object, perceiving it with its sensors, predicting the optimal grasp pose using our model, and performing a precise grasp. This work proves that leveraging simulated training with advanced sensing offers a scalable and effective solution for object handling.
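Abstract (reference translation): Quadruped robots have emerged as highly efficient and versatile platforms that excel at navigating complex, unstructured terrain where conventional wheeled robots fail. Fitting these robots with manipulator arms unlocks the advanced capability of loco-manipulation, which performs complex physical interactions in areas ranging from industrial automation to rescue operations. However, achieving accurate and adaptive grasping in such dynamic scenarios is a major challenge, often hindered by the need for extensive real-world calibration and pre-programmed grasp configurations. This paper proposes a deep learning framework that improves the grasping capabilities of quadrupeds equipped with arms, with a focus on improved precision and adaptability. Our approach centers on a sim-to-real methodology that minimizes reliance on physical data collection. We built a pipeline within the Genesis simulation environment and created a synthetic dataset of grasp attempts on common objects. By simulating thousands of interactions from various viewpoints, we created pixel-wise annotated grasp-quality maps that serve as the ground truth for the model. This dataset was used to train a custom CNN with a U-Net-like architecture that processes multi-modal input from onboard RGB and depth cameras. The trained model outputs a grasp-quality heatmap to identify the optimal grasp point. We validated the complete framework on a four-legged robot. The system successfully executed a full loco-manipulation task: autonomously navigating to the target object, perceiving it with its sensors, predicting the optimal grasp pose with the model, and performing a precise grasp. This work proves that combining simulated training with advanced sensing provides a scalable and effective solution for object handling.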
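
The abstract states that the trained model outputs a grasp-quality heatmap used to identify the optimal grasp point, but it does not say how that point is read off the map. The following is a minimal sketch of one plausible selection step, not the authors' implementation: it takes the highest-scoring pixel that lies on the object. The function name `best_grasp_pixel`, the use of the segmentation mask at selection time, and the toy values are all assumptions made for illustration.

```python
def best_grasp_pixel(heatmap, mask):
    """Return (row, col, score) of the highest-scoring pixel inside the mask.

    heatmap: 2D list of grasp-quality scores (hypothetical model output).
    mask:    2D list of 0/1 flags marking object pixels (assumed filter).
    Returns None when the mask contains no object pixels.
    """
    best = None
    for r, (h_row, m_row) in enumerate(zip(heatmap, mask)):
        for c, (score, on_object) in enumerate(zip(h_row, m_row)):
            # Only pixels on the object are candidates; keep the first maximum.
            if on_object and (best is None or score > best[2]):
                best = (r, c, score)
    return best


# Toy 3x3 example: the global maximum (0.95) lies off the object,
# so the selected grasp point is the best pixel inside the mask.
heatmap = [
    [0.10, 0.20, 0.95],
    [0.30, 0.80, 0.40],
    [0.05, 0.60, 0.70],
]
mask = [
    [0, 0, 0],
    [0, 1, 1],
    [0, 1, 1],
]

print(best_grasp_pixel(heatmap, mask))  # (1, 1, 0.8)
```

Restricting the search to object pixels is a design choice of this sketch; a real pipeline would also need to convert the chosen pixel into a 3D grasp pose, for example using the depth map, which is outside what the abstract specifies.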

Related papers