Fugu-MT Paper Translation (Summary): Enabling Visual Action Planning for Object Manipulation through Latent Space Roadmap

Paper summary: Enabling Visual Action Planning for Object Manipulation through Latent Space Roadmap

arxiv url: http://arxiv.org/abs/2103.02554v1
Date: Wed, 3 Mar 2021 17:48:26 GMT
Status: Translation complete
Last updated in system: 2021-03-04 14:47:47.885418
Title: Enabling Visual Action Planning for Object Manipulation through Latent Space Roadmap
Title (reference translation): A Visual Action Planning Method for Object Manipulation Using a Latent Space Map
Authors: Martina Lippi, Petra Poklukar, Michael C. Welle, Anastasiia Varava, Hang Yin, Alessandro Marino, Danica Kragic
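Abstract summary: We propose a framework for visual action planning of complex manipulation tasks with high-dimensional state spaces. We propose a Latent Space Roadmap (LSR) for task planning, a graph-based structure that globally captures the system dynamics in a low-dimensional latent space. We present a thorough investigation of the framework on two simulated box stacking tasks and a folding task executed on a real robot.
Reference score (independently computed attention score): 72.01609575400498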
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: We present a framework for visual action planning of complex manipulation tasks with high-dimensional state spaces, focusing on manipulation of deformable objects. We propose a Latent Space Roadmap (LSR) for task planning, a graph-based structure capturing globally the system dynamics in a low-dimensional latent space. Our framework consists of three parts: (1) a Mapping Module (MM) that maps observations, given in the form of images, into a structured latent space extracting the respective states, that generates observations from the latent states, (2) the LSR which builds and connects clusters containing similar states in order to find the latent plans between start and goal states extracted by MM, and (3) the Action Proposal Module that complements the latent plan found by the LSR with the corresponding actions. We present a thorough investigation of our framework on two simulated box stacking tasks and a folding task executed on a real robot.
Abstract (reference translation): We propose a framework for visual action planning of complex manipulation tasks with high-dimensional state spaces, focusing on the manipulation of deformable objects. We propose a Latent Space Roadmap (LSR) for task planning, a graph-based structure that globally captures the system dynamics in a low-dimensional latent space. Our framework consists of three parts: (1) a Mapping Module (MM) that maps observations, given in the form of images, into a structured latent space extracting the respective states, that generates observations from the latent states, (2) the LSR which builds and connects clusters containing similar states in order to find the latent plans between start and goal states extracted by MM, and (3) the Action Proposal Module that complements the latent plan found by the LSR with the corresponding actions. We present a thorough investigation of the framework on two simulated box stacking tasks and a folding task executed on a real robot.
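The roadmap idea in the abstract (cluster similar latent states, connect clusters, then search for a latent plan and attach actions) can be illustrated with a small Python sketch. This is not the authors' implementation: the greedy distance-threshold clustering, the 2D latent vectors, the action names, and all function names (`cluster_states`, `build_roadmap`, `nearest_cluster`, `plan`) are assumptions made only for illustration. The Mapping Module and the Action Proposal Module are learned networks in the paper and are replaced here by hand-written latent points and stored action labels.

```python
from collections import deque
from math import dist


def cluster_states(latents, eps):
    """Greedy clustering (an assumption): a latent joins the first cluster
    whose seed point lies within eps, otherwise it opens a new cluster."""
    seeds, labels = [], []
    for z in latents:
        for idx, seed in enumerate(seeds):
            if dist(z, seed) <= eps:
                labels.append(idx)
                break
        else:
            seeds.append(z)
            labels.append(len(seeds) - 1)
    return seeds, labels


def build_roadmap(labels, transitions):
    """Connect clusters with directed edges from observed transitions.
    transitions: (index_before, index_after, action) triples."""
    graph = {}
    for i, j, action in transitions:
        a, b = labels[i], labels[j]
        if a != b:  # transitions inside one cluster add no edge
            graph.setdefault(a, {}).setdefault(b, action)
    return graph


def nearest_cluster(z, seeds):
    """Map a latent state to the index of the closest cluster seed."""
    return min(range(len(seeds)), key=lambda k: dist(z, seeds[k]))


def plan(graph, start, goal):
    """Breadth-first search over clusters; returns a cluster path or None."""
    parents = {start: None}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        if node == goal:
            break
        for nxt in graph.get(node, {}):
            if nxt not in parents:
                parents[nxt] = node
                queue.append(nxt)
    if goal not in parents:
        return None
    path, node = [], goal
    while node is not None:
        path.append(node)
        node = parents[node]
    return path[::-1]


# Hypothetical latent states and transitions (made-up values).
latents = [(0.0, 0.0), (0.1, 0.0), (1.0, 0.0), (1.1, 0.1), (2.0, 0.0)]
transitions = [(1, 0, "noop"), (0, 2, "push"), (3, 4, "lift")]

seeds, labels = cluster_states(latents, eps=0.3)
graph = build_roadmap(labels, transitions)
path = plan(graph, nearest_cluster((0.05, 0.0), seeds),
            nearest_cluster((1.9, 0.0), seeds))
actions = [graph[a][b] for a, b in zip(path, path[1:])]
print(path, actions)  # prints [0, 1, 2] ['push', 'lift']
```

In this sketch the start and goal latents are snapped to their nearest clusters before the search, and the action stored on each traversed edge stands in for what an action proposal step would supply.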

Related papers

POMATO: Marrying Pointmap Matching with Temporal Motion for Dynamic 3D Reconstruction [53.19968902152528]
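POMATO is a unified framework for dynamic 3D reconstruction that combines pointmap matching with temporal motion. Specifically, it learns an explicit matching relationship by mapping RGB pixels from both dynamic and static regions to 3D pointmaps. We demonstrate the effectiveness of the proposed pointmap matching and temporal fusion paradigm by showing remarkable performance across multiple downstream tasks.
Paper reference translation (metadata) (2025-04-08T05:33:13Z)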
LISNeRF Mapping: LiDAR-based Implicit Mapping via Semantic Neural Fields for Large-Scale 3D Scenes [2.822816116516042]
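Large-scale semantic mapping is essential for outdoor autonomous agents to carry out high-level tasks such as planning and navigation. This paper proposes a large-scale 3D semantic reconstruction method that uses an implicit representation built from LiDAR measurements alone.
Paper reference translation (metadata) (2023-11-04T03:55:38Z)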
Compositional Foundation Models for Hierarchical Planning [52.18904315515153]
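This paper proposes a foundation model, trained separately on language, vision, and action data, for solving long-horizon tasks. We use a large language model to construct symbolic plans that are grounded in the environment through a large video diffusion model. The generated video plans are then grounded to visuomotor control through an inverse dynamics model that infers actions from the generated videos.
Paper reference translation (metadata) (2023-09-15T17:44:05Z)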
PlaneRecTR++: Unified Query Learning for Joint 3D Planar Reconstruction and Pose Estimation [10.982464344805194]
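PlaneRecTR++ is a Transformer-based architecture that unifies all sub-tasks related to multi-view reconstruction and pose estimation. The proposed unified learning achieves state-of-the-art performance on the ScanNetv1, ScanNetv2, NYUv2-Plane, and MatterPort3D datasets.
Paper reference translation (metadata) (2023-07-25T18:28:19Z)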
Embodied Task Planning with Large Language Models [86.63533340293361]
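This work proposes a TAsk Planing Agent (TaPA) for embodied tasks, which performs grounded planning under physical scene constraints. During inference, objects in the scene are detected by extending open-vocabulary object detectors to multi-view RGB images collected at different locations. Experimental results show that plans generated by our TaPA framework achieve a higher success rate than LLaVA and GPT-3.5 by a sizable margin.
Paper reference translation (metadata) (2023-07-04T17:58:25Z)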
Long-Horizon Manipulation of Unknown Objects via Task and Motion Planning with Estimated Affordances [26.082034134908785]
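We show that a task-and-motion planner can be used to plan intelligent behaviors even without prior knowledge of the set of manipulable objects. We demonstrate that this strategy enables a single system to perform a wide variety of real-world multi-step manipulation tasks.
Paper reference translation (metadata) (2021-08-09T16:13:47Z)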
Reconstructing Interactive 3D Scenes by Panoptic Mapping and CAD Model Alignments [81.38641691636847]
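We rethink the problem of scene reconstruction from the perspective of an embodied agent. An interactive scene is reconstructed from an RGB-D data stream. The reconstructed scene replaces the object meshes in the dense panoptic map with part-based CAD models.
Paper reference translation (metadata) (2021-03-30T05:56:58Z)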
Plan2Vec: Unsupervised Representation Learning by Latent Plans [106.37274654231659]
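Plan2vec is an unsupervised representation learning approach inspired by reinforcement learning. Plan2vec constructs a weighted graph on an image dataset using near-neighbor distances, then extrapolates this local metric to a global embedding by distilling path integrals over planned paths. We demonstrate the effectiveness of Plan2vec on one simulated and two real-world image datasets.
Paper reference translation (metadata) (2020-05-07T17:52:23Z)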
Latent Space Roadmap for Visual Action Planning of Deformable and Rigid Object Manipulation [74.88956115580388]
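Planning is performed in a low-dimensional latent state space that embeds images. Our framework consists of two main components: a Visual Foresight Module (VFM) that generates a visual plan as a sequence of images, and an Action Proposal Network (APN) that predicts the actions between them.
Paper reference translation (metadata) (2020-03-19T18:43:26Z)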

The related papers list is generated automatically from the titles and abstracts of papers on this site.

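This page shows information on the specified paper.
The operator of this site does not guarantee the quality of this site (including all information and translations) and accepts no responsibility for any results arising from the use of this site (including all information and translations).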