Fugu-MT Paper Translation (Abstract): ROSGPT_Vision: Commanding Robots Using Only Language Models' Prompts

Paper abstract: ROSGPT_Vision: Commanding Robots Using Only Language Models' Prompts

arxiv url: http://arxiv.org/abs/2308.11236v2
Date: Wed, 23 Aug 2023 08:31:16 GMT
Status: Translation complete
Last updated in system: 2023-08-24 11:11:45.936195
Title: ROSGPT_Vision: Commanding Robots Using Only Language Models' Prompts
Title (reference translation): ROSGPT_Vision: Commanding Robots Using Only Language Models
Authors: Bilel Benjdira, Anis Koubaa, Anas M. Ali
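Abstract summary: We argue that next-generation robots can be operated using only Language Models' prompts. This paper presents this robotic design pattern under the name Prompting Robotic Modalities (PRM). It then applies the PRM design pattern to build a new robotic framework named ROSGPT_Vision.
Reference score (independently computed attention score): 0.9821874476902969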
License: http://creativecommons.org/licenses/by-nc-nd/4.0/
Abstract: In this paper, we argue that the next generation of robots can be commanded using only Language Models' prompts. Each prompt separately interrogates a specific Robotic Modality via its Modality Language Model (MLM). A central Task Modality mediates the whole communication to execute the robotic mission via a Large Language Model (LLM). This paper names this new robotic design pattern Prompting Robotic Modalities (PRM). Moreover, this paper applies the PRM design pattern in building a new robotic framework named ROSGPT_Vision. ROSGPT_Vision allows the execution of a robotic task using only two prompts: a Visual Prompt and an LLM Prompt. The Visual Prompt extracts, in natural language, the visual semantic features related to the task under consideration (Visual Robotic Modality). Meanwhile, the LLM Prompt regulates the robotic reaction to the visual description (Task Modality). The framework automates all the mechanisms behind these two prompts. It enables the robot to address complex real-world scenarios by processing visual data, making informed decisions, and carrying out actions automatically. The framework comprises one generic vision module and two independent ROS nodes. As a test application, we used ROSGPT_Vision to develop CarMate, which monitors the driver's distraction on the road and gives real-time vocal notifications to the driver. We showed how ROSGPT_Vision significantly reduced the development cost compared to traditional methods. We demonstrated how to improve the quality of the application by optimizing the prompting strategies, without delving into technical details. ROSGPT_Vision is shared with the community (link: https://github.com/bilel-bj/ROSGPT_Vision) to advance robotic research in this direction and to build more robotic frameworks that implement the PRM design pattern and enable controlling robots using only prompts.
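Abstract (reference translation): This paper argues that next-generation robots can be operated using only language models' prompts. Each prompt individually queries a specific robotic modality through its Modality Language Model (MLM). A central Task Modality mediates the entire communication to carry out the robotic mission via a Large Language Model (LLM). The paper describes this new robotic design pattern, Prompting Robotic Modalities (PRM), and applies it to build a new robotic framework named ROSGPT_Vision. ROSGPT_Vision enables a robotic task to be executed using only two prompts: a Visual Prompt and an LLM Prompt. The Visual Prompt extracts, in natural language, the visual semantic features relevant to the task under consideration (Visual Robotic Modality). The LLM Prompt, in turn, regulates the robot's reaction to the visual description (Task Modality). The framework automates all the mechanisms behind these two prompts. By processing visual data, making informed decisions, and carrying out actions automatically, it can address complex real-world scenarios. The framework consists of one generic vision module and two independent ROS nodes. As a test application, CarMate was developed using ROSGPT_Vision. ROSGPT_Vision significantly reduced development cost compared with traditional methods. The authors showed how to improve the application's quality by optimizing the prompting strategies, without delving into technical details. ROSGPT_Vision is shared with the community (link: https://github.com/bilel-bj/ROSGPT_Vision) to advance robotic research in this direction and to implement the PRM design pattern so that robots can be controlled using only prompts.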
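The two-prompt flow described in the abstract can be illustrated with a minimal Python sketch. This is not the ROSGPT_Vision code or its API: the class `PRMPipeline`, the stub models, the prompt wording, and the `DESCRIPTION_MARKER` string are all hypothetical names introduced here for illustration, and the ROS nodes are replaced by plain function calls. It only shows the idea that a Visual Prompt yields a natural-language description and an LLM Prompt turns that description into a reaction.

```python
from dataclasses import dataclass
from typing import Callable

# Hypothetical marker used to append the visual description to the LLM prompt.
DESCRIPTION_MARKER = "Visual description:"
ALERT = "ALERT: please keep your eyes on the road."

# Assumption: a "model" is any callable; real systems would call an MLM / LLM here.
VisionModel = Callable[[str, bytes], str]  # (visual prompt, image bytes) -> description
TaskModel = Callable[[str], str]           # (full LLM prompt) -> reaction


@dataclass
class PRMPipeline:
    """Hypothetical two-prompt pipeline in the spirit of the PRM pattern."""
    visual_prompt: str
    llm_prompt: str
    vision_model: VisionModel
    task_model: TaskModel

    def describe(self, image: bytes) -> str:
        # Visual Robotic Modality: the Visual Prompt asks for task-relevant features.
        return self.vision_model(self.visual_prompt, image)

    def react(self, description: str) -> str:
        # Task Modality: the LLM Prompt decides the reaction to the description.
        full_prompt = f"{self.llm_prompt}\n\n{DESCRIPTION_MARKER} {description}"
        return self.task_model(full_prompt)

    def step(self, image: bytes) -> str:
        return self.react(self.describe(image))


def stub_vision_model(prompt: str, image: bytes) -> str:
    # Stand-in for a vision-language model; ignores the prompt and image content.
    return "The driver is looking at a phone." if image else "No image received."


def stub_task_model(full_prompt: str) -> str:
    # Stand-in for an LLM; inspects only the text after the description marker.
    description = full_prompt.split(DESCRIPTION_MARKER, 1)[-1]
    return ALERT if "phone" in description.lower() else "OK"


pipeline = PRMPipeline(
    visual_prompt="Describe what the driver is doing and where they are looking.",
    llm_prompt="If the driver looks distracted, reply with a short spoken warning; otherwise reply OK.",
    vision_model=stub_vision_model,
    task_model=stub_task_model,
)

print(pipeline.step(b"fake-camera-frame"))
```

In this sketch, changing the application's behavior means editing only the two prompt strings, which mirrors the abstract's claim that quality can be tuned through prompting strategy. Swapping the stubs for real model calls and wiring `describe` and `react` into separate ROS nodes is left as an assumption about how such a design could be deployed.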

Related papers

TalkWithMachines: Enhancing Human-Robot Interaction for Interpretable Industrial Robotics Through Large/Vision Language Models [1.534667887016089]
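This paper examines recent advances in Large Language Models (LLMs) and Vision Language Models (VLMs). Integrating them allows robots to understand and execute commands given in natural language and to perceive their environment through visual and/or descriptive inputs. The paper outlines four LLM-assisted robot control workflows, which explore (i) low-level control, (ii) the generation of language-based feedback that describes the robot's internal states, (iii) the use of visual information as additional input, and (iv) the use of robot structure information for generating task plans and feedback.
Reference translation of paper (metadata) (2024-12-19T23:43:40Z)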
$π_0$: A Vision-Language-Action Flow Model for General Robot Control [77.32743739202543]
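This paper proposes a novel flow-matching architecture built on top of a pre-trained vision-language model (VLM) to inherit Internet-scale semantic knowledge. The model is evaluated on its ability to perform tasks zero-shot after pre-training, to follow language instructions from people, and to acquire new skills via fine-tuning.
Reference translation of paper (metadata) (2024-10-31T17:22:30Z)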
LLaRA: Supercharging Robot Learning Data for Vision-Language Policy [56.505551117094534]
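We introduce LLaRA: Large Language and Robotics Assistant. First, we propose an automated pipeline that generates conversation-style instruction data for robots from existing behavior cloning datasets. We show that a VLM fine-tuned with a limited amount of such data can produce meaningful action decisions for robot control.
Reference translation of paper (metadata) (2024-06-28T17:59:12Z)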
ROS-LLM: A ROS framework for embodied AI with task feedback and structured reasoning [74.58666091522198]
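We present a framework for intuitive robot programming by non-experts. Leveraging natural-language prompts and contextual information from the Robot Operating System (ROS), our system integrates large language models (LLMs) and lets non-experts describe task requirements to the system through a chat interface.
Reference translation of paper (metadata) (2024-06-28T08:28:38Z)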
LLARVA: Vision-Action Instruction Tuning Enhances Robot Learning [50.99807031490589]
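LLARVA is a model trained with a novel instruction-tuning method to unify robot learning tasks, scenarios, and environments. We generate 8.5M image-visual trace pairs from the Open X-Embodiment dataset to pre-train the model. Experiments yield strong performance, showing that LLARVA performs well compared with several contemporary baselines.
Reference translation of paper (metadata) (2024-06-17T17:55:29Z)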
RoboScript: Code Generation for Free-Form Manipulation Tasks across Real and Simulation [77.41969287400977]
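This paper presents RobotScript, a platform for a deployable robot manipulation pipeline powered by code generation. It also proposes a code generation benchmark for robot manipulation tasks specified in free-form natural language. We demonstrate the adaptability of the code generation framework across multiple robot embodiments, including the Franka and UR5 robot arms.
Reference translation of paper (metadata) (2024-02-22T15:12:00Z)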
QUAR-VLA: Vision-Language-Action Model for Quadruped Robots [37.952398683031895]
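The central idea is to raise the overall intelligence of the robot. This paper proposes the QUAdruped Robotic Transformer (QUART), a family of VLA models. The proposed approach yields workable robot policies and enables the acquisition of consistent capabilities.
Reference translation of paper (metadata) (2023-12-22T06:15:03Z)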
WALL-E: Embodied Robotic WAiter Load Lifting with Large Language Model [92.90127398282209]
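This paper explores the potential of integrating the latest Large Language Models (LLMs) with existing visual grounding and robotic grasping systems. It introduces WALL-E (Embodied Robotic WAiter load lifting with Large Language model) as an example of this integration. We deploy this LLM-powered system on a physical robot to carry out instruction-guided grasping tasks through a more user-friendly interface.
Reference translation of paper (metadata) (2023-08-30T11:35:21Z)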
RT-2: Vision-Language-Action Models Transfer Web Knowledge to Robotic Control [140.48218261864153]
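This work studies how vision-language models trained on Internet-scale data can be incorporated directly into end-to-end robotic control. The proposed approach enables RT-2 to acquire emergent capabilities from Internet-scale training.
Reference translation of paper (metadata) (2023-07-28T21:18:02Z)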

The related papers list is generated automatically from the titles and abstracts of papers on this site.

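This page shows information about the specified paper.
The operator of this site does not guarantee the quality of this site (including all information and translations) and accepts no responsibility for any outcome arising from its use (including all information and translations).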