Fugu-MT Paper Translation (Overview): VILAS: A VLA-Integrated Low-cost Architecture with Soft Grasping for Robotic Manipulation

Paper overview: VILAS: A VLA-Integrated Low-cost Architecture with Soft Grasping for Robotic Manipulation

arxiv url: http://arxiv.org/abs/2605.02037v1
Date: Sun, 03 May 2026 20:04:42 GMT
Status: Translation complete
Last updated in system: 2026-05-05 20:33:50.05424
Title: VILAS: A VLA-Integrated Low-cost Architecture with Soft Grasping for Robotic Manipulation
Title (reference translation): VILAS: A Low-cost Architecture with Integrated VLA and Soft Grasping for Robotic Manipulation
Authors: Zijian An, Hadi Khezam, Bill Cai, Ran Yang, Shijie Geng, Yiming Feng, Yue Zheng, Lifeng Zhou
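Abstract summary: We introduce VILAS, a fully low-cost, modular robotic manipulation platform designed to support vision-language-action (VLA) policy learning and deployment on accessible hardware. The system incorporates a Fairino collaborative arm, a Jodell RG52-50 electric gripper, and a dual-camera perception module. To enable safe manipulation of fragile objects without relying on force sensing, we design a kirigami-based soft compliant gripper extension.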
Reference score (site-computed attention score): 16.699685476760408
License: http://creativecommons.org/licenses/by/4.0/
Abstract: We present VILAS, a fully low-cost, modular robotic manipulation platform designed to support end-to-end vision-language-action (VLA) policy learning and deployment on accessible hardware. The system integrates a Fairino FR5 collaborative arm, a Jodell RG52-50 electric gripper, and a dual-camera perception module, unified through a ZMQ-based communication architecture that seamlessly coordinates teleoperation, data collection, and policy deployment within a single framework. To enable safe manipulation of fragile objects without relying on explicit force sensing, we design a kirigami-based soft compliant gripper extension that induces predictable deformation under compressive loading, providing gentle and repeatable contact with delicate targets. We deploy and evaluate three state-of-the-art VLA models on the VILAS platform: pi_0, pi_0.5, and GR00T N1.6. All models are fine-tuned from publicly released pretrained checkpoints using an identical demonstration dataset collected via our teleoperation pipeline. Experiments on a grape grasping task validate the effectiveness of the proposed system, confirming that capable manipulation policies can be successfully trained and deployed on low-cost modular hardware. Our results further provide practical insights into the deployment characteristics of current VLA models in real-world settings.
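Abstract (reference translation): We introduce VILAS, a fully low-cost, modular robotic manipulation platform designed to support end-to-end vision-language-action (VLA) policy learning and deployment on accessible hardware. The system integrates a Fairino FR5 collaborative arm, a Jodell RG52-50 electric gripper, and a dual-camera perception module, unified through a ZMQ-based communication architecture that seamlessly coordinates teleoperation, data collection, and policy deployment within a single framework. To enable safe handling of fragile objects without explicit force sensing, we designed a kirigami-based soft compliant gripper that deforms predictably under compressive loading and makes gentle, repeatable contact with delicate targets. We deployed and evaluated three state-of-the-art VLA models (pi_0, pi_0.5, GR00T N1.6) on the VILAS platform. All models are fine-tuned from publicly released checkpoints using the same demonstration dataset collected through our teleoperation pipeline. Experiments on a grape grasping task verified the effectiveness of the proposed system and confirmed that capable manipulation policies can be successfully trained and deployed on low-cost modular hardware. We further provide practical insights into the deployment characteristics of current VLA models in real-world settings.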
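The abstract describes a ZMQ-based layer that routes teleoperation, data collection, and policy deployment through one framework. Below is a minimal sketch of that idea under stated assumptions; it is not the VILAS implementation. The message schema (`ActionMsg`), the `Recorder` and `dispatch` helpers, the JSON encoding, and the 6-joint count are all hypothetical. What it illustrates is that when a teleoperator and a learned policy emit the same action message, recording demonstrations and deploying a policy can share a single execution path. Sockets are omitted so the sketch runs with only the standard library; in a real system the encoded bytes would travel over ZMQ sockets (for example via pyzmq's `send`/`recv`).

```python
import json
import time
from dataclasses import asdict, dataclass, field
from typing import Dict, List

NUM_JOINTS = 6  # assumption: a 6-joint arm; adjust for the actual robot


@dataclass
class ActionMsg:
    """Hypothetical action message shared by teleoperation and policy sources."""
    source: str           # "teleop" or "policy"
    joints: List[float]   # target joint positions, one per joint
    gripper: float        # normalized opening: 0.0 = closed, 1.0 = open
    stamp: float = field(default_factory=time.time)


def encode(msg: ActionMsg) -> bytes:
    # These bytes would be the payload of a ZMQ send() in a real deployment.
    return json.dumps(asdict(msg)).encode("utf-8")


def decode(payload: bytes) -> ActionMsg:
    # Inverse of encode(): rebuild the dataclass from the JSON payload.
    return ActionMsg(**json.loads(payload.decode("utf-8")))


class Recorder:
    """Groups (observation, action) pairs into episodes while recording is on."""

    def __init__(self) -> None:
        self.enabled = False
        self.episodes: List[List[dict]] = []
        self._buffer: List[dict] = []

    def start(self) -> None:
        self.enabled = True
        self._buffer = []

    def stop(self) -> None:
        # Keep the episode only if at least one step was logged.
        if self.enabled and self._buffer:
            self.episodes.append(self._buffer)
        self.enabled = False
        self._buffer = []

    def log(self, obs: Dict, action: ActionMsg) -> None:
        if self.enabled:
            self._buffer.append({"obs": obs, "action": asdict(action)})


def dispatch(payload: bytes, robot: Dict, recorder: Recorder, obs: Dict) -> ActionMsg:
    """Apply one action through the same path whether it came from teleop or a policy."""
    msg = decode(payload)
    if msg.source not in ("teleop", "policy"):
        raise ValueError(f"unknown action source: {msg.source}")
    if len(msg.joints) != NUM_JOINTS:
        raise ValueError(f"expected {NUM_JOINTS} joint targets, got {len(msg.joints)}")
    recorder.log(obs, msg)  # no-op unless a demonstration is being recorded
    robot["joints"] = list(msg.joints)
    robot["gripper"] = min(max(msg.gripper, 0.0), 1.0)  # clamp to the valid range
    return msg
```

For example, teleoperated steps dispatched between `recorder.start()` and `recorder.stop()` become one training episode, while policy steps sent afterwards drive the robot through the identical path without being recorded.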
