Fugu-MT Paper Translation (Abstract): Vision-Language-Action Models for Selective Robotic Disassembly: A Case Study on Critical Component Extraction from Desktops

Paper overview: Vision-Language-Action Models for Selective Robotic Disassembly: A Case Study on Critical Component Extraction from Desktops

arxiv url: http://arxiv.org/abs/2512.04446v1
Date: Thu, 04 Dec 2025 04:36:39 GMT
Status: Translation complete
Last updated in system: 2026-03-23 08:17:40.12141
Title: Vision-Language-Action Models for Selective Robotic Disassembly: A Case Study on Critical Component Extraction from Desktops
Title (reference translation): Vision-Language-Action Models for Selective Robotic Disassembly: A Case Study on Critical Component Extraction from Desktops
Authors: Chang Liu, Sibo Tian, Sara Behdad, Xiao Liang, Minghui Zheng
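Abstract summary: High-value items such as RAM modules and CPUs, and sensitive parts such as hard disk drives, require sequential, precise, and dexterous operations to disassemble. Recent development of vision-language-action (VLA) models has presented an end-to-end approach for general robotic manipulation tasks. In this paper, the authors collected a customized dataset for robotic RAM and CPU disassembly and used it to fine-tune two well-established VLA approaches.
Reference score (independently computed attention metric): 5.567801088767209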
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: Automating disassembly of critical components from end-of-life (EoL) desktops, such as high-value items like RAM modules and CPUs, as well as sensitive parts like hard disk drives, remains challenging due to the inherent variability and uncertainty of these products. Moreover, their disassembly requires sequential, precise, and dexterous operations, further increasing the complexity of automation. Current robotic disassembly processes are typically divided into several stages: perception, sequence planning, task planning, motion planning, and manipulation. Each stage requires explicit modeling, which limits generalization to unfamiliar scenarios. Recent development of vision-language-action (VLA) models has presented an end-to-end approach for general robotic manipulation tasks. Although VLAs have demonstrated promising performance on simple tasks, the feasibility of applying such models to complex disassembly remains largely unexplored. In this paper, we collected a customized dataset for robotic RAM and CPU disassembly and used it to fine-tune two well-established VLA approaches, OpenVLA and OpenVLA-OFT, as a case study. We divided the whole disassembly task into several small steps, and our preliminary experimental results indicate that the fine-tuned VLA models can faithfully complete multiple early steps but struggle with certain critical subtasks, leading to task failure. However, we observed that a simple hybrid strategy that combines VLA with a rule-based controller can successfully perform the entire disassembly operation. These findings highlight the current limitations of VLA models in handling the dexterity and precision required for robotic EoL product disassembly. By offering a detailed analysis of the observed results, this study provides insights that may inform future research to address current challenges and advance end-to-end robotic automated disassembly.
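Abstract (reference translation): Automating the disassembly of critical components from end-of-life (EoL) desktops, including high-value items such as RAM modules and CPUs and sensitive parts such as hard disk drives, remains difficult because of the inherent variability and uncertainty of these products. Their disassembly also requires sequential, precise, and dexterous operations, which further increases the complexity of automation. Current robotic disassembly processes are typically divided into several stages: perception, sequence planning, task planning, motion planning, and manipulation. Each stage requires explicit modeling, which limits generalization to unfamiliar scenarios. Recent development of vision-language-action (VLA) models has presented an end-to-end approach for general robotic manipulation tasks. Although VLAs have shown promising performance on simple tasks, the feasibility of applying such models to complex disassembly remains largely unexplored. In this paper, we collected a customized dataset for robotic RAM and CPU disassembly and used it to fine-tune two well-established VLA approaches, OpenVLA and OpenVLA-OFT, as a case study. We divided the whole disassembly task into several small steps, and our preliminary experimental results indicate that the fine-tuned VLA models can faithfully complete multiple early steps but struggle with certain critical subtasks, leading to task failure. However, we found that a simple hybrid strategy that combines VLA with a rule-based controller can successfully perform the entire disassembly operation. These results highlight the current limitations of VLA models in handling the dexterity and precision required for robotic EoL product disassembly. By offering a detailed analysis of the observed results, this study provides insights that may inform future research to address current challenges and advance end-to-end robotic automated disassembly.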
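The hybrid strategy mentioned in the abstract, where a VLA policy is combined with a rule-based controller over a task split into small steps, can be pictured with the minimal sketch below. It is an illustration only, not the authors' implementation: the abstract does not name the subtasks or say which ones are handed to the rule-based controller, so the step names (`approach_module`, `release_latch`, `extract_module`), the `Step` class, `run_hybrid`, and both stand-in controllers are hypothetical.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass
class Step:
    name: str        # hypothetical subtask name (not taken from the paper)
    controller: str  # "vla" or "rule": which controller executes this step


def run_hybrid(steps: List[Step],
               controllers: Dict[str, Callable[[str], bool]]) -> List[str]:
    """Run steps in order, dispatching each one to its assigned controller.

    Stops at the first failing step and returns the names of the steps
    that completed before the failure.
    """
    completed: List[str] = []
    for step in steps:
        succeeded = controllers[step.controller](step.name)
        if not succeeded:
            break
        completed.append(step.name)
    return completed


def fake_vla_policy(step_name: str) -> bool:
    # Stand-in for a fine-tuned VLA model. Assumption for illustration:
    # it fails on one precision-critical step and succeeds elsewhere.
    return step_name != "release_latch"


def fake_rule_controller(step_name: str) -> bool:
    # Stand-in for a scripted, rule-based motion that always succeeds.
    return True


CONTROLLERS = {"vla": fake_vla_policy, "rule": fake_rule_controller}

# Every step is assigned to the learned policy.
PLAN_VLA_ONLY = [
    Step("approach_module", "vla"),
    Step("release_latch", "vla"),
    Step("extract_module", "vla"),
]

# Same plan, with the critical step handed to the rule-based controller.
PLAN_HYBRID = [
    Step("approach_module", "vla"),
    Step("release_latch", "rule"),
    Step("extract_module", "vla"),
]

if __name__ == "__main__":
    print(run_hybrid(PLAN_VLA_ONLY, CONTROLLERS))
    print(run_hybrid(PLAN_HYBRID, CONTROLLERS))
```

In this toy setup the VLA-only plan halts at the step the stand-in policy cannot complete, while the hybrid plan runs to the end. That mirrors the qualitative finding reported in the abstract and makes no claim about the actual failure modes or success rates.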

Related papers