Fugu-MT Paper Translation (Abstract): Eva-VLA: Evaluating Vision-Language-Action Models' Robustness Under Real-World Physical Variations

Paper overview: Eva-VLA: Evaluating Vision-Language-Action Models' Robustness Under Real-World Physical Variations

arxiv url: http://arxiv.org/abs/2509.18953v1
Date: Tue, 23 Sep 2025 13:02:23 GMT
Status: Translation complete
Last updated in system: 2025-09-24 20:41:27.848167
Title: Eva-VLA: Evaluating Vision-Language-Action Models' Robustness Under Real-World Physical Variations
Authors: Hanqing Liu, Jiahuan Long, Junqi Wu, Jiacheng Hou, Huili Tang, Tingsong Jiang, Weien Zhou, Wen Yao,
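Abstract summary: Vision-Language-Action (VLA) models have emerged as a promising solution for robotic manipulation, but their robustness to real-world physical variations remains critically underexplored. This paper proposes Eva-VLA, the first unified framework that systematically evaluates the robustness of VLA models by transforming discrete physical variations into continuous optimization problems. The Eva-VLA framework provides a practical pathway for hardening VLA-based robotic manipulation models against real-world deployment challenges.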
Reference score (attention score computed by the site): 20.05530136820015
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: Vision-Language-Action (VLA) models have emerged as promising solutions for robotic manipulation, yet their robustness to real-world physical variations remains critically underexplored. To bridge this gap, we propose Eva-VLA, the first unified framework that systematically evaluates the robustness of VLA models by transforming discrete physical variations into continuous optimization problems. However, comprehensively assessing VLA robustness presents two key challenges: (1) how to systematically characterize diverse physical variations encountered in real-world deployments while maintaining evaluation reproducibility, and (2) how to efficiently discover worst-case scenarios without prohibitive real-world data collection costs. To address the first challenge, we decompose real-world variations into three critical domains: object 3D transformations that affect spatial reasoning, illumination variations that challenge visual perception, and adversarial patches that disrupt scene understanding. For the second challenge, we introduce a continuous black-box optimization framework that transforms discrete physical variations into parameter optimization, enabling systematic exploration of worst-case scenarios. Extensive experiments on state-of-the-art OpenVLA models across multiple benchmarks reveal alarming vulnerabilities: all variation types trigger failure rates exceeding 60%, with object transformations causing up to 97.8% failure in long-horizon tasks. Our findings expose critical gaps between controlled laboratory success and unpredictable deployment readiness, while the Eva-VLA framework provides a practical pathway for hardening VLA-based robotic manipulation models against real-world deployment challenges.
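To make the second challenge more concrete, here is a minimal sketch (not the authors' implementation) of searching a continuous parameterization of physical variations for a worst case using only black-box evaluations. Everything named here is a hypothetical assumption: the parameter names and ranges in `BOUNDS`, the helpers `clip`, `sample_uniform`, `worst_case_search` and `toy_failure_score`, and the choice of a simple (1+1) evolution strategy. The abstract does not say which black-box optimizer Eva-VLA uses. In a real setup, `failure_score` would roll out a VLA policy such as OpenVLA in simulation under the given variation and return an estimated failure rate. Here a toy function stands in for that rollout.

```python
import random
from typing import Callable, Dict, Tuple

Params = Dict[str, float]

# Hypothetical continuous encoding of the three variation domains.
# Parameter names and ranges are illustrative assumptions, not from the paper.
BOUNDS: Dict[str, Tuple[float, float]] = {
    "obj_yaw_deg": (-45.0, 45.0),    # object 3D transformation: rotation
    "obj_dx_cm": (-5.0, 5.0),        # object 3D transformation: translation
    "light_scale": (0.3, 1.7),       # illumination: brightness multiplier
    "patch_u": (0.0, 1.0),           # adversarial patch: normalized x position
    "patch_v": (0.0, 1.0),           # adversarial patch: normalized y position
}


def clip(params: Params) -> Params:
    """Project each parameter back into its allowed range."""
    return {k: min(max(v, BOUNDS[k][0]), BOUNDS[k][1]) for k, v in params.items()}


def sample_uniform(rng: random.Random) -> Params:
    """Draw a random starting variation uniformly within the bounds."""
    return {k: rng.uniform(lo, hi) for k, (lo, hi) in BOUNDS.items()}


def worst_case_search(
    failure_score: Callable[[Params], float],
    iterations: int = 200,
    sigma: float = 0.1,
    seed: int = 0,
) -> Tuple[Params, float]:
    """(1+1) evolution strategy that maximizes a black-box failure score.

    Only function evaluations are used (no gradients), so the scorer can be
    a simulator rollout of a policy treated as a black box.
    """
    rng = random.Random(seed)
    best = sample_uniform(rng)
    best_score = failure_score(best)
    for _ in range(iterations):
        # Gaussian mutation with step size proportional to each parameter's range.
        candidate = clip({
            k: v + rng.gauss(0.0, sigma * (BOUNDS[k][1] - BOUNDS[k][0]))
            for k, v in best.items()
        })
        score = failure_score(candidate)
        if score >= best_score:  # keep the more damaging variation
            best, best_score = candidate, score
    return best, best_score


def toy_failure_score(p: Params) -> float:
    """Hypothetical stand-in for a failure rate: larger yaw and dimmer light hurt."""
    rot = abs(p["obj_yaw_deg"]) / 45.0
    dark = (1.7 - p["light_scale"]) / 1.4
    return 0.5 * rot + 0.5 * dark


if __name__ == "__main__":
    worst, score = worst_case_search(toy_failure_score)
    print(f"worst-case score {score:.3f} with params {worst}")
```

Natural refinements would be a stronger derivative-free optimizer such as CMA-ES, or averaging several rollouts per candidate to reduce noise in the failure-rate estimate. Both are assumptions, not details taken from the abstract.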


Related papers