Fugu-MT Paper Translation (Summary): Uncovering Linguistic Fragility in Vision-Language-Action Models via Diversity-Aware Red Teaming

Paper summary: Uncovering Linguistic Fragility in Vision-Language-Action Models via Diversity-Aware Red Teaming

arxiv url: http://arxiv.org/abs/2604.05595v1
Date: Tue, 07 Apr 2026 08:43:36 GMT
Status: Translation complete
Last updated in system: 2026-04-08 17:42:09.721852
Title: Uncovering Linguistic Fragility in Vision-Language-Action Models via Diversity-Aware Red Teaming
Title (reference translation): Discovering Linguistic Vulnerabilities in Vision-Language-Action Models via Diversity-Aware Red Teaming
Authors: Baoshun Tong, Haoran He, Ling Pan, Yang Liu, Liang Lin
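Abstract summary: This paper proposes a novel framework for exposing the vulnerability of Vision-Language-Action (VLA) models to linguistic variations. The method reduces the average task success rate from 93.33% to 5.85%, demonstrating a scalable approach to stress-testing VLA agents.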
Reference score (independently computed attention score): 64.48633529149579
License: http://creativecommons.org/licenses/by-nc-sa/4.0/
Abstract: Vision-Language-Action (VLA) models have achieved remarkable success in robotic manipulation. However, their robustness to linguistic nuances remains a critical, under-explored safety concern, posing a significant safety risk to real-world deployment. Red teaming, or identifying environmental scenarios that elicit catastrophic behaviors, is an important step in ensuring the safe deployment of embodied AI agents. Reinforcement learning (RL) has emerged as a promising approach in automated red teaming that aims to uncover these vulnerabilities. However, standard RL-based adversaries often suffer from severe mode collapse due to their reward-maximizing nature, which tends to converge to a narrow set of trivial or repetitive failure patterns, failing to reveal the comprehensive landscape of meaningful risks. To bridge this gap, we propose a novel Diversity-Aware Embodied Red Teaming (DAERT) framework, to expose the vulnerabilities of VLAs against linguistic variations. Our design is based on evaluating a uniform policy, which is able to generate a diverse set of challenging instructions while ensuring its attack effectiveness, measured by execution failures in a physical simulator. We conduct extensive experiments across different robotic benchmarks against two state-of-the-art VLAs, including π₀ and OpenVLA. Our method consistently discovers a wider range of more effective adversarial instructions that reduce the average task success rate from 93.33% to 5.85%, demonstrating a scalable approach to stress-testing VLA agents and exposing critical safety blind spots before real-world deployment.
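Abstract (reference translation): Vision-Language-Action (VLA) models have achieved remarkable success in robotic manipulation. However, their robustness to linguistic nuances remains a critical and under-explored safety concern that poses a significant safety risk to real-world deployment. Red teaming, that is, identifying environmental scenarios that elicit catastrophic behaviors, is an important step in ensuring the safe deployment of embodied AI agents. Reinforcement learning (RL) has emerged as a promising approach to automated red teaming aimed at uncovering these vulnerabilities. However, standard RL-based adversaries often suffer from severe mode collapse because of their reward-maximizing nature. To bridge this gap, we propose a novel Diversity-Aware Embodied Red Teaming (DAERT) framework to expose the vulnerabilities of VLAs to linguistic variations. The design is based on evaluating a uniform policy that can generate a diverse set of challenging instructions while ensuring attack effectiveness, measured by execution failures in a physical simulator. We conduct extensive experiments on different robotic benchmarks against two state-of-the-art VLAs, π₀ and OpenVLA. The proposed method reduces the average task success rate from 93.33% to 5.85%, demonstrating a scalable approach to stress-testing VLA agents and exposing critical safety blind spots before real-world deployment.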
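
Illustrative sketch (not from the paper): the abstract describes two quantities for a set of adversarial instructions, attack effectiveness measured by execution failures and the diversity of the instructions. The Python below shows one simple way such quantities could be computed. The function names, the word-level Jaccard distance used as the diversity proxy, and the sample instructions are all assumptions made for illustration; the paper's actual reward, metrics, and data are not given on this page.

```python
from itertools import combinations


def attack_success_rate(task_succeeded):
    """Fraction of rollouts in which the agent FAILED the task.

    `task_succeeded` is a list of booleans, one per simulated rollout
    (hypothetical input format).
    """
    if not task_succeeded:
        raise ValueError("need at least one rollout")
    failures = sum(1 for ok in task_succeeded if not ok)
    return failures / len(task_succeeded)


def jaccard_distance(a, b):
    """1 - |A n B| / |A u B| over the lowercase word sets of two strings."""
    words_a = set(a.lower().split())
    words_b = set(b.lower().split())
    union = words_a | words_b
    if not union:
        return 0.0
    return 1.0 - len(words_a & words_b) / len(union)


def mean_pairwise_diversity(instructions):
    """Average Jaccard distance over all unordered instruction pairs.

    This is an assumed stand-in for a diversity measure, chosen only
    because it needs no external libraries.
    """
    pairs = list(combinations(instructions, 2))
    if not pairs:
        return 0.0
    return sum(jaccard_distance(a, b) for a, b in pairs) / len(pairs)


if __name__ == "__main__":
    # Made-up rewrites of a single manipulation instruction.
    rewrites = [
        "put the red block into the bowl",
        "kindly relocate the crimson cube so it rests inside the dish",
        "the bowl should end up holding the red block",
    ]
    # Made-up rollout outcomes (True = the agent completed the task).
    outcomes = [True, False, False]
    print("attack success rate:", attack_success_rate(outcomes))
    print("mean pairwise diversity:", mean_pairwise_diversity(rewrites))
```

A red-teaming policy that collapses to a single repeated instruction would score near zero on the diversity measure even if its attack success rate were high, which is the failure mode the abstract attributes to standard RL-based adversaries.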

Related papers