Fugu-MT Paper Translation (Summary): OpenMobile: Building Open Mobile Agents with Task and Trajectory Synthesis

Paper summary: OpenMobile: Building Open Mobile Agents with Task and Trajectory Synthesis

arxiv url: http://arxiv.org/abs/2604.15093v1
Date: Thu, 16 Apr 2026 14:53:08 GMT
Status: Translation complete
Last updated in system: 2026-04-17 21:29:31.958598
Title: OpenMobile: Building Open Mobile Agents with Task and Trajectory Synthesis
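Title (reference translation): OpenMobile: Building Open Mobile Agents through Task and Trajectory Synthesis
Authors: Kanzhi Cheng, Zehao Li, Zheng Ma, Nuo Chen, Jialin Cao, Qiushi Sun, Zichen Ding, Fangzhi Xu, Hang Yan, Jiajun Chen, Anh Tuan Luu, Jianbing Zhang, Lewei Lu, Dahua Lin
Abstract summary: This paper presents OpenMobile, an open-source framework that synthesizes high-quality task instructions and agent trajectories. Agents trained on its data achieve competitive results on three dynamic mobile agent benchmarks.
Reference score (independently computed attention score): 98.43366988856592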
License: http://creativecommons.org/licenses/by/4.0/
Abstract: Mobile agents powered by vision-language models have demonstrated impressive capabilities in automating mobile tasks, with recent leading models achieving a marked performance leap, e.g., nearly 70% success on AndroidWorld. However, these systems keep their training data closed and remain opaque about their task and trajectory synthesis recipes. We present OpenMobile, an open-source framework that synthesizes high-quality task instructions and agent trajectories, with two key components: (1) a scalable task synthesis pipeline that constructs a global environment memory from exploration, then leverages it to generate diverse and grounded instructions; and (2) a policy-switching strategy for trajectory rollout that, by alternating between learner and expert models, captures essential error-recovery data often missing in standard imitation learning. Agents trained on our data achieve competitive results across three dynamic mobile agent benchmarks: notably, our fine-tuned Qwen2.5-VL and Qwen3-VL reach 51.7% and 64.7% on AndroidWorld, respectively, far surpassing existing open-data approaches. Furthermore, we conduct transparent analyses of the overlap between our synthetic instructions and benchmark test sets, and verify that performance gains stem from broad functionality coverage rather than benchmark overfitting. We release data and code at https://njucckevin.github.io/openmobile/ to bridge the data gap and facilitate broader mobile agent research.
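Abstract (reference translation): Mobile agents built on vision-language models have shown impressive capabilities in automating mobile tasks, and recent leading models achieve nearly 70% success on AndroidWorld. However, these systems keep their training data closed and remain opaque about their task and trajectory synthesis recipes. This paper describes OpenMobile, an open-source framework that synthesizes high-quality task instructions and agent trajectories. It has two key components: (1) a scalable task synthesis pipeline that builds a global environment memory from exploration and uses it to generate diverse, grounded instructions; and (2) a policy-switching strategy for trajectory rollout. By alternating between learner and expert models, the strategy captures important error-recovery data that is often missing in standard imitation learning. Agents trained on the data achieve competitive results on three dynamic mobile agent benchmarks. In particular, the fine-tuned Qwen2.5-VL and Qwen3-VL reach 51.7% and 64.7% on AndroidWorld, far surpassing existing open-data approaches. In addition, the authors transparently analyze the overlap between the synthetic instructions and the benchmark test sets, and verify that the performance gains come from broad functionality coverage rather than benchmark overfitting. Data and code are released at https://njucckevin.github.io/openmobile/ to bridge the data gap and facilitate broader mobile agent research.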
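
The policy-switching rollout mentioned in the abstract can be illustrated with a minimal sketch. The abstract does not state how or when control passes between the learner and the expert, so the fixed-length alternation below is an assumption, not the authors' method. The toy integer environment and all names (`Step`, `learner_policy`, `expert_policy`, `rollout_with_policy_switching`) are hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple

State = int
Action = int  # +1 or -1 in this toy environment
Policy = Callable[[State, State], Action]  # (state, goal) -> action


@dataclass
class Step:
    state: State
    action: Action
    actor: str  # "learner" or "expert"


def expert_policy(state: State, goal: State) -> Action:
    # Hypothetical expert: always moves toward the goal.
    return 1 if state < goal else -1


def learner_policy(state: State, goal: State) -> Action:
    # Hypothetical imperfect learner: always moves left, whatever the goal is.
    return -1


def rollout_with_policy_switching(
    start: State,
    goal: State,
    learner: Policy,
    expert: Policy,
    learner_steps: int = 2,  # assumed segment length, not from the paper
    expert_steps: int = 4,   # assumed segment length, not from the paper
    max_steps: int = 20,
) -> Tuple[List[Step], bool]:
    """Alternate control between learner and expert in fixed-length segments."""
    trajectory: List[Step] = []
    state = start
    actor, policy, budget = "learner", learner, learner_steps
    for _ in range(max_steps):
        if state == goal:
            break
        action = policy(state, goal)
        trajectory.append(Step(state, action, actor))
        state += action
        budget -= 1
        if budget == 0:
            # Hand control to the other policy for its segment.
            if actor == "learner":
                actor, policy, budget = "expert", expert, expert_steps
            else:
                actor, policy, budget = "learner", learner, learner_steps
    return trajectory, state == goal


if __name__ == "__main__":
    traj, success = rollout_with_policy_switching(0, 2, learner_policy, expert_policy)
    for step in traj:
        print(step)
    print("success:", success)
```

In this sketch the expert's first step starts from a state the learner reached by mistake, so the expert-labeled steps are examples of recovering from an error. A rollout driven by the expert alone would never visit such states, which is the gap in standard imitation learning that the abstract points to.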

Related papers