Fugu-MT Paper Translation (Abstract): Do Phone-Use Agents Respect Your Privacy?

Paper summary: Do Phone-Use Agents Respect Your Privacy?

arxiv url: http://arxiv.org/abs/2604.00986v1
Date: Wed, 01 Apr 2026 14:50:50 GMT
Status: Translation complete
Last updated in system: 2026-04-02 16:44:32.043651
Title: Do Phone-Use Agents Respect Your Privacy?
Title (reference translation): Do Phone-Use Agents Respect Your Privacy?
Authors: Zhengyang Tang, Ke Ji, Xidong Wang, Zihan Ye, Xinyuan Wang, Yiduo Guo, Ziniu Li, Chenxin Li, Jingyuan Hu, Shunian Chen, Tongxu Luo, Jiaxi Bi, Zeyu Qin, Shaobo Wang, Xin Lai, Pengyuan Lyu, Junyi Li, Can Xu, Chengquan Zhang, Han Hu, Ming Yan, Benyou Wang
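Abstract summary: We introduce MyPhoneBench, a verifiable framework for evaluating the privacy behavior of mobile agents. We operationalize privacy-respecting phone use as permissioned access, minimal disclosure, and user-controlled memory through a minimal privacy contract. Across five frontier models on 10 mobile apps and 300 tasks, we find that task success, privacy-compliant task completion, and later-session use of saved preferences are distinct capabilities.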
Reference score (independently computed attention score): 97.81424230136075
License: http://creativecommons.org/licenses/by/4.0/
Abstract: We study whether phone-use agents respect privacy while completing benign mobile tasks. This question has remained hard to answer because privacy-compliant behavior is not operationalized for phone-use agents, and ordinary apps do not reveal exactly what data agents type into which form entries during execution. To make this question measurable, we introduce MyPhoneBench, a verifiable evaluation framework for privacy behavior in mobile agents. We operationalize privacy-respecting phone use as permissioned access, minimal disclosure, and user-controlled memory through a minimal privacy contract, iMy, and pair it with instrumented mock apps plus rule-based auditing that make unnecessary permission requests, deceptive re-disclosure, and unnecessary form filling observable and reproducible. Across five frontier models on 10 mobile apps and 300 tasks, we find that task success, privacy-compliant task completion, and later-session use of saved preferences are distinct capabilities, and no single model dominates all three. Evaluating success and privacy jointly reshuffles the model ordering relative to either metric alone. The most persistent failure mode across models is simple data minimization: agents still fill optional personal entries that the task does not require. These results show that privacy failures arise from over-helpful execution of benign tasks, and that success-only evaluation overestimates the deployment readiness of current phone-use agents. All code, mock apps, and agent trajectories are publicly available at https://github.com/tangzhy/MyPhoneBench.
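Abstract (reference translation): We investigate whether phone-use agents respect privacy while completing benign mobile tasks. The question has been hard to answer for two reasons: privacy-compliant behavior has not been operationalized for phone-use agents, and ordinary apps do not reveal exactly which data an agent types into which form entry during execution. To make the question measurable, we introduce MyPhoneBench, a verifiable framework for evaluating privacy behavior in mobile agents. Through a minimal privacy contract, iMy, we operationalize privacy-respecting phone use as permissioned access, minimal disclosure, and user-controlled memory, and we pair it with instrumented mock apps and rule-based auditing so that unnecessary permission requests, deceptive re-disclosure, and unnecessary form filling become observable and reproducible. Across five frontier models on 10 mobile apps and 300 tasks, we find that task success, privacy-compliant task completion, and later-session use of saved preferences are distinct capabilities, and that no single model dominates all three. Evaluating success and privacy jointly changes the model ordering relative to either metric alone. The most persistent failure mode across models is simple data minimization: agents still fill optional personal entries that the task does not require. These results show that privacy failures arise from over-helpful execution of benign tasks, and that success-only evaluation overestimates the deployment readiness of current phone-use agents. All code, mock apps, and agent trajectories are publicly available at https://github.com/tangzhy/MyPhoneBench.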
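
Illustrative sketch (not taken from the MyPhoneBench code): the abstract describes rule-based auditing of what an agent types into instrumented forms, and a joint view of task success and privacy compliance. The Python below shows one way such a data-minimization rule and the two rates could be computed. All names (`FormWrite`, `Episode`, `unnecessary_writes`, `summarize`) and the rule itself are assumptions made for illustration; the paper's actual schema and audit rules may differ.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class FormWrite:
    """Hypothetical log record: one value the agent typed into a form entry."""
    field: str
    value: str


@dataclass(frozen=True)
class Episode:
    """Hypothetical record of one task run in an instrumented mock app."""
    success: bool
    writes: tuple[FormWrite, ...]
    required_fields: frozenset[str]


def unnecessary_writes(episode: Episode) -> list[FormWrite]:
    """Assumed rule: a non-empty write to a field the task does not require
    counts as a data-minimization violation."""
    return [
        w for w in episode.writes
        if w.value.strip() and w.field not in episode.required_fields
    ]


def summarize(episodes: list[Episode]) -> dict[str, float]:
    """Return plain task success and success without any violation."""
    if not episodes:
        raise ValueError("need at least one episode")
    n = len(episodes)
    succeeded = sum(e.success for e in episodes)
    compliant = sum(e.success and not unnecessary_writes(e) for e in episodes)
    return {
        "success_rate": succeeded / n,
        "compliant_success_rate": compliant / n,
    }


# Made-up episodes: the second one succeeds but fills an optional field.
episodes = [
    Episode(True, (FormWrite("name", "Ana"),), frozenset({"name"})),
    Episode(
        True,
        (FormWrite("name", "Ana"), FormWrite("birthday", "1990-01-01")),
        frozenset({"name"}),
    ),
    Episode(False, (), frozenset({"name"})),
]
print(summarize(episodes))
```

In this made-up example two of the three episodes succeed, but only one succeeds without filling an optional field, so the two rates diverge. That gap is the kind of effect the abstract refers to when it says success-only evaluation overestimates deployment readiness.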

Related papers