Fugu-MT Paper Translation (Abstract): ScreenSearch: Uncertainty-Aware OS Exploration

Paper overview: ScreenSearch: Uncertainty-Aware OS Exploration

arxiv url: http://arxiv.org/abs/2605.16024v1
Date: Fri, 15 May 2026 14:58:28 GMT
Status: Translation complete
Last updated in system: 2026-05-18 21:22:26.327786
Title: ScreenSearch: Uncertainty-Aware OS Exploration
Title (reference translation): ScreenSearch: Uncertainty-Aware OS Exploration
Authors: Michael Solodko, Justin Wagle
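Abstract summary: ScreenSearch is a system that combines structural screen retrieval and deduplication with an ambiguity-aware PUCT graph-bandit for large-scale desktop exploration. An ambiguity signal based on matched-action outcome dispersion is used together with frontier rewards to drive large-scale exploration and replay-start policy evaluation over a shared graph. Across 11 desktop applications, ScreenSearch collects over 1M screenshots and over 30K deduplicated states, yielding large exploration corpora.
Reference score (independently computed attention score): 0.9310318514564272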
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: Desktop GUI agents operate under partial observability: visually similar screens can correspond to different underlying workflow states, so locally plausible actions can lead to sharply different outcomes. We frame this as a problem of computer/OS state exploration, where effective behavior requires both expanding the reachable frontier and reducing ambiguity before committing. We present ScreenSearch, a system that combines structural screen retrieval and deduplication with an ambiguity-aware PUCT graph-bandit for large-scale desktop exploration. The retrieval layer converts UIA trees into location-aware structural features, indexes related screens through sparse token search and metadata filters, and maintains a shared deduplicated state graph across VM workers. On top of this graph, we define a scalable ambiguity signal based on matched-action outcome dispersion. If similar screens produce different next states under the same action signature, the state should be probed further rather than treated as resolved. We use this signal together with frontier rewards to drive large-scale exploration and replay-start policy evaluation over the shared graph. Across 11 desktop applications, ScreenSearch collects over 1M screenshots and over 30K deduplicated states, yielding large exploration corpora with substantial cross-application and within-application diversity. On a fixed replay-start slice, we observe a clear novelty-ambiguity trade-off: some policies reduce ambiguity quickly while discovering little frontier. Ambiguity reduction alone is therefore not a sufficient exploration objective. Appendix ablations show that stronger proposal priors can materially improve unique-state discovery during corpus building. These results suggest that state identity, proposal quality, and ambiguity-aware search all matter when deciding when to probe and when to commit.
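Abstract (reference translation): Because visually similar screens can correspond to different workflow states, locally plausible actions can lead to sharply different outcomes. We frame this as a problem of computer/OS state exploration, where effective behavior requires both expanding the reachable frontier and reducing ambiguity before committing. ScreenSearch is a system that combines structural screen retrieval and deduplication with an ambiguity-aware PUCT graph-bandit for large-scale desktop exploration. The retrieval layer converts UIA trees into location-aware structural features, indexes related screens through sparse token search and metadata filters, and maintains a shared deduplicated state graph across VM workers. On top of this graph, we define a scalable ambiguity signal based on the dispersion of matched-action outcomes. If similar screens produce different next states under the same action signature, the state should be probed further rather than treated as resolved. We use this signal together with frontier rewards to drive large-scale exploration and replay-start policy evaluation over the shared graph. Across 11 desktop applications, ScreenSearch collects over 1M screenshots and over 30K deduplicated states, yielding large exploration corpora with substantial cross-application and within-application diversity. On a fixed replay-start slice, a clear novelty-ambiguity trade-off is observed. Ambiguity reduction alone is therefore not a sufficient exploration objective. Appendix ablations show that stronger proposal priors can materially improve unique-state discovery during corpus building. These results suggest that state identity, proposal quality, and ambiguity-aware search all matter when deciding when to probe and when to commit.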
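
Illustrative sketch (not from the paper): the abstract describes an ambiguity signal based on matched-action outcome dispersion, combined with frontier rewards inside a PUCT-style selection rule, but it gives no formulas. The Python below is one possible reading under explicit assumptions: dispersion is measured as the normalized Shannon entropy of observed next states per (state, action signature) pair, and the selection score uses the common PUCT form Q + c * P * sqrt(N_parent) / (1 + N_child). The names `AmbiguityTracker`, `puct_score`, `select_action`, and `ambiguity_weight` are hypothetical and do not come from ScreenSearch.

```python
import math
from collections import Counter, defaultdict


class AmbiguityTracker:
    """Counts observed next states per (deduplicated state, action signature)."""

    def __init__(self):
        self._outcomes = defaultdict(Counter)

    def record(self, state_id, action_sig, next_state_id):
        self._outcomes[(state_id, action_sig)][next_state_id] += 1

    def dispersion(self, state_id, action_sig):
        """Normalized entropy in [0, 1]; 0 means every outcome was identical.

        ASSUMPTION: entropy is a stand-in for the paper's dispersion measure.
        """
        counts = self._outcomes.get((state_id, action_sig))
        if not counts or len(counts) < 2:
            return 0.0
        total = sum(counts.values())
        entropy = -sum((c / total) * math.log(c / total) for c in counts.values())
        return entropy / math.log(len(counts))


def puct_score(value, prior, parent_visits, child_visits, c_puct=1.0):
    """Common PUCT form: exploitation value plus a prior-weighted visit bonus."""
    return value + c_puct * prior * math.sqrt(parent_visits) / (1 + child_visits)


def select_action(actions, tracker, state_id, ambiguity_weight=0.5):
    """Pick the action signature with the highest score.

    `actions` maps action_sig -> {"frontier_reward", "prior", "visits"}.
    ASSUMPTION: frontier reward and ambiguity are combined by a weighted sum.
    """
    parent_visits = sum(a["visits"] for a in actions.values())

    def score(sig):
        a = actions[sig]
        value = a["frontier_reward"] + ambiguity_weight * tracker.dispersion(state_id, sig)
        return puct_score(value, a["prior"], parent_visits, a["visits"])

    return max(actions, key=score)


tracker = AmbiguityTracker()
# "click:Save" always leads to the same next state; "click:Open" does not.
for nxt in ("s1", "s1"):
    tracker.record("s0", "click:Save", nxt)
for nxt in ("s2", "s3"):
    tracker.record("s0", "click:Open", nxt)

actions = {
    "click:Save": {"frontier_reward": 0.1, "prior": 0.5, "visits": 4},
    "click:Open": {"frontier_reward": 0.1, "prior": 0.5, "visits": 4},
}
chosen = select_action(actions, tracker, "s0")
```

With equal priors, visit counts, and frontier rewards, the action whose outcomes disagree scores higher, so the state is probed further rather than treated as resolved. Setting `ambiguity_weight` to 0 reduces the rule to frontier reward alone.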

Related papers