Fugu-MT Paper Translation (Abstract): TreeMind: Automatically Reproducing Android Bug Reports via LLM-empowered Monte Carlo Tree Search

Paper overview: TreeMind: Automatically Reproducing Android Bug Reports via LLM-empowered Monte Carlo Tree Search

arxiv url: http://arxiv.org/abs/2509.22431v1
Date: Fri, 26 Sep 2025 14:50:13 GMT
Status: translation complete
Last updated in system: 2025-09-29 20:57:54.517942
Title: TreeMind: Automatically Reproducing Android Bug Reports via LLM-empowered Monte Carlo Tree Search
Title (reference translation): TreeMind: Automatic Reproduction of Android Bug Reports via LLM-based Monte Carlo Tree Search
Authors: Zhengyu Chen, Zhaoyi Meng, Wenxiang Zhao, Wansen Wang, Haoyang Zhao, Jiahao Zhan, Jie Cui, Hong Zhong
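Abstract summary: We propose TreeMind, a novel technique that integrates large language models with a Monte Carlo Tree Search algorithm to achieve strategic UI exploration in bug reproduction. To the best of our knowledge, this is the first work to combine external decision-making with semantic reasoning for reliable bug reproduction. We evaluate TreeMind on a dataset of 93 real-world Android bug reports drawn from three widely used benchmarks. Experimental results show that it significantly outperforms four state-of-the-art baselines in reproduction success rate.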
Reference score (independently computed attention score): 24.23102808875548
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: Automatically reproducing Android app crashes from textual bug reports is challenging, particularly when the reports are incomplete and modern UIs exhibit high combinatorial complexity. Existing approaches based on reinforcement learning or large language models (LLMs) exhibit limitations in such scenarios. They struggle to infer unobserved steps and reconstruct the underlying user action sequences to navigate the vast UI interaction space, primarily due to limited goal-directed reasoning and planning. We present TreeMind, a novel technique that integrates LLMs with a customized Monte Carlo Tree Search (MCTS) algorithm to achieve strategic UI exploration in bug reproduction. To the best of our knowledge, this is the first work to combine external decision-making with LLM semantic reasoning for reliable bug reproduction. We formulate the reproduction task as a target-driven search problem, leveraging MCTS as the core planning mechanism to iteratively refine action sequences. To enhance MCTS with semantic reasoning, we introduce two LLM-guided agents with distinct roles: Expander generates top-k promising actions based on the current UI state and exploration history, while Simulator estimates the likelihood that each action leads toward successful reproduction. By incorporating multi-modal UI inputs and advanced prompting techniques, TreeMind conducts feedback-aware navigation that identifies missing but essential user actions and incrementally reconstructs the reproduction paths. We evaluate TreeMind on a dataset of 93 real-world Android bug reports from three widely used benchmarks. Experimental results show that it significantly outperforms four state-of-the-art baselines in reproduction success rate. A real-world case study indicates that integrating LLM reasoning with MCTS-based planning is a compelling direction for automated bug reproduction.
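Abstract (reference translation): Automatically reproducing Android app crashes from textual bug reports is difficult, especially when the reports are incomplete and modern UIs exhibit high combinatorial complexity. Existing approaches based on reinforcement learning or large language models (LLMs) are limited in such scenarios: they struggle to infer unobserved steps and to reconstruct the underlying user action sequences needed to navigate the vast UI interaction space. We propose TreeMind, a novel technique that integrates LLMs with a Monte Carlo Tree Search (MCTS) algorithm to achieve strategic UI exploration in bug reproduction. To the best of our knowledge, this is the first work to combine external decision-making with LLM semantic reasoning for reliable bug reproduction. We formulate the reproduction task as a target-driven search problem that uses MCTS as the core planning mechanism to iteratively refine action sequences. Two LLM-guided agents support the search: Expander generates the top-k promising actions based on the current UI state and exploration history, and Simulator estimates the likelihood that each action leads toward successful reproduction. By incorporating multi-modal UI inputs and advanced prompting techniques, TreeMind performs feedback-aware navigation that identifies missing but essential user actions and incrementally reconstructs the reproduction paths. We evaluate TreeMind on a dataset of 93 real-world Android bug reports from three widely used benchmarks. Experimental results show that it significantly outperforms four state-of-the-art baselines in reproduction success rate. A real-world case study suggests that integrating LLM reasoning with MCTS-based planning is a compelling direction for automated bug reproduction.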
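The abstract gives no implementation details, so the following is only a minimal sketch of how an MCTS loop with an Expander and a Simulator could be wired together. Everything in it is an assumption made for illustration: the toy UI graph `TOY_APP`, the function names, the UCT constant, and the two stand-in agents, which are simple heuristics here where the paper uses LLM-guided agents. It is not the authors' code.

```python
import math
from dataclasses import dataclass, field
from typing import List, Optional

@dataclass
class Node:
    state: str
    parent: Optional["Node"] = None
    action: Optional[str] = None          # action that led from parent to this node
    children: List["Node"] = field(default_factory=list)
    visits: int = 0
    value: float = 0.0
    expanded: bool = False

def actions_to(node):
    """Return the action sequence from the root to `node`."""
    path = []
    while node.parent is not None:
        path.append(node.action)
        node = node.parent
    return path[::-1]

def backpropagate(node, reward):
    """Add `reward` to `node` and every ancestor."""
    while node is not None:
        node.visits += 1
        node.value += reward
        node = node.parent

def uct(child, parent_visits, c=1.4):
    """Standard UCT score: mean value plus an exploration bonus."""
    if child.visits == 0:
        return math.inf
    return child.value / child.visits + c * math.sqrt(math.log(parent_visits) / child.visits)

def mcts_reproduce(root_state, step, expander, simulator, is_target, iterations=50, k=2):
    root = Node(root_state)
    for _ in range(iterations):
        # Selection: descend by UCT until reaching an unexpanded node or a dead end.
        node = root
        while node.expanded and node.children:
            parent = node
            node = max(parent.children, key=lambda ch: uct(ch, parent.visits))
        if node.expanded:                 # dead end: no actions were available here
            backpropagate(node, 0.0)
            continue
        # Expansion: the expander proposes ranked actions; keep the top k.
        for action in expander(node.state, actions_to(node))[:k]:
            child = Node(step(node.state, action), parent=node, action=action)
            node.children.append(child)
            if is_target(child.state):    # crash reproduced: return the action path
                return actions_to(child)
            # Evaluation: the simulator scores the action instead of a random rollout.
            backpropagate(child, simulator(node.state, action))
        node.expanded = True
    return None

# Hypothetical toy UI graph: state -> {action: next_state}.
TOY_APP = {
    "Home": {"open_settings": "Settings", "open_about": "About"},
    "Settings": {"toggle_theme": "Settings", "open_export": "Export"},
    "About": {"back": "Home"},
    "Export": {"tap_save": "CRASH", "back": "Settings"},
}
HINTS = ("settings", "export", "save")    # words taken from an imagined bug report

def toy_step(state, action):
    return TOY_APP[state][action]

def toy_expander(state, history):
    # Stand-in for the Expander agent: list the available actions in a fixed order.
    return sorted(TOY_APP.get(state, {}))

def toy_simulator(state, action):
    # Stand-in for the Simulator agent: favor actions that mention a report keyword.
    return 1.0 if any(h in action for h in HINTS) else 0.1

path = mcts_reproduce("Home", toy_step, toy_expander, toy_simulator,
                      is_target=lambda s: s == "CRASH")
print(path)
```

On this toy graph the search returns an action sequence ending in `tap_save`, the only action that reaches the `CRASH` state. Replacing the random rollout of classic MCTS with a scoring call is one plausible reading of the Simulator's role; the paper may structure this step differently.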


Related papers