Fugu-MT Paper Translation (Abstract): DRFLOW: A Deep Research Benchmark for Personalized Workflow Prediction

Paper overview: DRFLOW: A Deep Research Benchmark for Personalized Workflow Prediction

arxiv url: http://arxiv.org/abs/2606.18191v1
Date: Tue, 16 Jun 2026 17:22:07 GMT
Status: Translation complete
Last updated in system: 2026-06-17 17:15:32.573608
Title: DRFLOW: A Deep Research Benchmark for Personalized Workflow Prediction
Title (reference translation): DRFLOW: A Deep Research Benchmark for Personalized Workflow Prediction
Authors: Md Tawkat Islam Khondaker, Raymond Li, Muhammad Abdul-Mageed, Laks V. S. Lakshmanan, Issam H. Laradji
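Abstract summary: Many enterprise tasks require an agent to identify concrete workflows, that is, sequences of action steps. We introduce DRFLOW, a benchmark for evaluating personalized workflows predicted by agents from heterogeneous sources. DRFLOW contains 100 tasks across five domains, with 1,246 reference workflow steps grounded in more than 3,900 sources.
Reference score (independently computed attention score): 44.59825034567626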
License: http://creativecommons.org/licenses/by/4.0/
Abstract: Deep research (DR) systems are increasingly used for complex information-seeking tasks, but existing work mainly focuses on generating reports and summaries. In contrast, many enterprise tasks require an agent to identify concrete workflows, which are sequences of action steps. For example, rather than summarizing budgeting policies, an agent should be able to determine the steps needed to answer a question such as: "How do I request new headcount given a fixed budget?". Therefore, we introduce DRFLOW, a benchmark for evaluating personalized workflows predicted by agents from heterogeneous sources. Each task requires the agent to identify relevant evidence from scattered sources, then use that evidence to predict the correct action-step sequence for the user's task. DRFLOW contains 100 tasks across five domains, with 1,246 reference workflow steps grounded in more than 3,900 sources. We define seven diagnostic metrics covering factual grounding, step recovery, structural ordering, condition resolution, and personalization. We further present DRFLOW-Agent (DRFA), a workflow-oriented reference agent that predicts personalized workflows. We show that although DRFA improves over strong baseline agents (by up to 10.02% average F1 score), substantial room for improvement remains across these workflow metrics, indicating that predicting complete and correct personalized workflows remains a challenging frontier for deep research.
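Abstract (reference translation): Deep research (DR) systems are often used for complex information-seeking tasks, but existing studies mainly concentrate on producing reports and summaries. By contrast, many enterprise tasks need an agent to identify a concrete workflow, meaning a sequence of action steps. For instance, instead of summarizing budgeting policies, an agent should be able to work out the steps required to answer a question like "How do I request new headcount given a fixed budget?". This work therefore introduces DRFLOW, a benchmark for evaluating the personalized workflows that agents predict from heterogeneous sources. In each task, the agent must find the relevant evidence among scattered sources and then use it to predict the correct action-step sequence for the user's task. DRFLOW contains 100 tasks across five domains, with 1,246 reference workflow steps grounded in more than 3,900 sources. Seven diagnostic metrics are defined, covering factual grounding, step recovery, structural ordering, condition resolution, and personalization. The work also presents DRFLOW-Agent (DRFA), a workflow-oriented reference agent for predicting personalized workflows. DRFA improves over strong baseline agents (by up to 10.02% average F1 score), yet substantial room for improvement remains on these workflow metrics, which indicates that predicting complete and correct personalized workflows is still a challenging frontier for deep research.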

Related papers