Fugu-MT Paper Translation (Abstract): PEARL: Self-Evolving Assistant for Time Management with Reinforcement Learning

Paper summary: PEARL: Self-Evolving Assistant for Time Management with Reinforcement Learning

arxiv url: http://arxiv.org/abs/2601.11957v1
Date: Sat, 17 Jan 2026 08:19:18 GMT
Status: Translation complete
System update date: 2026-01-21 22:47:22.39524
Title: PEARL: Self-Evolving Assistant for Time Management with Reinforcement Learning
Title (reference translation): PEARL: A Self-Evolving Assistant for Time Management via Reinforcement Learning
Authors: Bingxuan Li, Jeonghwan Kim, Cheng Qian, Xiusi Chen, Eitan Anzenberg, Niran Kundapur, Heng Ji
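Abstract summary: This paper proposes PEARL, a reinforcement learning framework that augments a language agent with an external memory module and an optimized round-wise reward design. In experiments on CalConflictBench, PEARL achieves a 0.76 error reduction rate and a 55% improvement in average error rate compared to the strongest baseline.
Reference score (independently computed attention score): 50.81994347448835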
License: http://creativecommons.org/licenses/by/4.0/
Abstract: Overlapping calendar invitations force busy professionals to repeatedly decide which meetings to attend, reschedule, or decline. We refer to this preference-driven decision process as calendar conflict resolution. Automating such a process is crucial yet challenging. Scheduling logistics drain hours, and human delegation often fails at scale, which motivates us to ask: Can we trust a large language model (LLM) or language agent to manage time? To enable systematic study of this question, we introduce CalConflictBench, a benchmark for long-horizon calendar conflict resolution. Conflicts are presented sequentially and agents receive feedback after each round, requiring them to infer and adapt to user preferences progressively. Our experiments show that current LLM agents perform poorly with high error rates, e.g., Qwen-3-30B-Think has a 35% average error rate. To address this gap, we propose PEARL, a reinforcement-learning framework that augments a language agent with an external memory module and an optimized round-wise reward design, enabling the agent to progressively infer and adapt to user preferences on the fly. Experiments on CalConflictBench show that PEARL achieves a 0.76 error reduction rate and a 55% improvement in average error rate compared to the strongest baseline.
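Abstract (reference translation): Overlapping calendar invitations force busy professionals to decide which meetings to attend, reschedule, or decline. We call this preference-driven decision process calendar conflict resolution. Automating such a process is essential but difficult. Can a large language model (LLM) or language agent be trusted to manage time? To study this question systematically, we introduce CalConflictBench, a benchmark for long-horizon calendar conflict resolution. Conflicts are presented sequentially, and agents receive feedback after each round, so they must progressively infer and adapt to the user's preferences. Experiments show that current LLM agents perform poorly with high error rates; for example, Qwen-3-30B-Think has an average error rate of 35%. To address this gap, we propose PEARL, a reinforcement learning framework that augments a language agent with an external memory module and an optimized round-wise reward design. In experiments on CalConflictBench, PEARL achieves an error reduction rate of 0.76 and a 55% improvement in average error rate compared to the strongest baseline.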
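
The round-by-round protocol described in the abstract (conflicts arrive sequentially, the agent decides, then receives feedback) can be illustrated with a small toy loop. This is only a sketch under stated assumptions: it is not PEARL's method and not the CalConflictBench implementation. The names `MemoryAgent`, `run_episode`, the meeting labels, and the count-based memory are all hypothetical, chosen to show how per-round feedback stored in an external memory can shape later decisions and how an average error rate over rounds might be computed.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Tuple

# Hypothetical data model: a conflict is a pair of meeting labels, and the
# feedback for a round is the label the user actually wanted to keep.
Conflict = Tuple[str, str]


@dataclass
class MemoryAgent:
    """Toy agent whose external memory counts how often each label was kept."""

    memory: Dict[str, int] = field(default_factory=dict)

    def decide(self, conflict: Conflict) -> str:
        # Prefer the option that past feedback favoured more often;
        # ties (including an empty memory) go to the first option.
        first, second = conflict
        if self.memory.get(first, 0) >= self.memory.get(second, 0):
            return first
        return second

    def update(self, kept: str) -> None:
        # Round-wise feedback: record which option the user kept.
        self.memory[kept] = self.memory.get(kept, 0) + 1


def run_episode(
    agent: MemoryAgent,
    conflicts: List[Conflict],
    user_choice: Callable[[Conflict], str],
) -> float:
    """Present conflicts one at a time and return the average error rate."""
    errors = 0
    for conflict in conflicts:
        prediction = agent.decide(conflict)
        truth = user_choice(conflict)
        if prediction != truth:
            errors += 1
        agent.update(truth)  # feedback arrives only after the decision
    return errors / len(conflicts)


# Assumed user preference ordering, used only to simulate feedback.
priority = {"client": 3, "standup": 2, "social": 1}


def simulated_user(conflict: Conflict) -> str:
    return max(conflict, key=lambda label: priority[label])


conflicts = [
    ("standup", "client"),
    ("standup", "client"),
    ("social", "client"),
    ("social", "standup"),
]
error_rate = run_episode(MemoryAgent(), conflicts, simulated_user)
print(f"average error rate: {error_rate:.2f}")
```

In this toy setting the agent can only get a pairing right after feedback has told it something about one of the labels involved, which is the kind of progressive preference inference the benchmark is described as requiring.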

Related papers