PEARL: Self-Evolving Assistant for Time Management with Reinforcement Learning
- URL: http://arxiv.org/abs/2601.11957v1
- Date: Sat, 17 Jan 2026 08:19:18 GMT
- Title: PEARL: Self-Evolving Assistant for Time Management with Reinforcement Learning
- Authors: Bingxuan Li, Jeonghwan Kim, Cheng Qian, Xiusi Chen, Eitan Anzenberg, Niran Kundapur, Heng Ji
- Abstract summary: We propose PEARL, a reinforcement-learning framework that augments a language agent with an external memory module and an optimized round-wise reward design. Experiments on CalConflictBench show that PEARL achieves a 0.76 error reduction rate and a 55% improvement in average error rate compared to the strongest baseline.
- Score: 50.81994347448835
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Overlapping calendar invitations force busy professionals to repeatedly decide which meetings to attend, reschedule, or decline. We refer to this preference-driven decision process as calendar conflict resolution. Automating this process is crucial yet challenging: scheduling logistics drain hours, and human delegation often fails at scale, which motivates us to ask: Can we trust a large language model (LLM) or language agent to manage time? To enable systematic study of this question, we introduce CalConflictBench, a benchmark for long-horizon calendar conflict resolution. Conflicts are presented sequentially and agents receive feedback after each round, requiring them to infer and adapt to user preferences progressively. Our experiments show that current LLM agents perform poorly, with high error rates; e.g., Qwen-3-30B-Think has a 35% average error rate. To address this gap, we propose PEARL, a reinforcement-learning framework that augments a language agent with an external memory module and an optimized round-wise reward design, enabling the agent to progressively infer and adapt to user preferences on the fly. Experiments on CalConflictBench show that PEARL achieves a 0.76 error reduction rate and a 55% improvement in average error rate compared to the strongest baseline.
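The abstract describes a round-wise loop in which conflicts arrive sequentially, the agent decides, receives feedback, and conditions later decisions on an external memory. As a minimal sketch of where the memory read/write and round-wise reward sit in that loop, here is a hedged Python illustration; all class and function names (Conflict, ExternalMemory, resolve) are hypothetical and not from the PEARL codebase.

```python
# Minimal sketch of the round-wise interaction loop described in the abstract:
# conflicts arrive one per round, the agent decides, receives feedback, and
# writes the outcome to an external memory it can condition on later.
# All names here are hypothetical, not PEARL's actual implementation.
from dataclasses import dataclass, field

@dataclass
class Conflict:
    round_id: int
    options: list[str]  # e.g. ["attend A", "reschedule B", "decline B"]

@dataclass
class ExternalMemory:
    entries: list[str] = field(default_factory=list)

    def write(self, note: str) -> None:
        self.entries.append(note)

    def render(self) -> str:
        # Rendered into the prompt so the agent can adapt to past feedback.
        return "\n".join(self.entries[-20:])  # bounded context window

def resolve(conflict: Conflict, memory: ExternalMemory) -> str:
    # Placeholder for an LLM call conditioned on the conflict and the memory;
    # a real agent would send `prompt` to the model and parse its choice.
    prompt = f"Past feedback:\n{memory.render()}\nOptions: {conflict.options}"
    return conflict.options[0]

def run_episode(conflicts: list[Conflict], user_choice) -> float:
    memory = ExternalMemory()
    total_reward = 0.0
    for c in conflicts:
        decision = resolve(c, memory)
        gold = user_choice(c)                       # simulated user preference
        reward = 1.0 if decision == gold else -1.0  # round-wise reward signal
        memory.write(f"round {c.round_id}: chose {decision!r}, correct was {gold!r}")
        total_reward += reward
    return total_reward
```

In the actual framework the round-wise reward would drive an RL update of the agent's policy; the sketch only shows the loop structure and memory placement.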
Related papers
- SPAN: Benchmarking and Improving Cross-Calendar Temporal Reasoning of Large Language Models [7.437301045895224]
We introduce SPAN, a cross-calendar temporal reasoning benchmark. SPAN features ten cross-calendar temporal reasoning directions, two reasoning types, and two question formats across six calendars. To enable time-variant and contamination-free evaluation, we propose a template-driven protocol for dynamic instance generation.
arXiv Detail & Related papers (2025-11-13T05:57:19Z)
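SPAN's template-driven protocol is not detailed in this summary; as a hedged illustration of the general idea, the sketch below fills a question template with freshly sampled dates so instances vary over time and resist training-data contamination. The template text and answer logic are invented for illustration, not SPAN's own.

```python
# Hedged sketch of template-driven dynamic instance generation: a question
# template is instantiated with freshly sampled values at evaluation time,
# so test items are time-variant and unlikely to appear verbatim in training
# data. The template below is illustrative only.
import random
from datetime import date, timedelta

TEMPLATE = "If today is {today} in the Gregorian calendar, what is the date {delta} days later?"

def generate_instance(rng: random.Random) -> tuple[str, str]:
    today = date(2020, 1, 1) + timedelta(days=rng.randrange(3000))
    delta = rng.randrange(1, 365)
    question = TEMPLATE.format(today=today.isoformat(), delta=delta)
    answer = (today + timedelta(days=delta)).isoformat()
    return question, answer

rng = random.Random(0)  # re-seeding at eval time yields fresh instances
question, answer = generate_instance(rng)
```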
- AgentPRM: Process Reward Models for LLM Agents via Step-Wise Promise and Progress [71.02263260394261]
Large language models (LLMs) still encounter challenges in multi-turn decision-making tasks. We build process reward models (PRMs) to evaluate each decision and guide the agent's decision-making process. AgentPRM captures both the interdependence between sequential decisions and their contribution to the final goal.
arXiv Detail & Related papers (2025-11-11T14:57:54Z)
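As a rough sketch of how a process reward model can guide multi-turn decision-making, the snippet below scores each candidate action at every step and greedily follows the highest-scoring one. The scorer is a stand-in callable; AgentPRM's actual "promise and progress" estimates are learned models.

```python
# Hedged sketch of PRM-guided action selection: a process reward model scores
# each candidate action given the trajectory so far, and the agent follows
# the highest-scoring action step by step. The scorer is a stand-in for a
# learned model, not AgentPRM's actual implementation.
from typing import Callable

Trajectory = list[str]

def prm_guided_rollout(
    propose: Callable[[Trajectory], list[str]],      # candidate actions per step
    prm_score: Callable[[Trajectory, str], float],   # learned step-wise scorer
    max_steps: int,
) -> Trajectory:
    trajectory: Trajectory = []
    for _ in range(max_steps):
        candidates = propose(trajectory)
        if not candidates:
            break
        # Pick the action the PRM judges most promising given context so far.
        best = max(candidates, key=lambda a: prm_score(trajectory, a))
        trajectory.append(best)
    return trajectory
```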
- Reinforcement Learning for Machine Learning Engineering Agents [52.03168614623642]
We show that agents backed by weaker models that improve via reinforcement learning can outperform agents backed by much larger but static models. We propose duration-aware gradient updates in a distributed asynchronous RL framework to amplify high-cost but high-reward actions. We also propose environment instrumentation to offer partial credit, distinguishing almost-correct programs from those that fail early.
arXiv Detail & Related papers (2025-09-01T18:04:10Z)
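The summary mentions duration-aware gradient updates that amplify high-cost but high-reward actions. One hedged way to realize that idea is to scale an action's advantage by a function of its execution time before the policy-gradient step, as sketched below; the log-scaling rule is an assumption for illustration, not the paper's exact formula.

```python
# Hedged sketch of a duration-aware policy-gradient weight: actions that took
# longer to execute (e.g. long training runs) get their advantage amplified,
# so rare high-cost/high-reward actions are not drowned out by cheap ones.
# The log-scaling rule is illustrative, not the paper's exact formula.
import math

def duration_aware_advantage(advantage: float, duration_s: float,
                             ref_duration_s: float = 60.0) -> float:
    # Weight grows logarithmically with duration relative to a reference cost.
    weight = 1.0 + math.log1p(duration_s / ref_duration_s)
    return weight * advantage

# Example: a 30-minute training action with advantage 0.5 gets upweighted.
scaled = duration_aware_advantage(0.5, duration_s=1800.0)
```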
- CoLD: Counterfactually-Guided Length Debiasing for Process Reward Models [29.95434387343843]
We propose a unified framework that mitigates length bias through three components. CoLD consistently reduces reward-length correlation, improves accuracy in step selection, and encourages more concise, logically valid reasoning.
arXiv Detail & Related papers (2025-07-21T15:07:59Z)
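CoLD's three components are not spelled out in this summary; a minimal, hedged illustration of length debiasing is an explicit length penalty subtracted from the raw reward, checked by whether the reward-length correlation shrinks. The linear penalty form below is an assumption, not CoLD's counterfactually-guided method.

```python
# Hedged sketch of length debiasing for a reward model: subtract a length
# penalty from the raw reward and verify that the reward-length correlation
# shrinks. A linear penalty is an illustrative choice only; CoLD's actual
# counterfactually-guided components are more involved.
import statistics

def debiased_reward(raw_reward: float, num_tokens: int,
                    penalty: float = 0.002) -> float:
    return raw_reward - penalty * num_tokens

def reward_length_corr(rewards: list[float], lengths: list[int]) -> float:
    return statistics.correlation(rewards, [float(n) for n in lengths])

lengths = [20, 80, 150, 300]
raw = [0.2, 0.45, 0.6, 0.9]          # longer steps tend to score higher (bias)
fixed = [debiased_reward(r, n) for r, n in zip(raw, lengths)]
before = reward_length_corr(raw, lengths)    # ~0.99
after = reward_length_corr(fixed, lengths)   # noticeably lower
```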
- RIFLES: Resource-effIcient Federated LEarning via Scheduling [4.358456799125694]
Federated Learning (FL) is a privacy-preserving machine learning technique that allows decentralized collaborative model training across a set of distributed clients. Current selection strategies are myopic in that they are based only on past or current interactions. RIFLES builds a novel availability forecasting layer to support the client selection process.
arXiv Detail & Related papers (2025-05-19T14:26:33Z)
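As a hedged sketch of availability-forecast-based selection, the snippet below predicts each client's availability from its heartbeat history and selects the clients most likely to stay online for the next round. The frequency-based forecast is an invented stand-in for RIFLES' actual forecasting layer.

```python
# Hedged sketch of availability-forecast-based client selection: estimate
# each client's probability of being available next round from its heartbeat
# history, then pick the top-k. The frequency estimate is an illustrative
# stand-in for RIFLES' forecasting layer.
def forecast_availability(history: list[bool]) -> float:
    # Fraction of recent rounds in which the client was reachable.
    recent = history[-10:]
    return sum(recent) / len(recent) if recent else 0.0

def select_clients(histories: dict[str, list[bool]], k: int) -> list[str]:
    ranked = sorted(histories,
                    key=lambda c: forecast_availability(histories[c]),
                    reverse=True)
    return ranked[:k]

histories = {
    "client_a": [True, True, False, True],
    "client_b": [False, False, True, False],
    "client_c": [True, True, True, True],
}
chosen = select_clients(histories, k=2)  # -> ["client_c", "client_a"]
```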
- Program Synthesis Dialog Agents for Interactive Decision-Making [16.916736716463284]
We propose BeNYfits, a new benchmark for determining user eligibility for social benefits opportunities through interactive decision-making. Our experiments show that GPT-4o scores only 35.7 F1 using a ReAct-style chain-of-thought. Our agent, ProADA, improves the F1 score to 55.6 while maintaining nearly the same number of dialog turns.
arXiv Detail & Related papers (2025-02-26T22:53:01Z)
- Self-Consistency Preference Optimization [79.37880123635405]
We introduce self-consistency preference optimization (ScPO). ScPO iteratively trains consistent answers to be preferred over inconsistent ones on unsupervised new problems. On ZebraLogic, ScPO fine-tunes Llama-3 8B to be superior to Llama-3 70B, Gemma-2 27B, and Claude-3 Haiku.
arXiv Detail & Related papers (2024-11-06T18:36:22Z)
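A hedged sketch of the self-consistency preference idea: sample several answers for an unlabeled problem, treat the majority (most self-consistent) answer as "chosen" and a minority answer as "rejected", yielding preference pairs without supervision. Function names below are illustrative, not ScPO's implementation.

```python
# Hedged sketch of building self-consistency preference pairs: for an
# unlabeled problem, sample several answers and prefer the most frequent
# (most self-consistent) one over a minority answer. These pairs would then
# feed a preference-optimization step.
from collections import Counter

def consistency_pair(samples: list[str]) -> tuple[str, str] | None:
    counts = Counter(samples)
    if len(counts) < 2:
        return None  # all samples agree: no contrastive pair available
    (chosen, _), *rest = counts.most_common()
    rejected = rest[-1][0]  # least consistent answer as the rejected side
    return chosen, rejected

samples = ["42", "42", "41", "42", "40"]
pair = consistency_pair(samples)  # -> ("42", "40")
```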
- Direct Multi-Turn Preference Optimization for Language Agents [44.02877245158347]
Adapting Large Language Models (LLMs) for agent tasks is critical in developing language agents. Direct Preference Optimization (DPO) is a promising technique for this adaptation with the alleviation of compounding errors. Applying DPO to multi-turn tasks presents challenges due to the inability to cancel the partition function.
arXiv Detail & Related papers (2024-06-21T05:13:20Z)
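For context on the partition-function issue mentioned above: in single-turn DPO the intractable partition function Z(x) cancels because the loss depends only on a difference of log-ratios, as the standard objective (reproduced from the DPO literature with the usual notation) shows; in multi-turn settings this cancellation no longer goes through directly.

```latex
% Standard single-turn DPO objective. The partition function Z(x) in the
% optimal-policy form pi*(y|x) = pi_ref(y|x) exp(r(x,y)/beta) / Z(x)
% cancels because the loss uses a DIFFERENCE of log-ratios.
\mathcal{L}_{\mathrm{DPO}}(\pi_\theta;\pi_{\mathrm{ref}})
  = -\,\mathbb{E}_{(x,\,y_w,\,y_l)\sim\mathcal{D}}\!\left[
      \log \sigma\!\left(
        \beta \log \frac{\pi_\theta(y_w \mid x)}{\pi_{\mathrm{ref}}(y_w \mid x)}
      - \beta \log \frac{\pi_\theta(y_l \mid x)}{\pi_{\mathrm{ref}}(y_l \mid x)}
      \right)
    \right]
```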
- Online Decision Mediation [72.80902932543474]
Consider learning a decision support assistant to serve as an intermediary between (oracle) expert behavior and (imperfect) human behavior.
In clinical diagnosis, fully-autonomous machine behavior is often beyond ethical affordances.
arXiv Detail & Related papers (2023-10-28T05:59:43Z)
- Mastering the Unsupervised Reinforcement Learning Benchmark from Pixels [112.63440666617494]
Reinforcement learning algorithms can succeed but require large amounts of interaction between the agent and the environment. We propose a new method that uses unsupervised model-based RL to pre-train the agent. We show robust performance on the Real-World RL benchmark, hinting at resiliency to environment perturbations during adaptation.
arXiv Detail & Related papers (2022-09-24T14:22:29Z)