The Fallacy of Minimizing Cumulative Regret in the Sequential Task Setting
- URL: http://arxiv.org/abs/2403.10946v2
- Date: Thu, 24 Oct 2024 20:04:43 GMT
- Title: The Fallacy of Minimizing Cumulative Regret in the Sequential Task Setting
- Authors: Ziping Xu, Kelly W. Zhang, Susan A. Murphy,
- Abstract summary: In real-world RL applications, human-in-the-loop decisions between tasks often results in non-stationarity.
Our results show that task non-stationarity leads to a more restrictive trade-off between cumulative regret (CR) and simple regret (SR)
These findings are practically significant, indicating that increased exploration is necessary in non-stationary environments to accommodate task changes.
- Score: 11.834850394160608
- License:
- Abstract: Online Reinforcement Learning (RL) is typically framed as the process of minimizing cumulative regret (CR) through interactions with an unknown environment. However, real-world RL applications usually involve a sequence of tasks, and the data collected in the first task is used to warm-start the second task. The performance of the warm-start policy is measured by simple regret (SR). While minimizing both CR and SR is generally a conflicting objective, previous research has shown that in stationary environments, both can be optimized in terms of the duration of the task, $T$. In practice, however, in real-world applications, human-in-the-loop decisions between tasks often results in non-stationarity. For instance, in clinical trials, scientists may adjust target health outcomes between implementations. Our results show that task non-stationarity leads to a more restrictive trade-off between CR and SR. To balance these competing goals, the algorithm must explore excessively, leading to a CR bound worse than the typical optimal rate of $T^{1/2}$. These findings are practically significant, indicating that increased exploration is necessary in non-stationary environments to accommodate task changes, impacting the design of RL algorithms in fields such as healthcare and beyond.
Related papers
- Unified Active Retrieval for Retrieval Augmented Generation [69.63003043712696]
In Retrieval-Augmented Generation (RAG), retrieval is not always helpful and applying it to every instruction is sub-optimal.
Existing active retrieval methods face two challenges: 1.
They usually rely on a single criterion, which struggles with handling various types of instructions.
They depend on specialized and highly differentiated procedures, and thus combining them makes the RAG system more complicated.
arXiv Detail & Related papers (2024-06-18T12:09:02Z) - MOSEAC: Streamlined Variable Time Step Reinforcement Learning [14.838483990647697]
We introduce the Multi-Objective Soft Elastic Actor-Critic (MOSEAC) method.
MOSEAC features an adaptive reward scheme based on observed trends in task rewards during training.
We validate the MOSEAC method through simulations in a Newtonian kinematics environment.
arXiv Detail & Related papers (2024-06-03T16:51:57Z) - Reduced Policy Optimization for Continuous Control with Hard Constraints [14.141467234397256]
We present a new constrained RL algorithm that combines RL with general hard constraints.
With these benchmarks, RPO achieves better performance than previous constrained RL algorithms in terms of both reward and constraint violation.
We believe RPO, along with the new benchmarks, will open up new opportunities for applying complex constraints to real-world problems.
arXiv Detail & Related papers (2023-10-14T12:55:43Z) - Task-specific experimental design for treatment effect estimation [59.879567967089145]
Large randomised trials (RCTs) are the standard for causal inference.
Recent work has proposed more sample-efficient alternatives to RCTs, but these are not adaptable to the downstream application for which the causal effect is sought.
We develop a task-specific approach to experimental design and derive sampling strategies customised to particular downstream applications.
arXiv Detail & Related papers (2023-06-08T18:10:37Z) - Human-Inspired Framework to Accelerate Reinforcement Learning [1.6317061277457001]
Reinforcement learning (RL) is crucial for data science decision-making but suffers from sample inefficiency.
This paper introduces a novel human-inspired framework to enhance RL algorithm sample efficiency.
arXiv Detail & Related papers (2023-02-28T13:15:04Z) - CLUTR: Curriculum Learning via Unsupervised Task Representation Learning [130.79246770546413]
CLUTR is a novel curriculum learning algorithm that decouples task representation and curriculum learning into a two-stage optimization.
We show CLUTR outperforms PAIRED, a principled and popular UED method, in terms of generalization and sample efficiency in the challenging CarRacing and navigation environments.
arXiv Detail & Related papers (2022-10-19T01:45:29Z) - Provably Efficient Risk-Sensitive Reinforcement Learning: Iterated CVaR
and Worst Path [40.4378338001229]
We study a novel episodic risk-sensitive Reinforcement Learning (RL) problem, named Iterated CVaR RL, which aims to maximize the tail of the reward-to-go at each step.
This formulation is applicable to real-world tasks that demand strong risk avoidance throughout the decision process.
arXiv Detail & Related papers (2022-06-06T15:24:06Z) - Self-supervised Representation Learning with Relative Predictive Coding [102.93854542031396]
Relative Predictive Coding (RPC) is a new contrastive representation learning objective.
RPC maintains a good balance among training stability, minibatch size sensitivity, and downstream task performance.
We empirically verify the effectiveness of RPC on benchmark vision and speech self-supervised learning tasks.
arXiv Detail & Related papers (2021-03-21T01:04:24Z) - CRPO: A New Approach for Safe Reinforcement Learning with Convergence
Guarantee [61.176159046544946]
In safe reinforcement learning (SRL) problems, an agent explores the environment to maximize an expected total reward and avoids violation of certain constraints.
This is the first-time analysis of SRL algorithms with global optimal policies.
arXiv Detail & Related papers (2020-11-11T16:05:14Z) - Critic Regularized Regression [70.8487887738354]
We propose a novel offline RL algorithm to learn policies from data using a form of critic-regularized regression (CRR)
We find that CRR performs surprisingly well and scales to tasks with high-dimensional state and action spaces.
arXiv Detail & Related papers (2020-06-26T17:50:26Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.