Timing is Everything: Learning to Act Selectively with Costly Actions
and Budgetary Constraints
- URL: http://arxiv.org/abs/2205.15953v4
- Date: Sun, 4 Jun 2023 23:59:39 GMT
- Title: Timing is Everything: Learning to Act Selectively with Costly Actions
and Budgetary Constraints
- Authors: David Mguni, Aivar Sootla, Juliusz Ziomek, Oliver Slumbers, Zipeng
Dai, Kun Shao, Jun Wang
- Abstract summary: We introduce a reinforcement learning framework named the \textbf{L}earnable \textbf{I}mpulse \textbf{C}ontrol \textbf{R}einforcement \textbf{A}lgorithm (LICRA).
At the core of LICRA is a nested structure that combines RL and a form of policy known as \textit{impulse control}, which learns to maximise objectives when actions incur costs.
We show LICRA learns the optimal value function and ensures budget constraints are satisfied almost surely.
- Score: 9.132215354916784
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Many real-world settings involve costs for performing actions; transaction
costs in financial systems and fuel costs being common examples. In these
settings, performing actions at each time step quickly accumulates costs
leading to vastly suboptimal outcomes. Additionally, repeatedly acting produces
wear and tear and ultimately, damage. Determining \textit{when to act} is
crucial for achieving successful outcomes and yet, the challenge of efficiently
\textit{learning} to behave optimally when actions incur minimally bounded
costs remains unresolved. In this paper, we introduce a reinforcement learning
(RL) framework named \textbf{L}earnable \textbf{I}mpulse \textbf{C}ontrol
\textbf{R}einforcement \textbf{A}lgorithm (LICRA), for learning to optimally
select both when to act and which actions to take when actions incur costs. At
the core of LICRA is a nested structure that combines RL and a form of policy
known as \textit{impulse control} which learns to maximise objectives when
actions incur costs. We prove that LICRA, which seamlessly adopts any RL
method, converges to policies that optimally select when to perform actions and
their optimal magnitudes. We then augment LICRA to handle problems in which the
agent can perform at most $k<\infty$ actions and more generally, faces a budget
constraint. We show LICRA learns the optimal value function and ensures budget
constraints are satisfied almost surely. We demonstrate empirically LICRA's
superior performance against benchmark RL methods in OpenAI gym's \textit{Lunar
Lander} and in \textit{Highway} environments and a variant of the Merton
portfolio problem within finance.
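As a rough illustration of the nested structure described in the abstract, the sketch below pairs a "when-to-act" gate with an action policy, charges a fixed cost per intervention, and enforces a hard budget of at most k interventions by conditioning on the remaining budget. All names here are hypothetical; this is a minimal sketch of the idea, not the authors' implementation.

```python
import numpy as np

class LicraLikeAgent:
    """Minimal sketch (not the authors' code): an outer gate decides
    WHETHER to intervene at each step; an inner policy decides WHICH
    action to take. Each intervention pays a fixed cost and consumes
    one unit of a hard budget of at most k interventions."""

    def __init__(self, gate_policy, action_policy, action_cost=1.0, budget_k=10):
        self.gate_policy = gate_policy      # (state, remaining) -> P(intervene)
        self.action_policy = action_policy  # (state, remaining) -> action
        self.action_cost = action_cost
        self.budget_k = budget_k

    def act(self, state, used_actions):
        # Condition both policies on the remaining budget (one common way
        # to make a budget constraint learnable).
        remaining = self.budget_k - used_actions
        if remaining <= 0:
            return None, 0.0                   # budget exhausted: stay idle
        if np.random.rand() < self.gate_policy(state, remaining):
            action = self.action_policy(state, remaining)
            return action, -self.action_cost   # intervene and pay the cost
        return None, 0.0                       # do nothing, pay nothing
```

During training, the per-intervention cost enters the reward, so both policies are pushed to act only when the expected gain outweighs the cost of acting.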
Related papers
- CITER: Collaborative Inference for Efficient Large Language Model Decoding with Token-Level Routing [56.98081258047281]
CITER enables efficient collaboration between small and large language models (SLMs & LLMs) through a token-level routing strategy.
We formulate router training as a policy optimization problem in which the router receives rewards based on both the quality of predictions and the inference costs of generation.
Our experiments show that CITER reduces the inference costs while preserving high-quality generation, offering a promising solution for real-time and resource-constrained applications.
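A hedged sketch of the kind of token-level routing reward the summary describes, trading prediction quality against inference cost; the function name and weighting are assumptions for illustration, not CITER's actual objective.

```python
def routing_reward(quality_score, model_cost, cost_weight=0.1):
    """Hypothetical per-token reward for a router policy:
    reward good predictions, penalise expensive ones."""
    return quality_score - cost_weight * model_cost

# Routing a token to the small (cheap) vs. the large (expensive) model:
r_small = routing_reward(quality_score=0.7, model_cost=1.0)   # 0.6
r_large = routing_reward(quality_score=0.9, model_cost=10.0)  # -0.1
```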
arXiv Detail & Related papers (2025-02-04T03:36:44Z) - Control when confidence is costly [4.683612295430957]
We develop a version of control that accounts for computational costs of inference.
We study Linear Quadratic Gaussian (LQG) control with an added internal cost on the relative precision of the posterior probability over the world state.
We discover that the rational strategy that solves the joint inference and control problem goes through phase transitions depending on the task demands.
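In rough terms, the objective adds an internal inference cost to the usual LQG quadratic costs; the form below is an illustrative sketch under assumed notation, not the paper's exact formulation:

```latex
\min_{u_{0:T},\,\Lambda_{0:T}} \;
\mathbb{E}\!\left[\sum_{t=0}^{T}
  x_t^\top Q\, x_t \;+\; u_t^\top R\, u_t \;+\; \lambda\, c(\Lambda_t)\right]
```

Here $x_t$ is the world state, $u_t$ the control, $\Lambda_t$ the (relative) precision of the posterior over the state, $c(\cdot)$ an increasing cost of maintaining that precision, and $\lambda$ the weight that trades control performance against the cost of inference.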
arXiv Detail & Related papers (2024-06-20T15:50:38Z) - Reinforcement Learning from Human Feedback with Active Queries [59.855433734053555]
Current reinforcement learning approaches often require a large amount of human-labelled preference data.
We propose query-efficient RLHF methods inspired by the success of active learning.
Our experiments show that ADPO, while making only about half the queries for human preference, matches the performance of the state-of-the-art DPO method.
arXiv Detail & Related papers (2024-02-14T18:58:40Z) - Towards Robust Model-Based Reinforcement Learning Against Adversarial Corruption [60.958746600254884]
This study tackles the challenges of adversarial corruption in model-based reinforcement learning (RL).
We introduce an algorithm called corruption-robust optimistic MLE (CR-OMLE), which leverages total-variation (TV)-based information ratios as uncertainty weights for MLE.
We extend our weighting technique to the offline setting, and propose an algorithm named corruption-robust pessimistic MLE (CR-PMLE).
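A minimal sketch of uncertainty-weighted MLE, assuming each transition sample already comes with a weight in [0, 1]; how CR-OMLE derives those weights from TV-based information ratios is not reproduced here, and the names are hypothetical.

```python
import torch

def weighted_mle_loss(model, transitions, weights):
    """Scale each transition's negative log-likelihood by an uncertainty
    weight, so potentially corrupted samples influence the model less."""
    losses = []
    for (s, a, s_next), w in zip(transitions, weights):
        log_prob = model.log_prob(s_next, s, a)  # log p_theta(s' | s, a)
        losses.append(-w * log_prob)
    return torch.stack(losses).mean()
```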
arXiv Detail & Related papers (2024-02-14T07:27:30Z) - Imitate the Good and Avoid the Bad: An Incremental Approach to Safe Reinforcement Learning [11.666700714916065]
Constrained RL is a framework for enforcing safe actions in Reinforcement Learning.
Most recent approaches for solving Constrained RL convert the trajectory-based cost constraint into a surrogate problem.
We present an approach that does not modify the trajectory-based cost constraint and instead imitates "good" trajectories.
arXiv Detail & Related papers (2023-12-16T08:48:46Z) - Solving Richly Constrained Reinforcement Learning through State
Augmentation and Reward Penalties [8.86470998648085]
A key challenge is handling the expected cost accumulated under the policy.
Existing methods have developed innovative ways of converting this cost constraint over the entire policy into constraints over local decisions.
We provide an equivalent unconstrained formulation to constrained RL that has an augmented state space and reward penalties.
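A hedged sketch of that idea: carry the accumulated cost in the augmented state and convert the constraint into a reward penalty once the running cost exceeds the budget. The penalty form and names are assumptions, not the paper's exact construction.

```python
def augmented_step(env_step, state, action, cost_so_far, cost_budget, penalty=100.0):
    """Wrap one environment step so the agent observes (state, cost_so_far)
    and is penalised whenever the accumulated cost exceeds the budget."""
    next_state, reward, cost, done = env_step(state, action)
    cost_so_far += cost
    if cost_so_far > cost_budget:
        reward -= penalty  # unconstrained proxy for the cost constraint
    return (next_state, cost_so_far), reward, done
```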
arXiv Detail & Related papers (2023-01-27T08:33:08Z) - AutoCost: Evolving Intrinsic Cost for Zero-violation Reinforcement
Learning [3.4806267677524896]
We propose AutoCost, a framework that automatically searches for cost functions that help constrained RL to achieve zero-violation performance.
We compare the performance of augmented agents that use our cost function to provide additive intrinsic costs with baseline agents that use the same policy learners but with only extrinsic costs.
arXiv Detail & Related papers (2023-01-24T22:51:29Z) - Learn Continuously, Act Discretely: Hybrid Action-Space Reinforcement
Learning For Optimal Execution [8.021077964915996]
Reinforcement learning can help decide the order-splitting sizes.
The key challenge lies in the "continuous-discrete duality" of the action space.
We propose a hybrid RL method to combine the advantages of both.
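One possible reading of that hybrid, sketched under assumptions (the names and the rounding rule are illustrative, not the paper's method): learn a continuous fraction of the remaining order, then discretise it to an executable size.

```python
def hybrid_action(continuous_policy, state, remaining_shares, lot_size=100):
    """Map a continuous policy output in [0, 1] to a discrete, executable
    order size (a multiple of the lot size, capped by the remaining shares)."""
    fraction = float(continuous_policy(state))           # learned, in [0, 1]
    raw_size = fraction * remaining_shares
    order = int(round(raw_size / lot_size)) * lot_size   # discretise
    return max(0, min(order, remaining_shares))
```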
arXiv Detail & Related papers (2022-07-22T15:50:44Z) - Hierarchical Adaptive Contextual Bandits for Resource Constraint based
Recommendation [49.69139684065241]
Contextual multi-armed bandits (MAB) achieve cutting-edge performance on a variety of problems.
In this paper, we propose a hierarchical adaptive contextual bandit method (HATCH) to conduct the policy learning of contextual bandits with a budget constraint.
arXiv Detail & Related papers (2020-04-02T17:04:52Z) - Cost-Sensitive Portfolio Selection via Deep Reinforcement Learning [100.73223416589596]
We propose a cost-sensitive portfolio selection method with deep reinforcement learning.
Specifically, a novel two-stream portfolio policy network is devised to extract both price series patterns and asset correlations.
A new cost-sensitive reward function is developed to maximize the accumulated return and constrain both costs via reinforcement learning.
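A minimal sketch of a cost-sensitive reward of that flavour: the period's portfolio return minus proportional transaction costs paid for rebalancing. The cost model and coefficient are assumptions, not the paper's exact reward.

```python
import numpy as np

def cost_sensitive_reward(weights_new, weights_old, asset_returns, tc_rate=0.001):
    """Reward = portfolio return this period minus proportional
    transaction costs incurred by changing the allocation."""
    portfolio_return = float(np.dot(weights_new, asset_returns))
    turnover = float(np.abs(np.asarray(weights_new) - np.asarray(weights_old)).sum())
    return portfolio_return - tc_rate * turnover
```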
arXiv Detail & Related papers (2020-03-06T06:28:17Z) - Guided Constrained Policy Optimization for Dynamic Quadrupedal Robot
Locomotion [78.46388769788405]
We introduce guided constrained policy optimization (GCPO), an RL framework based upon our implementation of constrained proximal policy optimization (CPPO).
We show that guided constrained RL offers faster convergence close to the desired optimum resulting in an optimal, yet physically feasible, robotic control behavior without the need for precise reward function tuning.
arXiv Detail & Related papers (2020-02-22T10:15:53Z)