Timing is Everything: Learning to Act Selectively with Costly Actions
and Budgetary Constraints
- URL: http://arxiv.org/abs/2205.15953v4
- Date: Sun, 4 Jun 2023 23:59:39 GMT
- Title: Timing is Everything: Learning to Act Selectively with Costly Actions
and Budgetary Constraints
- Authors: David Mguni, Aivar Sootla, Juliusz Ziomek, Oliver Slumbers, Zipeng
Dai, Kun Shao, Jun Wang
- Abstract summary: We introduce a reinforcement learning framework named Learnable Impulse Control Reinforcement Algorithm (LICRA).
At the core of LICRA is a nested structure that combines RL and a form of policy known as impulse control, which learns to maximise objectives when actions incur costs.
We show LICRA learns the optimal value function and ensures budget constraints are satisfied almost surely.
- Score: 9.132215354916784
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Many real-world settings involve costs for performing actions; transaction
costs in financial systems and fuel costs are common examples. In these
settings, performing actions at each time step quickly accumulates costs,
leading to vastly suboptimal outcomes. Additionally, repeatedly acting produces
wear and tear and, ultimately, damage. Determining \textit{when to act} is
crucial for achieving successful outcomes, and yet the challenge of efficiently
\textit{learning} to behave optimally when actions incur minimally bounded
costs remains unresolved. In this paper, we introduce a reinforcement learning
(RL) framework named \textbf{L}earnable \textbf{I}mpulse \textbf{C}ontrol
\textbf{R}einforcement \textbf{A}lgorithm (LICRA), for learning to optimally
select both when to act and which actions to take when actions incur costs. At
the core of LICRA is a nested structure that combines RL and a form of policy
known as \textit{impulse control} which learns to maximise objectives when
actions incur costs. We prove that LICRA, which seamlessly adopts any RL
method, converges to policies that optimally select when to perform actions and
their optimal magnitudes. We then augment LICRA to handle problems in which the
agent can perform at most $k<\infty$ actions and, more generally, faces a budget
constraint. We show LICRA learns the optimal value function and ensures budget
constraints are satisfied almost surely. We demonstrate empirically LICRA's
superior performance against benchmark RL methods in OpenAI Gym's \textit{Lunar
Lander} and in \textit{Highway} environments, and on a variant of the Merton
portfolio problem from finance.
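The nested "when to act / which action to take" structure described in the abstract can be sketched in a few lines of Python. This is only an illustration of the control flow implied by the abstract, not the authors' implementation: the class and function names, the fixed ACTION_COST, the budget of MAX_ACTIONS interventions, and the placeholder random policies are all assumptions made here for clarity.

```python
# Minimal sketch of a nested gate/action policy with a hard action budget,
# loosely following the structure described in the LICRA abstract.
# All names, constants, and the placeholder policies are hypothetical.
import random

ACTION_COST = 0.1   # cost charged whenever an intervention is made (assumed value)
MAX_ACTIONS = 5     # budget k: at most k interventions per episode (assumed value)


class NestedPolicy:
    """Outer gate decides *whether* to act; inner policy decides *which* action."""

    def should_act(self, state, budget_left):
        # Impulse-control-style gate: never act once the budget is exhausted.
        if budget_left == 0:
            return False
        # Placeholder gate; a learned critic would compare the value of
        # intervening now against the value of doing nothing.
        return random.random() < 0.3

    def act(self, state):
        # Placeholder action head; in practice any RL method supplies
        # the action magnitude.
        return random.uniform(-1.0, 1.0)


def run_episode(env_step, initial_state, policy, horizon=100):
    """Roll out one episode, charging ACTION_COST per intervention and
    enforcing the hard budget of MAX_ACTIONS by construction."""
    state, budget_left, total_return = initial_state, MAX_ACTIONS, 0.0
    for _ in range(horizon):
        if policy.should_act(state, budget_left):
            action = policy.act(state)
            budget_left -= 1
            reward_adjustment = -ACTION_COST
        else:
            action = None          # "do nothing" branch: no cost is incurred
            reward_adjustment = 0.0
        state, reward, done = env_step(state, action)
        total_return += reward + reward_adjustment
        if done:
            break
    return total_return


if __name__ == "__main__":
    # Toy dynamics for demonstration: the state drifts and rewards are
    # higher near zero (again, an assumption, not from the paper).
    def toy_step(state, action):
        state = state + (action if action is not None else 0.0) + random.gauss(0.0, 0.1)
        return state, -abs(state), False

    print(run_episode(toy_step, initial_state=0.0, policy=NestedPolicy()))
```

In this sketch the budget constraint is satisfied trivially because the gate refuses to act once the budget is spent; the paper's contribution is learning both the gate and the action magnitudes so that the resulting policy is optimal, not merely feasible.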
Related papers
- Provably Efficient Action-Manipulation Attack Against Continuous Reinforcement Learning [49.48615590763914]
We propose a black-box attack algorithm named LCBT, which uses the Monte Carlo tree search method for efficient action searching and manipulation.
We conduct our proposed attack method on three aggressive algorithms (DDPG, PPO, and TD3) in continuous settings, demonstrating promising attack performance.
arXiv Detail & Related papers (2024-11-20T08:20:29Z)
- Control when confidence is costly [4.683612295430957]
We develop a version of control that accounts for computational costs of inference.
We study Linear Quadratic Gaussian (LQG) control with an added internal cost on the relative precision of the posterior probability over the world state.
We discover that the rational strategy that solves the joint inference and control problem goes through phase transitions depending on the task demands.
arXiv Detail & Related papers (2024-06-20T15:50:38Z)
- Continual Learning on a Diet: Learning from Sparsely Labeled Streams Under Constrained Computation [123.4883806344334]
We study a realistic Continual Learning setting where learning algorithms are granted a restricted computational budget per time step while training.
We apply this setting to large-scale semi-supervised Continual Learning scenarios with sparse label rates.
Our extensive analysis and ablations demonstrate that DietCL is stable across a full spectrum of label sparsity levels, computational budgets, and other ablation settings.
arXiv Detail & Related papers (2024-04-19T10:10:39Z)
- Towards Robust Model-Based Reinforcement Learning Against Adversarial Corruption [60.958746600254884]
This study tackles the challenges of adversarial corruption in model-based reinforcement learning (RL).
We introduce an algorithm called corruption-robust optimistic MLE (CR-OMLE), which leverages total-variation (TV)-based information ratios as uncertainty weights for MLE.
We extend our weighting technique to the offline setting and propose an algorithm named corruption-robust pessimistic MLE (CR-PMLE).
arXiv Detail & Related papers (2024-02-14T07:27:30Z)
- Imitate the Good and Avoid the Bad: An Incremental Approach to Safe Reinforcement Learning [11.666700714916065]
Constrained RL is a framework for enforcing safe actions in reinforcement learning.
Most recent approaches for solving constrained RL convert the trajectory-based cost constraint into a surrogate problem.
We present an approach that does not modify the trajectory-based cost constraint and instead imitates "good" trajectories.
arXiv Detail & Related papers (2023-12-16T08:48:46Z)
- Solving Richly Constrained Reinforcement Learning through State Augmentation and Reward Penalties [8.86470998648085]
The key challenge is handling the expected cost accumulated under the policy.
Existing methods have developed innovative ways of converting this cost constraint over the entire policy into constraints over local decisions.
We provide an equivalent unconstrained formulation of constrained RL with an augmented state space and reward penalties; a brief sketch of this state-augmentation idea appears after the related-papers list.
arXiv Detail & Related papers (2023-01-27T08:33:08Z)
- AutoCost: Evolving Intrinsic Cost for Zero-violation Reinforcement Learning [3.4806267677524896]
We propose AutoCost, a framework that automatically searches for cost functions that help constrained RL to achieve zero-violation performance.
We compare the performance of augmented agents that use our cost function to provide additive intrinsic costs with baseline agents that use the same policy learners but with only extrinsic costs.
arXiv Detail & Related papers (2023-01-24T22:51:29Z)
- Learn Continuously, Act Discretely: Hybrid Action-Space Reinforcement Learning For Optimal Execution [8.021077964915996]
Reinforcement learning can help decide the order-splitting sizes.
The key challenge lies in the "continuous-discrete duality" of the action space.
We propose a hybrid RL method to combine the advantages of both.
arXiv Detail & Related papers (2022-07-22T15:50:44Z)
- Hierarchical Adaptive Contextual Bandits for Resource Constraint based Recommendation [49.69139684065241]
The contextual multi-armed bandit (MAB) achieves cutting-edge performance on a variety of problems.
In this paper, we propose a hierarchical adaptive contextual bandit method (HATCH) to conduct the policy learning of contextual bandits with a budget constraint.
arXiv Detail & Related papers (2020-04-02T17:04:52Z)
- Cost-Sensitive Portfolio Selection via Deep Reinforcement Learning [100.73223416589596]
We propose a cost-sensitive portfolio selection method with deep reinforcement learning.
Specifically, a novel two-stream portfolio policy network is devised to extract both price series patterns and asset correlations.
A new cost-sensitive reward function is developed to maximize the accumulated return and constrain both costs via reinforcement learning.
arXiv Detail & Related papers (2020-03-06T06:28:17Z)
- Guided Constrained Policy Optimization for Dynamic Quadrupedal Robot Locomotion [78.46388769788405]
We introduce guided constrained policy optimization (GCPO), an RL framework based upon our implementation of constrained policy optimization (CPPO).
We show that guided constrained RL offers faster convergence close to the desired optimum resulting in an optimal, yet physically feasible, robotic control behavior without the need for precise reward function tuning.
arXiv Detail & Related papers (2020-02-22T10:15:53Z)
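For the "Solving Richly Constrained Reinforcement Learning" entry above, the state-augmentation-plus-penalty idea can be illustrated with a short, hypothetical wrapper. The class name, the COST_BUDGET and PENALTY constants, and the assumed step() signature are choices made here for illustration; the cited paper's exact construction may differ.

```python
# Hypothetical illustration of recasting a cost constraint as an unconstrained
# problem: the running cost is folded into the state, and a reward penalty is
# charged once the budget is exceeded. Names and the penalty scheme are
# assumptions, not the cited paper's construction.

COST_BUDGET = 1.0   # maximum allowed accumulated cost per episode (assumed)
PENALTY = 10.0      # reward penalty per unit of cost above the budget (assumed)


class CostAugmentedEnv:
    """Wraps an environment whose step() is assumed to return (obs, reward, cost, done)."""

    def __init__(self, base_env):
        self.base_env = base_env
        self.accumulated_cost = 0.0

    def reset(self):
        self.accumulated_cost = 0.0
        obs = self.base_env.reset()
        # Augmented state: original observation plus the cost consumed so far.
        return (obs, self.accumulated_cost)

    def step(self, action):
        obs, reward, cost, done = self.base_env.step(action)
        self.accumulated_cost += cost
        # Penalise only the portion of this step's cost that lies above the budget.
        overshoot = max(0.0, self.accumulated_cost - COST_BUDGET)
        shaped_reward = reward - PENALTY * min(cost, overshoot)
        return (obs, self.accumulated_cost), shaped_reward, done
```

Because the remaining budget is recoverable from the augmented state, a standard unconstrained RL algorithm can in principle be run on the wrapped environment without any special constraint-handling machinery.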