Timing is Everything: Learning to Act Selectively with Costly Actions
and Budgetary Constraints
- URL: http://arxiv.org/abs/2205.15953v4
- Date: Sun, 4 Jun 2023 23:59:39 GMT
- Title: Timing is Everything: Learning to Act Selectively with Costly Actions
and Budgetary Constraints
- Authors: David Mguni, Aivar Sootla, Juliusz Ziomek, Oliver Slumbers, Zipeng
Dai, Kun Shao, Jun Wang
- Abstract summary: We introduce a reinforcement learning framework named the \textbf{L}earnable \textbf{I}mpulse \textbf{C}ontrol \textbf{R}einforcement \textbf{A}lgorithm (LICRA).
At the core of LICRA is a nested structure that combines RL and a form of policy known as \textit{impulse control}, which learns to maximise objectives when actions incur costs.
We show LICRA learns the optimal value function and ensures budget constraints are satisfied almost surely.
- Score: 9.132215354916784
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Many real-world settings involve costs for performing actions; transaction
costs in financial systems and fuel costs being common examples. In these
settings, performing actions at each time step quickly accumulates costs
leading to vastly suboptimal outcomes. Additionally, repeatedly acting produces
wear and tear and ultimately, damage. Determining \textit{when to act} is
crucial for achieving successful outcomes and yet, the challenge of efficiently
\textit{learning} to behave optimally when actions incur minimally bounded
costs remains unresolved. In this paper, we introduce a reinforcement learning
(RL) framework named \textbf{L}earnable \textbf{I}mpulse \textbf{C}ontrol
\textbf{R}einforcement \textbf{A}lgorithm (LICRA), for learning to optimally
select both when to act and which actions to take when actions incur costs. At
the core of LICRA is a nested structure that combines RL and a form of policy
known as \textit{impulse control} which learns to maximise objectives when
actions incur costs. We prove that LICRA, which seamlessly adopts any RL
method, converges to policies that optimally select when to perform actions and
their optimal magnitudes. We then augment LICRA to handle problems in which the
agent can perform at most $k<\infty$ actions and more generally, faces a budget
constraint. We show LICRA learns the optimal value function and ensures budget
constraints are satisfied almost surely. We demonstrate empirically LICRA's
superior performance against benchmark RL methods in OpenAI gym's \textit{Lunar
Lander} and in \textit{Highway} environments and a variant of the Merton
portfolio problem within finance.
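As a rough illustration of the nested structure described in the abstract, the sketch below pairs a "when-to-act" gate with an action policy, charges a fixed cost per intervention, and enforces a hard budget of at most k interventions by conditioning on the remaining budget. All names here are hypothetical; this is a minimal sketch of the idea, not the authors' implementation.

```python
import numpy as np

class LicraLikeAgent:
    """Minimal sketch (not the authors' code): an outer gate decides
    WHETHER to intervene at each step; an inner policy decides WHICH
    action to take. Each intervention pays a fixed cost and consumes
    one unit of a hard budget of at most k interventions."""

    def __init__(self, gate_policy, action_policy, action_cost=1.0, budget_k=10):
        self.gate_policy = gate_policy      # (state, remaining) -> P(intervene)
        self.action_policy = action_policy  # (state, remaining) -> action
        self.action_cost = action_cost
        self.budget_k = budget_k

    def act(self, state, used_actions):
        # Condition both policies on the remaining budget (one common way
        # to make a budget constraint learnable).
        remaining = self.budget_k - used_actions
        if remaining <= 0:
            return None, 0.0                   # budget exhausted: stay idle
        if np.random.rand() < self.gate_policy(state, remaining):
            action = self.action_policy(state, remaining)
            return action, -self.action_cost   # intervene and pay the cost
        return None, 0.0                       # do nothing, pay nothing
```

During training, the per-intervention cost enters the reward, so both policies are pushed to act only when the expected gain outweighs the cost of acting.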
Related papers
- CITER: Collaborative Inference for Efficient Large Language Model Decoding with Token-Level Routing [56.98081258047281]
CITER enables efficient collaboration between small and large language models (SLMs & LLMs) through a token-level routing strategy.
We formulate router training as a policy optimization problem in which the router receives rewards based on both the quality of predictions and the inference costs of generation.
Our experiments show that CITER reduces the inference costs while preserving high-quality generation, offering a promising solution for real-time and resource-constrained applications.
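A hedged sketch of the kind of token-level routing reward the summary describes, trading prediction quality against inference cost; the function name and weighting are assumptions for illustration, not CITER's actual objective.

```python
def routing_reward(quality_score, model_cost, cost_weight=0.1):
    """Hypothetical per-token reward for a router policy:
    reward good predictions, penalise expensive ones."""
    return quality_score - cost_weight * model_cost

# Routing a token to the small (cheap) vs. the large (expensive) model:
r_small = routing_reward(quality_score=0.7, model_cost=1.0)   # 0.6
r_large = routing_reward(quality_score=0.9, model_cost=10.0)  # -0.1
```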
arXiv Detail & Related papers (2025-02-04T03:36:44Z) - Control when confidence is costly [4.683612295430957]
We develop a version of control that accounts for computational costs of inference.
We study Linear Quadratic Gaussian (LQG) control with an added internal cost on the relative precision of the posterior probability over the world state.
We discover that the rational strategy that solves the joint inference and control problem goes through phase transitions depending on the task demands.
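In rough terms, the objective adds an internal inference cost to the usual LQG quadratic costs; the form below is an illustrative sketch under assumed notation, not the paper's exact formulation:

```latex
\min_{u_{0:T},\,\Lambda_{0:T}} \;
\mathbb{E}\!\left[\sum_{t=0}^{T}
  x_t^\top Q\, x_t \;+\; u_t^\top R\, u_t \;+\; \lambda\, c(\Lambda_t)\right]
```

Here $x_t$ is the world state, $u_t$ the control, $\Lambda_t$ the (relative) precision of the posterior over the state, $c(\cdot)$ an increasing cost of maintaining that precision, and $\lambda$ the weight that trades control performance against the cost of inference.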
arXiv Detail & Related papers (2024-06-20T15:50:38Z) - Reinforcement Learning from Human Feedback with Active Queries [59.855433734053555]
Current reinforcement learning approaches often require a large amount of human-labelled preference data.
We propose query-efficient RLHF methods inspired by the success of active learning.
Our experiments show that ADPO, while making only about half the queries for human preference, matches the performance of the state-of-the-art DPO method.
arXiv Detail & Related papers (2024-02-14T18:58:40Z) - Towards Robust Model-Based Reinforcement Learning Against Adversarial Corruption [60.958746600254884]
This study tackles the challenges of adversarial corruption in model-based reinforcement learning (RL).
We introduce an algorithm called corruption-robust optimistic MLE (CR-OMLE), which leverages total-variation (TV)-based information ratios as uncertainty weights for MLE.
We extend our weighting technique to the offline setting, and propose an algorithm named corruption-robust pessimistic MLE (CR-PMLE).
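A minimal sketch of uncertainty-weighted MLE, assuming each transition sample already comes with a weight in [0, 1]; how CR-OMLE derives those weights from TV-based information ratios is not reproduced here, and the names are hypothetical.

```python
import torch

def weighted_mle_loss(model, transitions, weights):
    """Scale each transition's negative log-likelihood by an uncertainty
    weight, so potentially corrupted samples influence the model less."""
    losses = []
    for (s, a, s_next), w in zip(transitions, weights):
        log_prob = model.log_prob(s_next, s, a)  # log p_theta(s' | s, a)
        losses.append(-w * log_prob)
    return torch.stack(losses).mean()
```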
arXiv Detail & Related papers (2024-02-14T07:27:30Z) - Imitate the Good and Avoid the Bad: An Incremental Approach to Safe Reinforcement Learning [11.666700714916065]
Constrained RL is a framework for enforcing safe actions in Reinforcement Learning.
Most recent approaches for solving Constrained RL convert the trajectory-based cost constraint into a surrogate problem.
We present an approach that does not modify the trajectory-based cost constraint and instead imitates "good" trajectories.
arXiv Detail & Related papers (2023-12-16T08:48:46Z) - Solving Richly Constrained Reinforcement Learning through State
Augmentation and Reward Penalties [8.86470998648085]
A key challenge is handling the expected cost accumulated under the policy.
Existing methods have developed innovative ways of converting this cost constraint over the entire policy into constraints over local decisions.
We provide an equivalent unconstrained formulation to constrained RL that has an augmented state space and reward penalties.
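A hedged sketch of that idea: carry the accumulated cost in the augmented state and convert the constraint into a reward penalty once the running cost exceeds the budget. The penalty form and names are assumptions, not the paper's exact construction.

```python
def augmented_step(env_step, state, action, cost_so_far, cost_budget, penalty=100.0):
    """Wrap one environment step so the agent observes (state, cost_so_far)
    and is penalised whenever the accumulated cost exceeds the budget."""
    next_state, reward, cost, done = env_step(state, action)
    cost_so_far += cost
    if cost_so_far > cost_budget:
        reward -= penalty  # unconstrained proxy for the cost constraint
    return (next_state, cost_so_far), reward, done
```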
arXiv Detail & Related papers (2023-01-27T08:33:08Z) - AutoCost: Evolving Intrinsic Cost for Zero-violation Reinforcement
Learning [3.4806267677524896]
We propose AutoCost, a framework that automatically searches for cost functions that help constrained RL to achieve zero-violation performance.
We compare the performance of augmented agents that use our cost function to provide additive intrinsic costs with baseline agents that use the same policy learners but with only extrinsic costs.
arXiv Detail & Related papers (2023-01-24T22:51:29Z) - Learn Continuously, Act Discretely: Hybrid Action-Space Reinforcement
Learning For Optimal Execution [8.021077964915996]
Reinforcement learning can help decide the order-splitting sizes.
The key challenge lies in the "continuous-discrete duality" of the action space.
We propose a hybrid RL method to combine the advantages of both.
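One possible reading of that hybrid, sketched under assumptions (the names and the rounding rule are illustrative, not the paper's method): learn a continuous fraction of the remaining order, then discretise it to an executable size.

```python
def hybrid_action(continuous_policy, state, remaining_shares, lot_size=100):
    """Map a continuous policy output in [0, 1] to a discrete, executable
    order size (a multiple of the lot size, capped by the remaining shares)."""
    fraction = float(continuous_policy(state))           # learned, in [0, 1]
    raw_size = fraction * remaining_shares
    order = int(round(raw_size / lot_size)) * lot_size   # discretise
    return max(0, min(order, remaining_shares))
```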
arXiv Detail & Related papers (2022-07-22T15:50:44Z) - Hierarchical Adaptive Contextual Bandits for Resource Constraint based
Recommendation [49.69139684065241]
Contextual multi-armed bandits (MAB) achieve cutting-edge performance on a variety of problems.
In this paper, we propose a hierarchical adaptive contextual bandit method (HATCH) to conduct the policy learning of contextual bandits with a budget constraint.
arXiv Detail & Related papers (2020-04-02T17:04:52Z) - Cost-Sensitive Portfolio Selection via Deep Reinforcement Learning [100.73223416589596]
We propose a cost-sensitive portfolio selection method with deep reinforcement learning.
Specifically, a novel two-stream portfolio policy network is devised to extract both price series patterns and asset correlations.
A new cost-sensitive reward function is developed to maximize the accumulated return and constrain both costs via reinforcement learning.
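A minimal sketch of a cost-sensitive reward of that flavour: the period's portfolio return minus proportional transaction costs paid for rebalancing. The cost model and coefficient are assumptions, not the paper's exact reward.

```python
import numpy as np

def cost_sensitive_reward(weights_new, weights_old, asset_returns, tc_rate=0.001):
    """Reward = portfolio return this period minus proportional
    transaction costs incurred by changing the allocation."""
    portfolio_return = float(np.dot(weights_new, asset_returns))
    turnover = float(np.abs(np.asarray(weights_new) - np.asarray(weights_old)).sum())
    return portfolio_return - tc_rate * turnover
```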
arXiv Detail & Related papers (2020-03-06T06:28:17Z) - Guided Constrained Policy Optimization for Dynamic Quadrupedal Robot
Locomotion [78.46388769788405]
We introduce guided constrained policy optimization (GCPO), an RL framework based upon our implementation of constrained proximal policy optimization (CPPO).
We show that guided constrained RL offers faster convergence close to the desired optimum resulting in an optimal, yet physically feasible, robotic control behavior without the need for precise reward function tuning.
arXiv Detail & Related papers (2020-02-22T10:15:53Z)