Hierarchical Upper Confidence Bounds for Constrained Online Learning
- URL: http://arxiv.org/abs/2410.17216v2
- Date: Fri, 25 Oct 2024 12:01:22 GMT
- Title: Hierarchical Upper Confidence Bounds for Constrained Online Learning
- Authors: Ali Baheri
- Abstract summary: We introduce the hierarchical constrained bandits (HCB) framework, which extends the contextual bandit problem to incorporate hierarchical decision structures and multi-level constraints.
Our theoretical analysis establishes sublinear regret bounds for HC-UCB and provides high-probability guarantees for constraint satisfaction at all hierarchical levels.
- Abstract: The multi-armed bandit (MAB) problem is a foundational framework in sequential decision-making under uncertainty, extensively studied for its applications in areas such as clinical trials, online advertising, and resource allocation. Traditional MAB formulations, however, do not adequately capture scenarios where decisions are structured hierarchically, involve multi-level constraints, or feature context-dependent action spaces. In this paper, we introduce the hierarchical constrained bandits (HCB) framework, which extends the contextual bandit problem to incorporate hierarchical decision structures and multi-level constraints. We propose the hierarchical constrained upper confidence bound (HC-UCB) algorithm, designed to address the complexities of the HCB problem by leveraging confidence bounds within a hierarchical setting. Our theoretical analysis establishes sublinear regret bounds for HC-UCB and provides high-probability guarantees for constraint satisfaction at all hierarchical levels. Furthermore, we derive a minimax lower bound on the regret for the HCB problem, demonstrating the near-optimality of our algorithm. The results are significant for real-world applications where decision-making processes are inherently hierarchical and constrained, offering a robust and efficient solution that balances exploration and exploitation across multiple levels of decision-making.
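The abstract gives no pseudocode, so the following is a minimal illustrative sketch of the idea it describes: UCB-style exploration at each level of a decision hierarchy, combined with a per-level constraint check. The two-level structure, the class name `UCBNode`, and the parameters `alpha` and `tau` are assumptions for illustration, not the paper's HC-UCB as specified.

```python
# Illustrative sketch of hierarchical constrained UCB, based only on the
# abstract above; not the paper's exact HC-UCB algorithm.
import numpy as np

class UCBNode:
    """UCB statistics for the children of one node in the decision hierarchy."""
    def __init__(self, n_children, alpha=1.0):
        self.alpha = alpha
        self.counts = np.zeros(n_children)
        self.reward_sums = np.zeros(n_children)
        self.cost_sums = np.zeros(n_children)   # per-level constraint signal

    def select(self, t, tau):
        """Pick the child with the highest reward UCB among those whose
        cost LCB stays within this level's constraint threshold tau."""
        with np.errstate(divide="ignore", invalid="ignore"):
            mean_r = np.where(self.counts > 0, self.reward_sums / self.counts, np.inf)
            mean_c = np.where(self.counts > 0, self.cost_sums / self.counts, 0.0)
            bonus = self.alpha * np.sqrt(np.log(max(t, 2)) / np.maximum(self.counts, 1))
        ucb_reward = mean_r + bonus
        lcb_cost = mean_c - bonus
        feasible = lcb_cost <= tau            # optimistic feasibility check
        if not feasible.any():                # fall back to least-violating child
            return int(np.argmin(lcb_cost))
        scores = np.where(feasible, ucb_reward, -np.inf)
        return int(np.argmax(scores))

    def update(self, child, reward, cost):
        self.counts[child] += 1
        self.reward_sums[child] += reward
        self.cost_sums[child] += cost

# Usage: a two-level hierarchy with 3 high-level options, 4 low-level actions
# each, and a constraint threshold at every level (toy environment).
rng = np.random.default_rng(0)
high = UCBNode(n_children=3)
low = [UCBNode(n_children=4) for _ in range(3)]
for t in range(1, 1001):
    h = high.select(t, tau=0.5)
    a = low[h].select(t, tau=0.5)
    reward = rng.normal(0.2 * h + 0.1 * a, 0.1)
    cost = rng.uniform(0, 1)
    high.update(h, reward, cost)
    low[h].update(a, reward, cost)
```

Using an optimistic (lower-confidence) cost estimate for the feasibility filter is one of several possible design choices; a conservative variant would instead require the cost UCB to stay within `tau`.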
Related papers
- Offline Learning for Combinatorial Multi-armed Bandits [56.96242764723241]
Off-CMAB is the first offline learning framework for CMAB.
Off-CMAB combines pessimistic reward estimations with combinatorial solvers.
Experiments on synthetic and real-world datasets highlight the superior performance of its CLCB algorithm.
arXiv Detail & Related papers (2025-01-31T16:56:18Z) - Decision Making in Changing Environments: Robustness, Query-Based Learning, and Differential Privacy [59.64384863882473]
We study the problem of interactive decision making in which the underlying environment changes over time subject to given constraints.
We propose a framework that interpolates between the stochastic and adversarial settings of decision making.
arXiv Detail & Related papers (2025-01-24T21:31:50Z) - A Provably Efficient Option-Based Algorithm for both High-Level and Low-Level Learning [54.20447310988282]
We present a meta-algorithm that alternates between regret minimization algorithms instantiated at different (high and low) levels of temporal abstraction.
At the higher level, we treat the problem as a Semi-Markov Decision Process (SMDP) with fixed low-level policies, while at the lower level, inner option policies are learned under a fixed high-level policy.
arXiv Detail & Related papers (2024-06-21T13:17:33Z) - A Primal-Dual-Assisted Penalty Approach to Bilevel Optimization with Coupled Constraints [66.61399765513383]
We develop a BLOCC algorithm to tackle BiLevel Optimization problems with Coupled Constraints.
We demonstrate its effectiveness on two well-known real-world applications.
arXiv Detail & Related papers (2024-06-14T15:59:36Z) - On the Complexity of Adversarial Decision Making [101.14158787665252]
We show that the Decision-Estimation Coefficient is necessary and sufficient to obtain low regret for adversarial decision making.
We provide new structural results that connect the Decision-Estimation Coefficient to variants of other well-known complexity measures.
arXiv Detail & Related papers (2022-06-27T06:20:37Z) - Adjacency constraint for efficient hierarchical reinforcement learning [25.15808501708926]
Goal-conditioned hierarchical reinforcement learning (HRL) is a promising approach for scaling up reinforcement learning (RL) techniques.
HRL often suffers from training inefficiency because the high-level action space, i.e., the goal space, is large.
We show that this problem can be effectively alleviated by restricting the high-level action space to a $k$-step adjacent region of the current state.
arXiv Detail & Related papers (2021-10-30T09:26:45Z) - Value-Function-based Sequential Minimization for Bi-level Optimization [52.39882976848064]
Gradient-based Bi-Level Optimization (BLO) methods have been widely applied to handle modern learning tasks.
There are almost no gradient-based methods able to solve BLO in challenging scenarios, such as BLO with functional constraints and pessimistic BLO.
We provide Bi-level Value-Function-based Sequential Minimization (BVFSM) to address the above issues.
arXiv Detail & Related papers (2021-10-11T03:13:39Z) - Deep reinforcement learning driven inspection and maintenance planning under incomplete information and constraints [0.0]
Determination of inspection and maintenance policies constitutes a complex optimization problem.
In this work, these challenges are addressed within a joint framework of constrained Partially Observable Markov Decision Processes (POMDPs) and multi-agent Deep Reinforcement Learning (DRL).
The proposed framework is found to outperform well-established policy baselines and facilitate adept prescription of inspection and intervention actions.
arXiv Detail & Related papers (2020-07-02T20:44:07Z) - Generating Adjacency-Constrained Subgoals in Hierarchical Reinforcement Learning [22.319208517053816]
Goal-conditioned hierarchical reinforcement learning (HRL) is a promising approach for scaling up reinforcement learning techniques.
HRL often suffers from training inefficiency because the high-level action space, i.e., the goal space, is often large.
We show that this inefficiency can be effectively alleviated by restricting the high-level action space to a $k$-step adjacent region of the current state (a toy sketch of this restriction follows the list below).
arXiv Detail & Related papers (2020-06-20T03:34:45Z) - Safety-guaranteed Reinforcement Learning based on Multi-class Support Vector Machine [5.578687473172938]
We address the problem of satisfying hard state constraints in a model-free RL setting with deterministic system dynamics.
The proposed algorithm is developed for discrete state and action spaces and utilizes a multi-class support vector machine (SVM) to represent the policy.
arXiv Detail & Related papers (2020-06-12T19:58:49Z)
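Two entries above (the adjacency-constraint papers) hinge on one concrete mechanism: restricting the high-level goal space to states reachable within $k$ primitive steps of the current state. Below is a toy sketch of that restriction on an obstacle-free grid world; the names `adjacent_goals` and `GRID` are illustrative assumptions, not taken from either paper.

```python
# Toy sketch of the k-step adjacency restriction on a high-level goal space,
# on a deterministic, obstacle-free grid world with four-way moves, where the
# number of primitive steps to a cell equals its Manhattan distance.
from itertools import product

GRID = 10          # 10x10 grid world
K = 3              # adjacency horizon: goals must be <= K primitive steps away

def adjacent_goals(state, k=K, grid=GRID):
    """All cells within k steps (Manhattan distance) of `state`."""
    x, y = state
    return [
        (gx, gy)
        for gx, gy in product(range(grid), repeat=2)
        if abs(gx - x) + abs(gy - y) <= k
    ]

# The high-level policy then selects among this restricted set instead of the
# full grid, shrinking the goal space from grid**2 cells to O(k**2) cells.
state = (5, 5)
goals = adjacent_goals(state)
print(len(goals), "candidate goals instead of", GRID * GRID)
```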