Related papers: Sequential Stochastic Combinatorial Optimization Using Hierarchal Reinforcement Learning

URL: http://arxiv.org/abs/2502.05537v1
Date: Sat, 08 Feb 2025 12:00:30 GMT
Title: Sequential Stochastic Combinatorial Optimization Using Hierarchal Reinforcement Learning
Authors: Xinsong Feng, Zihan Yu, Yanhai Xiong, Haipeng Chen
Abstract summary: We propose a two-layer option-based framework that simultaneously decides adaptive budget allocation on the higher layer and node selection on the lower layer. Empirical results show that WS-option exhibits significantly improved effectiveness and generalizability compared to traditional methods.
Score: 5.57541853212632
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: Reinforcement learning (RL) has emerged as a promising tool for combinatorial optimization (CO) problems due to its ability to learn fast, effective, and generalizable solutions. Nonetheless, existing works mostly focus on one-shot deterministic CO, while sequential stochastic CO (SSCO) has rarely been studied despite its broad applications such as adaptive influence maximization (IM) and infectious disease intervention. In this paper, we study the SSCO problem where we first decide the budget (e.g., number of seed nodes in adaptive IM) allocation for all time steps, and then select a set of nodes for each time step. The few existing studies on SSCO simplify the problems by assuming a uniformly distributed budget allocation over the time horizon, yielding suboptimal solutions. We propose a generic hierarchical RL (HRL) framework called wake-sleep option (WS-option), a two-layer option-based framework that simultaneously decides adaptive budget allocation on the higher layer and node selection on the lower layer. WS-option starts with a coherent formulation of the two-layer Markov decision processes (MDPs), capturing the interdependencies between the two layers of decisions. Building on this, WS-option employs several innovative designs to balance the model's training stability and computational efficiency, preventing the vicious cyclic interference issue between the two layers. Empirical results show that WS-option exhibits significantly improved effectiveness and generalizability compared to traditional methods. Moreover, the learned model can be generalized to larger graphs, which significantly reduces the overhead of computational resources.

Related papers

Majorization-Minimization Dual Stagewise Algorithm for Generalized Lasso [2.1066879371176395]
We propose a majorization-minimization dual stagewise (MM-DUST) algorithm to efficiently trace out the full solution paths of the generalized lasso problem. We analyze the computational complexity of MM-DUST and establish the uniform convergence of the approximated solution paths.
arXiv Detail & Related papers (2025-01-04T05:20:26Z)
Optimization by Parallel Quasi-Quantum Annealing with Gradient-Based Sampling [0.0]
This study proposes a different approach that integrates gradient-based update through continuous relaxation, combined with Quasi-Quantum Annealing (QQA). Numerical experiments demonstrate that our method is a competitive general-purpose solver, achieving performance comparable to iSCO and learning-based solvers.
arXiv Detail & Related papers (2024-09-02T12:55:27Z)
UDC: A Unified Neural Divide-and-Conquer Framework for Large-Scale Combinatorial Optimization Problems [8.871356150316224]
Two-stage neural methods motivated by divide-and-conquer strategies have shown efficiency in addressing large-scale CO problems. This article develops a unified neural divide-and-conquer framework (i.e., UDC) for solving general large-scale CO problems.
arXiv Detail & Related papers (2024-06-29T04:29:03Z)
DISCO: Efficient Diffusion Solver for Large-Scale Combinatorial Optimization Problems [37.205311971072405]
DISCO is an efficient DIffusion solver for large-scale Combinatorial Optimization problems. It constrains the sampling space to a more meaningful domain guided by solution residues, while preserving the multi-modal properties of the output distributions. It delivers strong performance on large-scale Traveling Salesman Problems and challenging Maximal Independent Set benchmarks, with inference time up to 5.28 times faster than other diffusion alternatives.
arXiv Detail & Related papers (2024-06-28T07:36:31Z)
Decision-focused Graph Neural Networks for Combinatorial Optimization [62.34623670845006]
An emerging strategy to tackle optimization problems involves the adoption of graph neural networks (GNNs) as an alternative to traditional algorithms. Despite the growing popularity of GNNs and traditional algorithm solvers in the realm of CO, there is limited research on their integrated use and the correlation between them within an end-to-end framework. We introduce a decision-focused framework that utilizes GNNs to address CO problems with auxiliary support.
arXiv Detail & Related papers (2024-06-05T22:52:27Z)
Sample-Efficient Multi-Agent RL: An Optimization Perspective [103.35353196535544]
We study multi-agent reinforcement learning (MARL) for the general-sum Markov Games (MGs) under the general function approximation. We introduce a novel complexity measure called the Multi-Agent Decoupling Coefficient (MADC) for general-sum MGs. We show that our algorithm provides comparable sublinear regret to the existing works.
arXiv Detail & Related papers (2023-10-10T01:39:04Z)
Provably Efficient UCB-type Algorithms For Learning Predictive State Representations [55.00359893021461]
The sequential decision-making problem is statistically learnable if it admits a low-rank structure modeled by predictive state representations (PSRs). This paper proposes the first known UCB-type approach for PSRs, featuring a novel bonus term that upper bounds the total variation distance between the estimated and true models. In contrast to existing approaches for PSRs, our UCB-type algorithms enjoy computational tractability, last-iterate guaranteed near-optimal policy, and guaranteed model accuracy.
arXiv Detail & Related papers (2023-07-01T18:35:21Z)
A Two-stage Framework and Reinforcement Learning-based Optimization Algorithms for Complex Scheduling Problems [54.61091936472494]
We develop a two-stage framework, in which reinforcement learning (RL) and traditional operations research (OR) algorithms are combined together. The scheduling problem is solved in two stages, including a finite Markov decision process (MDP) and a mixed-integer programming process, respectively. Results show that the proposed algorithms could stably and efficiently obtain satisfactory scheduling schemes for agile Earth observation satellite scheduling problems.
arXiv Detail & Related papers (2021-03-10T03:16:12Z)
Resource Allocation via Model-Free Deep Learning in Free Space Optical Communications [119.81868223344173]
The paper investigates the general problem of resource allocation for mitigating channel fading effects in Free Space Optical (FSO) communications. Under this framework, we propose two algorithms that solve FSO resource allocation problems.
arXiv Detail & Related papers (2020-07-27T17:38:51Z)
Combining Deep Learning and Optimization for Security-Constrained Optimal Power Flow [94.24763814458686]
Security-constrained optimal power flow (SCOPF) is fundamental in power systems. Modeling of APR within the SCOPF problem results in complex large-scale mixed-integer programs. This paper proposes a novel approach that combines deep learning and robust optimization techniques.
arXiv Detail & Related papers (2020-07-14T12:38:21Z)
Simplified Swarm Optimization for Bi-Objection Active Reliability Redundancy Allocation Problems [1.5990720051907859]
The reliability redundancy allocation problem (RRAP) is a well-known problem in system design, development, and management. In this study, a bi-objective RRAP is formulated by changing the cost constraint as a new goal. To solve the proposed problem, a new simplified swarm optimization (SSO) with a penalty function, a real one-type solution structure, a number-based self-adaptive new update mechanism, a constrained non-dominated solution selection, and a new pBest replacement policy is developed.
arXiv Detail & Related papers (2020-06-17T13:15:44Z)

This list is automatically generated from the titles and abstracts of the papers on this site.