Conditional Policy Generator for Dynamic Constraint Satisfaction and Optimization
- URL: http://arxiv.org/abs/2509.17205v1
- Date: Sun, 21 Sep 2025 19:19:06 GMT
- Title: Conditional Policy Generator for Dynamic Constraint Satisfaction and Optimization
- Authors: Wook Lee, Frans A. Oliehoek
- Abstract summary: We present a new approach to constraint satisfaction and optimization in dynamically changing environments. We frame it as a reinforcement learning problem and introduce a conditional policy generator, borrowing the idea of class-conditional generative adversarial networks (GANs). We empirically demonstrate a proof-of-principle experiment on a multi-modal constraint satisfaction problem and compare the unconditional and conditional cases.
- Score: 8.466660421475295
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Leveraging machine learning methods to solve constraint satisfaction problems has shown promise, but such methods are mostly limited to static settings where the problem description is completely known and fixed from the beginning. In this work we present a new approach to constraint satisfaction and optimization in dynamically changing environments, particularly when the variables in the problem are statistically independent. We frame it as a reinforcement learning problem and introduce a conditional policy generator, borrowing the idea of class-conditional generative adversarial networks (GANs). Assuming that the problem includes both static and dynamic constraints, the former are used in a reward formulation to guide policy training, so that the policy learns to map a noise prior to a probability distribution over solutions satisfying the static constraints, much like a generator in a GAN. Dynamic constraints, on the other hand, are encoded as class labels and fed in alongside the input noise. The policy is then simultaneously updated, in a supervised manner, to maximize the likelihood of being correctly classified under the given dynamic conditions. We empirically demonstrate a proof-of-principle experiment on a multi-modal constraint satisfaction problem and compare the unconditional and conditional cases.
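The abstract describes the training signal only at a high level; the following is a minimal PyTorch sketch of one plausible reading, not the authors' implementation. A policy network maps a noise vector concatenated with a one-hot dynamic-condition label to a distribution over candidate solutions; a REINFORCE-style term scores samples against a placeholder static constraint, and an auxiliary classifier (in the spirit of class-conditional GAN variants such as AC-GAN) supplies the supervised classification loss. All names and constants here (ConditionalPolicy, static_reward, the unit-circle constraint) are illustrative assumptions.

```python
# Hypothetical sketch of a class-conditional policy generator; not the authors' code.
import torch
import torch.nn as nn
import torch.nn.functional as F

NOISE_DIM, NUM_CONDITIONS, SOLUTION_DIM = 8, 4, 2

class ConditionalPolicy(nn.Module):
    """Maps (noise, one-hot dynamic condition) to a Gaussian over solutions."""
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(NOISE_DIM + NUM_CONDITIONS, 64), nn.ReLU(),
            nn.Linear(64, SOLUTION_DIM),
        )
        self.log_std = nn.Parameter(torch.zeros(SOLUTION_DIM))

    def forward(self, z, cond_onehot):
        mean = self.net(torch.cat([z, cond_onehot], dim=-1))
        return torch.distributions.Normal(mean, self.log_std.exp())

def static_reward(x):
    # Placeholder static constraint (an assumption): reward points near the unit circle.
    return -((x.norm(dim=-1) - 1.0) ** 2)

policy = ConditionalPolicy()
# Auxiliary classifier supplying the supervised signal, as in AC-GAN-style training.
classifier = nn.Linear(SOLUTION_DIM, NUM_CONDITIONS)
opt = torch.optim.Adam([*policy.parameters(), *classifier.parameters()], lr=1e-3)

for step in range(1000):
    z = torch.randn(128, NOISE_DIM)                    # noise prior, as in a GAN generator
    cond = torch.randint(0, NUM_CONDITIONS, (128,))    # dynamic constraints encoded as labels
    dist = policy(z, F.one_hot(cond, NUM_CONDITIONS).float())
    x = dist.rsample()  # reparameterized sample so the classification loss reaches the policy
    # Static constraints enter only through the reward (REINFORCE-style score-function term).
    pg_loss = -(dist.log_prob(x.detach()).sum(-1) * static_reward(x.detach())).mean()
    # Supervised term: maximize the likelihood of the given dynamic condition.
    cls_loss = F.cross_entropy(classifier(x), cond)
    opt.zero_grad()
    (pg_loss + cls_loss).backward()
    opt.step()
```

Under this reading, the reparameterized sample lets the classification gradient shape the policy directly, while the score-function term needs only reward evaluations, so the static constraints may be non-differentiable.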
Related papers
- Adaptive Neighborhood-Constrained Q Learning for Offline Reinforcement Learning [52.03884701766989]
Offline reinforcement learning (RL) algorithms typically impose constraints on action selection. We propose a new neighborhood constraint that restricts action selection in the Bellman target to the union of neighborhoods of dataset actions. We develop a simple yet effective algorithm, Adaptive Neighborhood-constrained Q learning (ANQ), to perform Q learning with target actions satisfying this constraint.
arXiv Detail & Related papers (2025-11-04T13:42:05Z) - Steerable Adversarial Scenario Generation through Test-Time Preference Alignment [58.37104890690234]
Adversarial scenario generation is a cost-effective approach for safety assessment of autonomous driving systems. We introduce a new framework named Steerable Adversarial scenario GEnerator (SAGE). SAGE enables fine-grained test-time control over the trade-off between adversariality and realism without any retraining.
arXiv Detail & Related papers (2025-09-24T13:27:35Z) - Algorithmic Fairness: A Runtime Perspective [6.409194734638881]
This paper proposes a framework for analysing fairness as a runtime property. We study the problems of monitoring and enforcing fairness expressed in either toss outcomes or coin biases.
arXiv Detail & Related papers (2025-07-28T11:04:17Z) - Generalization Bounds of Surrogate Policies for Combinatorial Optimization Problems [53.03951222945921]
We analyze smoothed (perturbed) policies, adding controlled random perturbations to the direction used by the linear oracle. Our main contribution is a generalization bound that decomposes the excess risk into perturbation bias, statistical estimation error, and optimization error. We illustrate the scope of the results on applications such as vehicle scheduling, highlighting how smoothing enables both tractable training and controlled generalization.
arXiv Detail & Related papers (2024-07-24T12:00:30Z) - Learning Constraint Network from Demonstrations via Positive-Unlabeled Learning with Memory Replay [8.361428709513476]
This paper presents a positive-unlabeled (PU) learning approach to infer a continuous, arbitrary, and possibly nonlinear constraint from demonstrations. The effectiveness of the proposed method is validated in two MuJoCo environments.
arXiv Detail & Related papers (2024-07-23T14:00:18Z) - Generative Modelling of Stochastic Actions with Arbitrary Constraints in
Reinforcement Learning [25.342811509665097]
Many problems in Reinforcement Learning (RL) seek an optimal policy over large, discrete, multidimensional, yet unordered action spaces.
A challenge in this setting is that the underlying action space is categorical (discrete and unordered) and large.
In this work, we address these challenges by applying a (state) conditional normalizing flow to compactly represent the policy.
arXiv Detail & Related papers (2023-11-26T15:57:20Z) - Probabilistic Reach-Avoid for Bayesian Neural Networks [71.67052234622781]
We show that an optimal synthesis algorithm can provide more than a four-fold increase in the number of certifiable states and more than a three-fold increase in the average guaranteed reach-avoid probability.
arXiv Detail & Related papers (2023-10-03T10:52:21Z) - Optimizing Chance-Constrained Submodular Problems with Variable Uncertainties [12.095075636344536]
We study chance-constrained submodular optimization problems, which capture a wide range of problems with stochastic constraints. We present greedy algorithms that obtain a high-quality solution, i.e., a constant approximation ratio with respect to the optimal solution.
arXiv Detail & Related papers (2023-09-23T04:48:49Z) - Resilient Constrained Learning [94.27081585149836]
This paper presents a constrained learning approach that adapts the requirements while simultaneously solving the learning task.
We call this approach resilient constrained learning after the term used to describe ecological systems that adapt to disruptions by modifying their operation.
arXiv Detail & Related papers (2023-06-04T18:14:18Z) - Maximum Causal Entropy Inverse Constrained Reinforcement Learning [3.409089945290584]
We propose a novel method that utilizes the principle of maximum causal entropy to learn constraints and an optimal policy.
We evaluate the effectiveness of the learned policy by assessing the reward received and the number of constraint violations.
Our method has been shown to outperform state-of-the-art approaches across a variety of tasks and environments.
arXiv Detail & Related papers (2023-05-04T14:18:19Z) - State Augmented Constrained Reinforcement Learning: Overcoming the Limitations of Learning with Rewards [88.30521204048551]
A common formulation of constrained reinforcement learning involves multiple rewards that must individually accumulate to given thresholds.
We show a simple example in which the desired optimal policy cannot be induced by any weighted linear combination of rewards.
This work addresses this shortcoming by augmenting the state with Lagrange multipliers and reinterpreting primal-dual methods; a minimal sketch of this state-augmentation idea appears after this list.
arXiv Detail & Related papers (2021-02-23T21:07:35Z) - Strictly Batch Imitation Learning by Energy-based Distribution Matching [104.33286163090179]
Consider learning a policy purely on the basis of demonstrated behavior -- that is, with no access to reinforcement signals, no knowledge of transition dynamics, and no further interaction with the environment.
One solution is simply to retrofit existing algorithms for apprenticeship learning to work in the offline setting.
But such an approach leans heavily on off-policy evaluation or offline model estimation, and can be indirect and inefficient.
We argue that a good solution should be able to explicitly parameterize a policy, implicitly learn from rollout dynamics, and operate in an entirely offline fashion.
arXiv Detail & Related papers (2020-06-25T03:27:59Z)
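As noted above, here is a minimal sketch of the state-augmentation idea from the "State Augmented Constrained Reinforcement Learning" entry. It is an assumption-laden illustration, not that paper's code: the current Lagrange multipliers are appended to the observation so a single policy can condition on them, the learner sees the multiplier-weighted Lagrangian reward, and the multipliers follow projected dual updates on the constraint shortfalls. All names and constants (THRESHOLDS, DUAL_LR, augment) are hypothetical.

```python
# Hypothetical sketch of state augmentation with Lagrange multipliers; not the paper's code.
import numpy as np

NUM_CONSTRAINTS = 2
THRESHOLDS = np.array([0.5, 0.3])   # required per-constraint reward accumulation (assumed)
DUAL_LR = 0.01

lam = np.zeros(NUM_CONSTRAINTS)     # Lagrange multipliers, carried as part of the state

def augment(obs, lam):
    # The policy conditions on the multipliers, so one policy covers all trade-offs.
    return np.concatenate([obs, lam])

def lagrangian_reward(r0, constraint_rewards, lam):
    # Scalarized reward seen by the learner at the current multipliers.
    return r0 + lam @ (constraint_rewards - THRESHOLDS)

def dual_update(lam, avg_constraint_rewards):
    # Raise a multiplier when its constraint reward falls short of its threshold,
    # projecting back onto the nonnegative orthant.
    return np.maximum(0.0, lam - DUAL_LR * (avg_constraint_rewards - THRESHOLDS))
```

Under this reading, because the multipliers are part of the state, the policy learned at training time can respond to new multiplier values at deployment rather than being tied to one fixed weighting of the rewards.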