Reinforcement Learning with Stepwise Fairness Constraints
- URL: http://arxiv.org/abs/2211.03994v1
- Date: Tue, 8 Nov 2022 04:06:23 GMT
- Title: Reinforcement Learning with Stepwise Fairness Constraints
- Authors: Zhun Deng, He Sun, Zhiwei Steven Wu, Linjun Zhang, David C. Parkes
- Abstract summary: We introduce the study of reinforcement learning with stepwise fairness constraints.
We provide learning algorithms with strong theoretical guarantees in regard to policy optimality and fairness violation.
- Score: 50.538878453547966
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: AI methods are used in societally important settings, ranging from credit to
employment to housing, and it is crucial to provide fairness in regard to
algorithmic decision making. Moreover, many settings are dynamic, with
populations responding to sequential decision policies. We introduce the study
of reinforcement learning (RL) with stepwise fairness constraints, requiring
group fairness at each time step. Our focus is on tabular episodic RL, and we
provide learning algorithms with strong theoretical guarantees in regard to
policy optimality and fairness violation. Our framework provides useful tools
to study the impact of fairness constraints in sequential settings and brings
up new challenges in RL.
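As a rough illustration of what "group fairness at each time step" can mean, the sketch below checks a per-step demographic-parity gap over logged episodes. It is not the paper's algorithm: the choice of demographic parity as the fairness notion, the trajectory layout, and the names `stepwise_parity_gaps` and `satisfies_stepwise_fairness` are all assumptions made for this example.

```python
from typing import Dict, List, Tuple

# Hypothetical data layout (an assumption, not from the paper): one trajectory
# per individual, holding a group label and the binary decision
# (1 = favorable, 0 = unfavorable) received at each time step of the episode.
Trajectory = Tuple[str, List[int]]


def stepwise_parity_gaps(trajectories: List[Trajectory], horizon: int) -> List[float]:
    """For each step h, return the largest gap in favorable-decision rates across groups."""
    gaps = []
    for h in range(horizon):
        totals: Dict[str, int] = {}
        favorable: Dict[str, int] = {}
        for group, decisions in trajectories:
            totals[group] = totals.get(group, 0) + 1
            favorable[group] = favorable.get(group, 0) + decisions[h]
        rates = [favorable[g] / totals[g] for g in totals]
        gaps.append(max(rates) - min(rates))
    return gaps


def satisfies_stepwise_fairness(gaps: List[float], tolerance: float) -> bool:
    """A stepwise constraint has to hold at every step, not only on average."""
    return all(gap <= tolerance for gap in gaps)


# Both groups receive one favorable decision per episode, so an episode-level
# average looks balanced, yet each individual step treats the groups differently.
example = [
    ("a", [1, 0]),
    ("a", [1, 0]),
    ("b", [0, 1]),
    ("b", [0, 1]),
]
example_gaps = stepwise_parity_gaps(example, horizon=2)
```

In this toy data the per-step check fails for any small tolerance even though the groups are treated identically when aggregated over the episode, which is the distinction a stepwise constraint is meant to capture.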
Related papers
- Probabilistic Satisfaction of Temporal Logic Constraints in Reinforcement Learning via Adaptive Policy-Switching [0.0]
Constrained Reinforcement Learning (CRL) is a subset of machine learning that introduces constraints into the traditional reinforcement learning (RL) framework.
We propose a novel framework that relies on switching between pure learning (reward) and constraint satisfaction.
arXiv Detail & Related papers (2024-10-10T15:19:45Z)
- Joint Demonstration and Preference Learning Improves Policy Alignment with Human Feedback [58.049113055986375]
We develop a single stage approach named Alignment with Integrated Human Feedback (AIHF) to train reward models and the policy.
The proposed approach admits a suite of efficient algorithms, which can easily reduce to, and leverage, popular alignment algorithms.
We demonstrate the efficiency of the proposed solutions with extensive experiments involving alignment problems in LLMs and robotic control problems in MuJoCo.
arXiv Detail & Related papers (2024-06-11T01:20:53Z)
- Constrained Reinforcement Learning Under Model Mismatch [18.05296241839688]
Existing studies on constrained reinforcement learning (RL) may obtain a well-performing policy in the training environment.
However, when deployed in a real environment, it may easily violate constraints that were originally satisfied during training because there might be model mismatch between the training and real environments.
We develop a Robust Constrained Policy Optimization (RCPO) algorithm, which is the first algorithm that applies to large/continuous state space and has theoretical guarantees on worst-case reward improvement and constraint violation at each iteration during the training.
arXiv Detail & Related papers (2024-05-02T14:31:52Z)
- Resilient Constrained Reinforcement Learning [87.4374430686956]
We study a class of constrained reinforcement learning (RL) problems in which multiple constraint specifications are not identified before study.
It is challenging to identify appropriate constraint specifications due to the undefined trade-off between the reward training objective and the constraint satisfaction.
We propose a new constrained RL approach that searches for policy and constraint specifications together.
arXiv Detail & Related papers (2023-12-28T18:28:23Z)
- Stepsize Learning for Policy Gradient Methods in Contextual Markov Decision Processes [35.889129338603446]
Policy-based algorithms are among the most widely adopted techniques in model-free RL.
They tend to struggle when asked to accomplish a series of heterogeneous tasks.
We introduce a new formulation, known as meta-MDP, that can be used to solve any hyperparameter selection problem in RL.
arXiv Detail & Related papers (2023-06-13T12:58:12Z)
- Evolving Constrained Reinforcement Learning Policy [5.4444944707433525]
We propose a novel evolutionary constrained reinforcement learning algorithm, which adaptively balances the reward and constraint violation with ranking.
Experiments on robotic control benchmarks show that our ECRL achieves outstanding performance compared to state-of-the-art algorithms.
arXiv Detail & Related papers (2023-04-19T03:54:31Z)
- Instance-Dependent Confidence and Early Stopping for Reinforcement Learning [99.57168572237421]
Various algorithms for reinforcement learning (RL) exhibit dramatic variation in their convergence rates as a function of problem structure.
This research provides guarantees that explain ex post the performance differences observed.
A natural next step is to convert these theoretical guarantees into guidelines that are useful in practice.
arXiv Detail & Related papers (2022-01-21T04:25:35Z)
- Constraint Sampling Reinforcement Learning: Incorporating Expertise For Faster Learning [43.562783189118]
We introduce a practical algorithm for incorporating human insight to speed learning.
Our algorithm, Constraint Sampling Reinforcement Learning (CSRL), incorporates prior domain knowledge as constraints/restrictions on the RL policy.
In all cases, CSRL learns a good policy faster than baselines.
arXiv Detail & Related papers (2021-12-30T22:02:42Z)
- SUNRISE: A Simple Unified Framework for Ensemble Learning in Deep Reinforcement Learning [102.78958681141577]
We present SUNRISE, a simple unified ensemble method, which is compatible with various off-policy deep reinforcement learning algorithms.
SUNRISE integrates two key ingredients: (a) ensemble-based weighted Bellman backups, which re-weight target Q-values based on uncertainty estimates from a Q-ensemble, and (b) an inference method that selects actions using the highest upper-confidence bounds for efficient exploration.
arXiv Detail & Related papers (2020-07-09T17:08:44Z)
- Cautious Reinforcement Learning with Logical Constraints [78.96597639789279]
An adaptive safe padding forces Reinforcement Learning (RL) to synthesise optimal control policies while ensuring safety during the learning process.
Theoretical guarantees are available on the optimality of the synthesised policies and on the convergence of the learning algorithm.
arXiv Detail & Related papers (2020-02-26T00:01:08Z)