Intersectional Fairness in Reinforcement Learning with Large State and Constraint Spaces
- URL: http://arxiv.org/abs/2502.11828v1
- Date: Mon, 17 Feb 2025 14:25:33 GMT
- Title: Intersectional Fairness in Reinforcement Learning with Large State and Constraint Spaces
- Authors: Eric Eaton, Marcel Hussing, Michael Kearns, Aaron Roth, Sikata Bela Sengupta, Jessica Sorrell
- Abstract summary: In many real-world settings, it is important to optimize over multiple objectives simultaneously.
We consider a multi-objective optimization problem in which each objective is defined by a state-based reweighting of a single scalar reward function.
We provide oracle-efficient algorithms to solve these multi-objective RL problems even when the number of objectives is exponentially large.
- Abstract: In traditional reinforcement learning (RL), the learner aims to solve a single-objective optimization problem: find the policy that maximizes expected reward. However, in many real-world settings, it is important to optimize over multiple objectives simultaneously. For example, when we are interested in fairness, states might have feature annotations corresponding to multiple (intersecting) demographic groups to whom reward accrues, and our goal might be to maximize the reward of the group receiving the minimal reward. In this work, we consider a multi-objective optimization problem in which each objective is defined by a state-based reweighting of a single scalar reward function. This generalizes the problem of maximizing the reward of the minimum reward group. We provide oracle-efficient algorithms to solve these multi-objective RL problems even when the number of objectives is exponentially large: for tabular MDPs, as well as for large MDPs when the group functions have additional structure. Finally, we experimentally validate our theoretical results and demonstrate applications on a preferential attachment graph MDP.
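As a concrete illustration of the objective described in the abstract, the following is a minimal hypothetical sketch (the group weight functions, trajectory format, and discount factor are assumptions made for illustration, not the paper's code): one scalar reward is reweighted per group by a state-based weight, giving one discounted return per group, and the fairness objective is the minimum of these group returns.

```python
# Minimal sketch (assumed names and data structures, not the paper's implementation):
# each objective is a state-based reweighting of one scalar reward, and the
# fairness objective is the return of the worst-off group.

def group_returns(trajectory, group_weights, gamma=0.99):
    """trajectory: list of (state, reward) pairs from one episode.
    group_weights: dict mapping group name -> function state -> weight in [0, 1]."""
    returns = {g: 0.0 for g in group_weights}
    discount = 1.0
    for state, reward in trajectory:
        for g, w in group_weights.items():
            # State-based reweighting of the single scalar reward.
            returns[g] += discount * w(state) * reward
        discount *= gamma
    return returns

def min_group_return(trajectory, group_weights, gamma=0.99):
    """Max-min fairness objective: the learner maximizes the minimum group return."""
    return min(group_returns(trajectory, group_weights, gamma).values())

# Example with two intersecting groups defined by (hypothetical) state features.
if __name__ == "__main__":
    groups = {
        "group_A": lambda s: 1.0 if s.get("A") else 0.0,
        "group_B": lambda s: 1.0 if s.get("B") else 0.0,
    }
    traj = [({"A": True, "B": False}, 1.0), ({"A": True, "B": True}, 0.5)]
    print(group_returns(traj, groups), min_group_return(traj, groups))
```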
Related papers
- Rethinking Multi-Objective Learning through Goal-Conditioned Supervised Learning [8.593384839118658]
Multi-objective learning aims to optimize multiple objectives simultaneously with a single model.
It suffers from the difficulty of formalizing and carrying out the exact learning process.
We propose a general framework for automatically learning to achieve multiple objectives based on existing sequential data.
arXiv Detail & Related papers (2024-12-12T03:47:40Z) - Scalable Multi-Objective Reinforcement Learning with Fairness Guarantees using Lorenz Dominance [43.44913206006581]
Multi-Objective Reinforcement Learning (MORL) aims to learn a set of policies that optimize trade-offs between multiple, often conflicting objectives.
This paper introduces a principled algorithm that incorporates fairness into MORL via Lorenz dominance while improving scalability to many-objective problems (a sketch of the Lorenz-dominance criterion appears after this list).
arXiv Detail & Related papers (2024-11-27T10:16:25Z) - Reinforcement Learning with LTL and $\omega$-Regular Objectives via Optimality-Preserving Translation to Average Rewards [43.816375964005026]
Linear temporal logic (LTL) and, more generally, $\omega$-regular objectives are alternatives to the traditional discounted-sum and average-reward objectives in reinforcement learning.
We show that each RL problem for $\omega$-regular objectives can be reduced to a limit-average reward problem in an optimality-preserving fashion.
arXiv Detail & Related papers (2024-10-16T02:42:37Z) - VinePPO: Unlocking RL Potential For LLM Reasoning Through Refined Credit Assignment [66.80143024475635]
We propose VinePPO, a straightforward approach that computes unbiased Monte Carlo-based estimates for credit assignment.
We show that VinePPO consistently outperforms PPO as well as RL-free baselines across the MATH and GSM8K datasets.
arXiv Detail & Related papers (2024-10-02T15:49:30Z) - The Perfect Blend: Redefining RLHF with Mixture of Judges [68.58426626501883]
Reinforcement learning from human feedback (RLHF) has become the leading approach for fine-tuning large language models (LLMs).
Applying RLHF for multi-task learning (MTL) currently requires careful tuning of the weights for reward model and data combinations.
We introduce a novel post-training paradigm which we call Constrained Generative Policy Optimization (CGPO).
arXiv Detail & Related papers (2024-09-30T15:06:53Z) - Common pitfalls to avoid while using multiobjective optimization in machine learning [1.2499537119440245]
There has been increasing interest in exploring the application of multiobjective optimization (MOO) in machine learning (ML).
Despite its potential, there is a noticeable lack of satisfactory literature that could serve as an entry-level guide for ML practitioners who want to use MOO.
We critically review previous studies, particularly those involving MOO in deep learning (using Physics-Informed Neural Networks (PINNs) as a guiding example) and identify misconceptions that highlight the need for a better grasp of MOO principles in ML.
arXiv Detail & Related papers (2024-05-02T17:12:25Z) - UCB-driven Utility Function Search for Multi-objective Reinforcement Learning [75.11267478778295]
In Multi-objective Reinforcement Learning (MORL), agents are tasked with optimising decision-making behaviours.
We focus on the case of linear utility functions parameterised by weight vectors w.
We introduce a method based on Upper Confidence Bound (UCB) to efficiently search for the most promising weight vectors during different stages of the learning process (a generic sketch of such a search appears after this list).
arXiv Detail & Related papers (2024-05-01T09:34:42Z) - Rewarded soups: towards Pareto-optimal alignment by interpolating weights fine-tuned on diverse rewards [101.7246658985579]
Foundation models are first pre-trained on vast unsupervised datasets and then fine-tuned on labeled data.
We propose embracing the heterogeneity of diverse rewards by following a multi-policy strategy: interpolating the weights of networks fine-tuned on different rewards (a minimal interpolation sketch appears after this list).
We demonstrate the effectiveness of our approach for text-to-text (summarization, Q&A, helpful assistant, review), text-image (image captioning, text-to-image generation, visual grounding, VQA), and control (locomotion) tasks.
arXiv Detail & Related papers (2023-06-07T14:58:15Z) - Probably Approximately Correct Federated Learning [20.85915650297227]
Federated learning (FL) is a new distributed learning paradigm with privacy, utility, and efficiency as its primary pillars.
Existing research indicates that it is unlikely to simultaneously attain infinitesimal privacy leakage, utility loss, and efficiency.
How to find an optimal trade-off solution is the key consideration when designing the FL algorithm.
arXiv Detail & Related papers (2023-04-10T15:12:34Z) - Provable Multi-Objective Reinforcement Learning with Generative Models [98.19879408649848]
We study the problem of single-policy MORL, which learns an optimal policy given a preference over objectives.
Existing methods require strong assumptions such as exact knowledge of the multi-objective decision process.
We propose a new algorithm called model-based envelope value iteration (EVI), which generalizes the enveloped multi-objective $Q$-learning algorithm.
arXiv Detail & Related papers (2020-11-19T22:35:31Z) - Learning What to Defer for Maximum Independent Sets [84.00112106334655]
We propose a novel DRL scheme, coined learning what to defer (LwD), where the agent adaptively shrinks or stretches the number of stages by learning to distribute the element-wise decisions of the solution at each stage.
We apply the proposed framework to the maximum independent set (MIS) problem, and demonstrate its significant improvement over the current state-of-the-art DRL scheme.
arXiv Detail & Related papers (2020-06-17T02:19:31Z)
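For the Lorenz-dominance entry above (Scalable Multi-Objective Reinforcement Learning with Fairness Guarantees using Lorenz Dominance), the criterion can be illustrated with a small self-contained check. This is a generic sketch of one standard formulation, not that paper's code: a return vector u Lorenz-dominates v if the cumulative sums of its ascending-sorted entries are pointwise at least those of v, with strict inequality somewhere.

```python
# Generic Lorenz-dominance check over per-objective return vectors
# (illustrative sketch; not code from the referenced paper).

def lorenz_curve(returns):
    """Cumulative sums of the returns sorted in ascending order."""
    curve, total = [], 0.0
    for r in sorted(returns):
        total += r
        curve.append(total)
    return curve

def lorenz_dominates(u, v):
    """True if u Lorenz-dominates v: its Lorenz curve is pointwise at least
    that of v, and strictly greater at some point."""
    cu, cv = lorenz_curve(u), lorenz_curve(v)
    return all(a >= b for a, b in zip(cu, cv)) and any(a > b for a, b in zip(cu, cv))

# A more even split of the same total reward Lorenz-dominates a skewed one.
print(lorenz_dominates([2.0, 2.0, 2.0], [0.0, 1.0, 5.0]))  # True
```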
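For the UCB-driven utility-function search entry above, the sketch below shows a generic UCB-style selection loop over a fixed set of candidate weight vectors for a linear utility. It is an assumption-heavy illustration, not the paper's algorithm: the candidate weights, the evaluate() stub, and the horizon are invented for demonstration.

```python
import math
import random

# Generic UCB selection over candidate weight vectors for a linear utility
# u(returns) = w . returns (illustration only; not the referenced method).

candidates = [(1.0, 0.0), (0.5, 0.5), (0.0, 1.0)]  # hypothetical weight vectors

def evaluate(w):
    """Stub: train/evaluate a policy under the scalarized reward w . r and
    return its performance. Replaced here by a noisy toy score."""
    return w[0] * 1.0 + w[1] * 0.5 + random.gauss(0.0, 0.1)

counts = [0] * len(candidates)
means = [0.0] * len(candidates)
for t in range(1, 201):
    # Pick the candidate with the highest upper confidence bound (UCB1 rule).
    ucb = [
        means[i] + math.sqrt(2.0 * math.log(t) / counts[i]) if counts[i] else float("inf")
        for i in range(len(candidates))
    ]
    i = ucb.index(max(ucb))
    score = evaluate(candidates[i])
    counts[i] += 1
    means[i] += (score - means[i]) / counts[i]  # running average

print("most-selected weight vector:", candidates[counts.index(max(counts))])
```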
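For the rewarded-soups entry above, the sketch below linearly interpolates the parameters of two models fine-tuned on different rewards. It assumes two parameter dictionaries with matching keys and shapes and only illustrates the interpolation step, not the paper's implementation.

```python
# Minimal parameter-interpolation sketch for combining two models fine-tuned
# on different rewards (illustrative; parameter types and names are assumed).

def interpolate_weights(params_a, params_b, lam):
    """Return (1 - lam) * params_a + lam * params_b, key by key.
    params_a, params_b: dicts mapping parameter names to values of equal shape."""
    assert params_a.keys() == params_b.keys()
    return {k: (1.0 - lam) * params_a[k] + lam * params_b[k] for k in params_a}

# Sweeping lam in [0, 1] traces a family of policies trading off the two rewards.
model_a = {"layer.weight": 1.0, "layer.bias": 0.0}  # toy scalar "parameters"
model_b = {"layer.weight": 3.0, "layer.bias": 1.0}
print(interpolate_weights(model_a, model_b, 0.25))  # {'layer.weight': 1.5, 'layer.bias': 0.25}
```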