Intersectional Fairness in Reinforcement Learning with Large State and Constraint Spaces
- URL: http://arxiv.org/abs/2502.11828v1
- Date: Mon, 17 Feb 2025 14:25:33 GMT
- Title: Intersectional Fairness in Reinforcement Learning with Large State and Constraint Spaces
- Authors: Eric Eaton, Marcel Hussing, Michael Kearns, Aaron Roth, Sikata Bela Sengupta, Jessica Sorrell
- Abstract summary: In many real-world settings, it is important to optimize over multiple objectives simultaneously. We consider a multi-objective optimization problem in which each objective is defined by a state-based reweighting of a single scalar reward function. We provide oracle-efficient algorithms to solve these multi-objective RL problems even when the number of objectives is exponentially large.
- Score: 16.400288624027375
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: In traditional reinforcement learning (RL), the learner aims to solve a single objective optimization problem: find the policy that maximizes expected reward. However, in many real-world settings, it is important to optimize over multiple objectives simultaneously. For example, when we are interested in fairness, states might have feature annotations corresponding to multiple (intersecting) demographic groups to whom reward accrues, and our goal might be to maximize the reward of the group receiving the minimal reward. In this work, we consider a multi-objective optimization problem in which each objective is defined by a state-based reweighting of a single scalar reward function. This generalizes the problem of maximizing the reward of the minimum reward group. We provide oracle-efficient algorithms to solve these multi-objective RL problems even when the number of objectives is exponentially large: for tabular MDPs, as well as for large MDPs when the group functions have additional structure. Finally, we experimentally validate our theoretical results and demonstrate applications on a preferential attachment graph MDP.
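The setup in the abstract is easy to make concrete. Below is a minimal sketch (not the paper's oracle-efficient algorithm) of how each objective arises as a state-based reweighting g_k(s) * r(s) of one scalar reward, and how the max-min fairness value of a fixed policy could be estimated by Monte Carlo rollouts in a small tabular MDP. The MDP, the group indicators, and the policy are random, made-up assumptions for illustration only.
```python
# Minimal sketch: per-group objectives as state-based reweightings of one scalar
# reward, evaluated for a fixed policy by Monte Carlo rollouts. Illustration only;
# not the paper's oracle-efficient optimization algorithm.
import numpy as np

rng = np.random.default_rng(0)

n_states, n_actions, n_groups = 6, 2, 3
gamma, horizon, n_rollouts = 0.95, 50, 200

P = rng.dirichlet(np.ones(n_states), size=(n_states, n_actions))  # transition kernel P[s, a]
reward = rng.uniform(size=n_states)                               # scalar reward r(s)
groups = rng.integers(0, 2, size=(n_groups, n_states))            # group indicators g_k(s) in {0, 1}
policy = rng.dirichlet(np.ones(n_actions), size=n_states)         # fixed stochastic policy

def group_returns(policy):
    """Monte Carlo estimate of the discounted return of each reweighted objective."""
    totals = np.zeros(n_groups)
    for _ in range(n_rollouts):
        s = rng.integers(n_states)
        discount = 1.0
        for _ in range(horizon):
            a = rng.choice(n_actions, p=policy[s])
            # objective k accrues g_k(s) * r(s): a state-based reweighting of the scalar reward
            totals += discount * groups[:, s] * reward[s]
            discount *= gamma
            s = rng.choice(n_states, p=P[s, a])
    return totals / n_rollouts

vals = group_returns(policy)
print("per-group returns:", vals, " min-group (fairness) value:", vals.min())
```
The paper goes further and optimizes the policy against exponentially many such group objectives; this sketch only evaluates the objectives for one fixed policy.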
Related papers
- Projection Optimization: A General Framework for Multi-Objective and Multi-Group RLHF [13.612504157832708]
Reinforcement Learning with Human Feedback (RLHF) is a widely used fine-tuning approach that aligns machine learning models with human preferences.
In this work, we transform the non-linear aggregation problem into a series of sub-problems and extend our framework to handle multi-group scenarios.
We demonstrate that our algorithmic framework achieves sublinear regret and can be easily adapted to a reward-free algorithm.
arXiv Detail & Related papers (2025-02-21T01:56:52Z) - Rethinking Multi-Objective Learning through Goal-Conditioned Supervised Learning [8.593384839118658]
Multi-objective learning aims to optimize multiple objectives simultaneously with a single model. It suffers from the difficulty of formalizing and conducting the exact learning process. We propose a general framework for automatically learning to achieve multiple objectives based on existing sequential data.
arXiv Detail & Related papers (2024-12-12T03:47:40Z) - Scalable Multi-Objective Reinforcement Learning with Fairness Guarantees using Lorenz Dominance [43.44913206006581]
Multi-Objective Reinforcement Learning (MORL) aims to learn a set of policies that optimize trade-offs between multiple, often conflicting objectives. This paper introduces a principled algorithm that incorporates fairness into MORL while improving scalability to many-objective problems.
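As a small, hedged sketch of the ordering this summary refers to, here is one common formalization of Lorenz dominance between vector returns (sort ascending, compare cumulative sums). It shows only the comparison relation, not the paper's MORL algorithm, and the convention used here is an assumption.
```python
# Illustrative Lorenz-dominance check between two vector returns (higher is better).
# Convention assumed: sort ascending, compare cumulative sums component-wise.
import numpy as np

def lorenz_vector(returns):
    """Cumulative sums of the ascending-sorted return vector."""
    return np.cumsum(np.sort(np.asarray(returns, dtype=float)))

def lorenz_dominates(v, u):
    """True if v Lorenz-dominates u: weakly better everywhere, strictly better somewhere."""
    lv, lu = lorenz_vector(v), lorenz_vector(u)
    return bool(np.all(lv >= lu) and np.any(lv > lu))

print(lorenz_dominates([3.0, 3.0], [5.0, 1.0]))   # True: same total, more evenly distributed
print(lorenz_dominates([5.0, 1.0], [3.0, 3.0]))   # False
```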
arXiv Detail & Related papers (2024-11-27T10:16:25Z) - Reinforcement Learning with LTL and $ω$-Regular Objectives via Optimality-Preserving Translation to Average Rewards [43.816375964005026]
Linear temporal logic (LTL) and, more generally, $\omega$-regular objectives are alternatives to the traditional discounted-sum and average-reward objectives in reinforcement learning.
We show that each RL problem for $\omega$-regular objectives can be reduced to a limit-average reward problem in an optimality-preserving fashion.
arXiv Detail & Related papers (2024-10-16T02:42:37Z) - VinePPO: Unlocking RL Potential For LLM Reasoning Through Refined Credit Assignment [66.80143024475635]
We propose VinePPO, a straightforward approach that computes unbiased Monte Carlo-based value estimates.
We show that VinePPO consistently outperforms PPO and other RL-free baselines across MATH and GSM8K datasets.
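A hedged sketch of the underlying idea, unbiased Monte Carlo value estimates obtained from extra rollouts out of intermediate states, is below. `sample_completion` and `reward_fn` are hypothetical stand-ins for the language model's sampler and the task reward; this is not VinePPO's implementation.
```python
# Sketch: estimate the value of an intermediate reasoning state as the average reward
# of k independent completions sampled from that prefix. Stand-in callables only.
from statistics import mean
from typing import Callable

def mc_value_estimate(prefix: str,
                      sample_completion: Callable[[str], str],
                      reward_fn: Callable[[str], float],
                      k: int = 8) -> float:
    """Unbiased MC estimate of V(prefix): mean reward over k sampled completions."""
    return mean(reward_fn(prefix + sample_completion(prefix)) for _ in range(k))

# toy usage with dummy stand-ins for the sampler and the reward
print(mc_value_estimate("2+2=", lambda p: "4", lambda text: float(text.endswith("4"))))
```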
arXiv Detail & Related papers (2024-10-02T15:49:30Z) - The Perfect Blend: Redefining RLHF with Mixture of Judges [68.58426626501883]
Reinforcement learning from human feedback (RLHF) has become the leading approach for fine-tuning large language models (LLMs).
Applying RLHF for multi-task learning (MTL) currently requires careful tuning of the weights for reward model and data combinations.
We introduce a novel post-training paradigm which we call Constrained Generative Policy Optimization (CGPO).
arXiv Detail & Related papers (2024-09-30T15:06:53Z) - Analyzing and Bridging the Gap between Maximizing Total Reward and Discounted Reward in Deep Reinforcement Learning [17.245293915129942]
The optimization objective is a fundamental aspect of reinforcement learning (RL).
While the total return is ideal, the discounted return is a practical objective due to its stability.
We propose two alternative approaches to align the objectives.
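For concreteness, a tiny numeric example of the two objectives contrasted above: the total return and the discounted return can rank the same pair of trajectories differently. The reward sequences are made up.
```python
# Total (undiscounted) return versus discounted return on two toy reward sequences.
rewards_a = [0.0, 0.0, 0.0, 10.0]   # delayed reward
rewards_b = [3.0, 3.0, 3.0, 0.0]    # immediate rewards
gamma = 0.5

def total_return(rs):
    return sum(rs)

def discounted_return(rs, gamma):
    return sum(r * gamma ** t for t, r in enumerate(rs))

print(total_return(rewards_a), total_return(rewards_b))                          # 10.0 9.0  -> A preferred
print(discounted_return(rewards_a, gamma), discounted_return(rewards_b, gamma))  # 1.25 5.25 -> B preferred
```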
arXiv Detail & Related papers (2024-07-18T08:33:10Z) - UCB-driven Utility Function Search for Multi-objective Reinforcement Learning [75.11267478778295]
In Multi-Objective Reinforcement Learning (MORL), agents are tasked with optimising decision-making behaviours.
We focus on the case of linear utility functions parameterised by weight vectors w.
We introduce a method based on Upper Confidence Bound to efficiently search for the most promising weight vectors during different stages of the learning process.
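A minimal sketch of a UCB-style search over a finite set of candidate weight vectors is shown below. The evaluation of each weight vector (which in MORL would involve training or evaluating a policy under the utility w·r) is replaced by a hypothetical noisy black box, so this illustrates only the bandit-style selection rule, not the paper's method.
```python
# UCB-style selection among candidate utility weight vectors; the per-vector payoff
# is a made-up noisy black box standing in for "train/evaluate a policy under w".
import math
import numpy as np

rng = np.random.default_rng(0)
candidates = [np.array([1.0, 0.0]), np.array([0.5, 0.5]), np.array([0.0, 1.0])]
true_utility = [0.4, 0.7, 0.5]        # hidden quality of each weight vector (assumed)

counts = [0] * len(candidates)
means = [0.0] * len(candidates)

for t in range(1, 201):
    # pick the weight vector with the highest upper confidence bound
    ucb = [m + math.sqrt(2 * math.log(t) / c) if c > 0 else float("inf")
           for m, c in zip(means, counts)]
    i = int(np.argmax(ucb))
    payoff = true_utility[i] + rng.normal(0.0, 0.1)   # noisy evaluation of w_i
    counts[i] += 1
    means[i] += (payoff - means[i]) / counts[i]       # running mean update

print("most promising weight vector:", candidates[int(np.argmax(counts))])
```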
arXiv Detail & Related papers (2024-05-01T09:34:42Z) - MORL-Prompt: An Empirical Analysis of Multi-Objective Reinforcement Learning for Discrete Prompt Optimization [45.410121761165634]
RL-based techniques can be employed to search for prompts that, when fed into a target language model, maximize a set of user-specified reward functions.
Current techniques focus on maximizing the average of reward functions, which does not necessarily lead to prompts that achieve balance across rewards.
arXiv Detail & Related papers (2024-02-18T21:25:09Z) - Rewarded soups: towards Pareto-optimal alignment by interpolating weights fine-tuned on diverse rewards [101.7246658985579]
Foundation models are first pre-trained on vast unsupervised datasets and then fine-tuned on labeled data.
We propose embracing the heterogeneity of diverse rewards by following a multi-policy strategy.
We demonstrate the effectiveness of our approach for text-to-text (summarization, Q&A, helpful assistant, review), text-image (image captioning, text-to-image generation, visual grounding, VQA), and control (locomotion) tasks.
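The interpolation step itself is simple to sketch. The snippet below linearly combines the parameters of models fine-tuned on different rewards, using PyTorch-style state dicts as an assumed representation; it illustrates parameter averaging only, not the paper's full recipe for choosing the coefficients.
```python
# Linear interpolation of model parameters fine-tuned on different rewards.
# PyTorch state dicts are an assumed representation; illustration only.
from typing import Dict, Sequence
import torch

def interpolate_weights(state_dicts: Sequence[Dict[str, torch.Tensor]],
                        lambdas: Sequence[float]) -> Dict[str, torch.Tensor]:
    """Return theta = sum_i lambda_i * theta_i over matching parameter names."""
    assert abs(sum(lambdas) - 1.0) < 1e-6, "interpolation coefficients should sum to 1"
    keys = state_dicts[0].keys()
    return {k: sum(lam * sd[k] for lam, sd in zip(lambdas, state_dicts)) for k in keys}

# toy usage: two 'experts' with a single parameter tensor each
expert_a = {"w": torch.tensor([1.0, 0.0])}
expert_b = {"w": torch.tensor([0.0, 1.0])}
print(interpolate_weights([expert_a, expert_b], [0.3, 0.7]))   # {'w': tensor([0.3000, 0.7000])}
```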
arXiv Detail & Related papers (2023-06-07T14:58:15Z) - BOtied: Multi-objective Bayesian optimization with tied multivariate ranks [33.414682601242006]
In this paper, we show a natural connection between non-dominated solutions and the extreme quantile of the joint cumulative distribution function.
Motivated by this link, we propose the Pareto-compliant CDF indicator and the associated acquisition function, BOtied.
Our experiments on a variety of synthetic and real-world problems demonstrate that BOtied outperforms state-of-the-art MOBO acquisition functions.
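As a rough, simplified illustration of the CDF connection mentioned above (and not BOtied's actual estimator), one can score candidates by the empirical joint CDF of their objective values, i.e. the fraction of observed points they weakly dominate coordinate-wise when all objectives are maximized:
```python
# Empirical joint-CDF score for multi-objective points (toy data; simplified stand-in
# for the CDF indicator idea, not the BOtied acquisition function).
import numpy as np

rng = np.random.default_rng(0)
Y = rng.uniform(size=(50, 3))          # 50 observed points, 3 objectives

def empirical_cdf_scores(Y):
    """F_hat(Y[i]) = fraction of points j with Y[j] <= Y[i] in every coordinate."""
    return np.mean(np.all(Y[None, :, :] <= Y[:, None, :], axis=-1), axis=1)

scores = empirical_cdf_scores(Y)
print("highest-CDF (most 'extreme') candidate:", Y[np.argmax(scores)])
```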
arXiv Detail & Related papers (2023-06-01T04:50:06Z) - Multi-Objective GFlowNets [59.16787189214784]
We study the problem of generating diverse candidates in the context of Multi-Objective Optimization.
In many applications of machine learning such as drug discovery and material design, the goal is to generate candidates which simultaneously optimize a set of potentially conflicting objectives.
We propose Multi-Objective GFlowNets (MOGFNs), a novel method for generating diverse optimal solutions, based on GFlowNets.
arXiv Detail & Related papers (2022-10-23T16:15:36Z) - Provable Multi-Objective Reinforcement Learning with Generative Models [98.19879408649848]
We study the problem of single-policy MORL, which learns an optimal policy given a preference over the objectives.
Existing methods require strong assumptions such as exact knowledge of the multi-objective decision process.
We propose a new algorithm called model-based envelope value iteration (EVI), which generalizes the enveloped multi-objective $Q$-learning algorithm.
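For context, the basic building block of single-policy MORL with a known preference vector w is ordinary value iteration on the scalarized reward w·r(s, a). The sketch below shows that baseline on a random tabular MDP; it is not the paper's envelope value iteration (EVI) algorithm.
```python
# Scalarized value iteration: standard baseline for single-policy MORL with known w.
import numpy as np

rng = np.random.default_rng(0)
nS, nA, nObj, gamma = 5, 3, 2, 0.9

P = rng.dirichlet(np.ones(nS), size=(nS, nA))       # P[s, a] = next-state distribution
R = rng.uniform(size=(nS, nA, nObj))                # vector-valued reward r(s, a)
w = np.array([0.6, 0.4])                            # known preference over objectives

r_w = R @ w                                         # scalarized reward, shape (nS, nA)
V = np.zeros(nS)
for _ in range(500):                                # value iteration to (near) fixed point
    Q = r_w + gamma * P @ V                         # Q[s, a] = r_w[s, a] + gamma * E[V(s')]
    V = Q.max(axis=1)

print("optimal scalarized values:", np.round(V, 3))
```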
arXiv Detail & Related papers (2020-11-19T22:35:31Z) - Learning What to Defer for Maximum Independent Sets [84.00112106334655]
We propose a novel DRL scheme, coined learning what to defer (LwD), where the agent adaptively shrinks or stretches the number of stages by learning to distribute the element-wise decisions of the solution at each stage.
We apply the proposed framework to the maximum independent set (MIS) problem, and demonstrate its significant improvement over the current state-of-the-art DRL scheme.
arXiv Detail & Related papers (2020-06-17T02:19:31Z)