Welfare and Fairness in Multi-objective Reinforcement Learning
- URL: http://arxiv.org/abs/2212.01382v5
- Date: Mon, 13 Nov 2023 02:35:18 GMT
- Title: Welfare and Fairness in Multi-objective Reinforcement Learning
- Authors: Zimeng Fan, Nianli Peng, Muhang Tian, and Brandon Fain
- Abstract summary: We study fair multi-objective reinforcement learning in which an agent must learn a policy that simultaneously achieves high reward on multiple dimensions.
We show that our algorithm is provably convergent, and we demonstrate experimentally that our approach outperforms techniques based on linear scalarization.
- Score: 1.5763562007908967
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: We study fair multi-objective reinforcement learning in which an agent must
learn a policy that simultaneously achieves high reward on multiple dimensions
of a vector-valued reward. Motivated by the fair resource allocation
literature, we model this as an expected welfare maximization problem, for some
nonlinear fair welfare function of the vector of long-term cumulative rewards.
One canonical example of such a function is the Nash Social Welfare, or
geometric mean, the log transform of which is also known as the Proportional
Fairness objective. We show that even approximate optimization of the expected
Nash Social Welfare is computationally intractable, even in the tabular case.
Nevertheless, we provide a novel adaptation of Q-learning that combines
nonlinear scalarized learning updates and non-stationary action selection to
learn effective policies for optimizing nonlinear welfare functions. We show
that our algorithm is provably convergent, and we demonstrate experimentally
that our approach outperforms techniques based on linear scalarization,
mixtures of optimal linear scalarizations, or stationary action selection for
the Nash Social Welfare Objective.
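As a concrete illustration of the Nash Social Welfare objective and the non-stationary, nonlinearly scalarized action selection the abstract alludes to, the short Python sketch below contrasts NSW with equal-weight linear scalarization. The function names (`nash_social_welfare`, `nsw_greedy_action`) and the `eps` stabilizer are illustrative assumptions of this sketch, not the authors' implementation.

```python
import numpy as np

def nash_social_welfare(rewards, eps=1e-8):
    """Geometric mean of a nonnegative reward vector (the Nash Social Welfare).

    Its log transform, the sum of log-rewards, is the Proportional Fairness
    objective mentioned in the abstract. The small `eps` is an assumption of
    this sketch for numerical stability, not part of the paper.
    """
    r = np.asarray(rewards, dtype=float) + eps
    return float(np.exp(np.mean(np.log(r))))

def nsw_greedy_action(q_values, accrued, eps=1e-8):
    """Choose the action maximizing NSW of (accrued rewards + vector-valued Q).

    Because the choice depends on the rewards accumulated so far, the induced
    policy is non-stationary, illustrating (in simplified form) the kind of
    nonlinear scalarized action selection the abstract describes.

    q_values: shape (num_actions, num_objectives)
    accrued:  shape (num_objectives,)
    """
    totals = accrued[None, :] + q_values
    scores = [nash_social_welfare(t, eps) for t in totals]
    return int(np.argmax(scores))

# Two actions, two objectives: equal-weight linear scalarization is
# indifferent (both rows sum to 8), but NSW prefers the balanced action.
q = np.array([[4.0, 4.0],
              [7.9, 0.1]])
print(nsw_greedy_action(q, accrued=np.zeros(2)))  # -> 0
```

The key point of the sketch is that the greedy choice depends on the accrued reward vector, which is why a stationary action selection rule can be suboptimal for the Nash Social Welfare objective.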
Related papers
- A Unified Linear Programming Framework for Offline Reward Learning from Human Demonstrations and Feedback [6.578074497549894]
Inverse Reinforcement Learning (IRL) and Reinforcement Learning from Human Feedback (RLHF) are pivotal methodologies in reward learning.
This paper introduces a novel linear programming (LP) framework tailored for offline reward learning.
arXiv Detail & Related papers (2024-05-20T23:59:26Z)
- Non-linear Welfare-Aware Strategic Learning [10.448052192725168]
This paper studies algorithmic decision-making in the presence of strategic individual behaviors.
We first generalize the agent best response model in previous works to the non-linear setting.
We show that the three welfare measures can attain the optimum simultaneously only under restrictive conditions.
arXiv Detail & Related papers (2024-05-03T01:50:03Z)
- Multi-objective Reinforcement Learning with Nonlinear Preferences: Provable Approximation for Maximizing Expected Scalarized Return [1.3162012586770577]
We study multi-objective reinforcement learning with nonlinear preferences over trajectories.
We derive an extended form of Bellman optimality for nonlinear optimization.
We show that there can be a substantial gap between the optimal policy computed by our algorithm and alternative baselines.
arXiv Detail & Related papers (2023-11-05T02:11:07Z)
- Pessimistic Nonlinear Least-Squares Value Iteration for Offline Reinforcement Learning [53.97335841137496]
We propose an oracle-efficient algorithm, dubbed Pessimistic Nonlinear Least-Squares Value Iteration (PNLSVI), for offline RL with non-linear function approximation.
Our algorithm enjoys a regret bound that has a tight dependency on the function class complexity and achieves minimax optimal instance-dependent regret when specialized to linear function approximation.
arXiv Detail & Related papers (2023-10-02T17:42:01Z)
- Achieving Fairness in Multi-Agent Markov Decision Processes Using Reinforcement Learning [30.605881670761853]
We propose a Reinforcement Learning approach to achieve fairness in finite-horizon episodic MDPs.
We show that such an approach achieves sub-linear regret in terms of the number of episodes.
arXiv Detail & Related papers (2023-06-01T03:43:53Z)
- When Demonstrations Meet Generative World Models: A Maximum Likelihood Framework for Offline Inverse Reinforcement Learning [62.00672284480755]
This paper aims to recover the structure of rewards and environment dynamics that underlie observed actions in a fixed, finite set of demonstrations from an expert agent.
Accurate models of expertise in executing a task have applications in safety-sensitive domains such as clinical decision making and autonomous driving.
arXiv Detail & Related papers (2023-02-15T04:14:20Z)
- Offline Reinforcement Learning with Differentiable Function Approximation is Provably Efficient [65.08966446962845]
Offline reinforcement learning, which aims at optimizing decision-making strategies with historical data, has been extensively applied in real-life applications.
We take a step by considering offline reinforcement learning with differentiable function class approximation (DFA).
Most importantly, we show that offline differentiable function approximation is provably efficient by analyzing the pessimistic fitted Q-learning algorithm.
arXiv Detail & Related papers (2022-10-03T07:59:42Z)
- Specification-Guided Learning of Nash Equilibria with High Social Welfare [21.573746897846114]
We propose a novel reinforcement learning framework for training joint policies that form a Nash equilibrium.
We show that our algorithm computes equilibrium policies with high social welfare, whereas state-of-the-art baselines either fail to compute Nash equilibria or compute ones with comparatively lower social welfare.
arXiv Detail & Related papers (2022-06-06T16:06:31Z)
- Near-optimal Offline Reinforcement Learning with Linear Representation: Leveraging Variance Information with Pessimism [65.46524775457928]
Offline reinforcement learning seeks to utilize offline/historical data to optimize sequential decision-making strategies.
We study the statistical limits of offline reinforcement learning with linear model representations.
arXiv Detail & Related papers (2022-03-11T09:00:12Z)
- Provably Correct Optimization and Exploration with Non-linear Policies [65.60853260886516]
ENIAC is an actor-critic method that allows non-linear function approximation in the critic.
We show that under certain assumptions, the learner finds a near-optimal policy in $O(poly(d))$ exploration rounds.
We empirically evaluate this adaptation and show that it outperforms prior approaches inspired by linear methods.
arXiv Detail & Related papers (2021-03-22T03:16:33Z)
- Provably Efficient Reward-Agnostic Navigation with Linear Value Iteration [143.43658264904863]
We show how, under a more standard notion of low inherent Bellman error, typically employed in least-squares value iteration-style algorithms, one can provide strong PAC guarantees on learning a near-optimal value function.
We present a computationally tractable algorithm for the reward-free setting and show how it can be used to learn a near-optimal policy for any (linear) reward function.
arXiv Detail & Related papers (2020-08-18T04:34:21Z)