Long-Term Fairness with Unknown Dynamics
- URL: http://arxiv.org/abs/2304.09362v2
- Date: Wed, 7 Jun 2023 20:55:04 GMT
- Title: Long-Term Fairness with Unknown Dynamics
- Authors: Tongxin Yin, Reilly Raab, Mingyan Liu, Yang Liu
- Abstract summary: We formalize long-term fairness in the context of online reinforcement learning.
We show that an algorithm can adapt to unknown dynamics by sacrificing short-term incentives.
- Score: 16.683582656377396
- License: http://creativecommons.org/licenses/by-nc-sa/4.0/
- Abstract: While machine learning can myopically reinforce social inequalities, it may
also be used to dynamically seek equitable outcomes. In this paper, we
formalize long-term fairness in the context of online reinforcement learning.
This formulation can accommodate dynamical control objectives, such as driving
equity inherent in the state of a population, that cannot be incorporated into
static formulations of fairness. We demonstrate that this framing allows an
algorithm to adapt to unknown dynamics by sacrificing short-term incentives to
drive a classifier-population system towards more desirable equilibria. For the
proposed setting, we develop an algorithm that adapts recent work in online
learning. We prove that this algorithm achieves simultaneous probabilistic
bounds on cumulative loss and cumulative violations of fairness (as statistical
regularities between demographic groups). We compare our proposed algorithm to
the repeated retraining of myopic classifiers, as a baseline, and to a deep
reinforcement learning algorithm that lacks safety guarantees. Our experiments
model human populations according to evolutionary game theory and integrate
real-world datasets.
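The abstract describes a feedback loop in which a deployed classifier shapes the population it acts on, with groups modeled by evolutionary game theory. Below is a minimal, illustrative sketch of such a classifier-population loop, assuming replicator dynamics for the population and a demographic-parity gap as the fairness violation; the payoffs, update rule, and constants are invented for illustration and are not the paper's algorithm.

```python
# Minimal illustrative sketch (not the paper's algorithm): a classifier whose
# per-group acceptance rates feed back into a population whose "qualified"
# shares evolve by replicator dynamics, while we accumulate loss and fairness
# violation (demographic-parity gap). Payoffs and step sizes are invented.
import numpy as np

T = 200
x = np.array([0.3, 0.6])      # per-group share playing "qualified"
theta = np.array([0.5, 0.5])  # per-group acceptance rates (the "classifier")
cum_loss = cum_violation = 0.0

for t in range(T):
    accept = theta
    # Invented payoffs: qualifying pays off only when acceptance is high enough.
    fit_q = 0.2 + 1.0 * accept
    fit_u = 0.5 + 0.4 * accept
    mean_fit = x * fit_q + (1 - x) * fit_u
    x = x * fit_q / mean_fit  # replicator step: above-average strategies grow

    # Institution's loss: accepted-unqualified plus rejected-qualified mass.
    loss = float(np.mean(accept * (1 - x) + (1 - accept) * x))
    violation = abs(accept[0] - accept[1])  # demographic-parity gap
    cum_loss += loss
    cum_violation += violation

    # Myopic gradient step on loss, plus a pull toward equal group treatment:
    # the second term sacrifices short-term loss to steer the equilibrium.
    grad = (1 - x) - x
    theta = np.clip(theta - 0.05 * grad, 0.05, 0.95)
    theta = theta - 0.1 * (theta - theta.mean())

print(f"cumulative loss={cum_loss:.2f}, cumulative violation={cum_violation:.2f}")
print(f"qualified shares={x.round(3)}, policy={theta.round(3)}")
```

The point of the sketch is the trade-off named in the abstract: the pull toward equal group treatment costs short-term loss but moves the coupled classifier-population system toward an equilibrium with a smaller fairness gap.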
Related papers
- A Mathematics Framework of Artificial Shifted Population Risk and Its Further Understanding Related to Consistency Regularization [7.944280447232545]
This paper introduces a more comprehensive mathematical framework for data augmentation.
We establish that the expected risk of the shifted population is the sum of the original population risk and a gap term.
The paper also provides a theoretical understanding of this gap, highlighting its negative effects on the early stages of training.
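A plausible rendering of the stated decomposition in LaTeX, with assumed notation ($P$ is the original population, $\tilde{P}$ the augmentation-shifted one; the symbols are not taken from the paper):

```latex
% Assumed notation (not verbatim from the paper): R(\theta) is the risk on the
% original population P, \tilde{R}(\theta) the risk on the augmentation-shifted
% population \tilde{P}, and \Delta(\theta) the gap term.
\tilde{R}(\theta)
  \;=\; \mathbb{E}_{(x,y)\sim \tilde{P}}\,\ell(f_\theta(x), y)
  \;=\; \underbrace{\mathbb{E}_{(x,y)\sim P}\,\ell(f_\theta(x), y)}_{R(\theta)}
  \;+\; \underbrace{\Delta(\theta)}_{\text{gap term}}
```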
arXiv Detail & Related papers (2025-02-15T08:26:49Z)
- Reinforcement Learning under Latent Dynamics: Toward Statistical and Algorithmic Modularity [51.40558987254471]
Real-world applications of reinforcement learning often involve environments where agents operate on complex, high-dimensional observations.
This paper addresses the question of reinforcement learning under $\textit{general}$ latent dynamics from a statistical and algorithmic perspective.
arXiv Detail & Related papers (2024-10-23T14:22:49Z)
- Dreaming Learning [41.94295877935867]
Introducing new information to a machine learning system can interfere with previously stored data.
We propose a training algorithm inspired by Stuart Kauffman's notion of the Adjacent Possible.
It predisposes the neural network to smoothly accept and integrate data sequences whose statistical characteristics differ from those it expects.
arXiv Detail & Related papers (2024-10-23T09:17:31Z)
- Dynamic Environment Responsive Online Meta-Learning with Fairness Awareness [30.44174123736964]
We introduce an innovative adaptive fairness-aware online meta-learning algorithm, referred to as FairSAOML.
Our experimental evaluation on various real-world datasets in dynamic environments demonstrates that our proposed FairSAOML algorithm consistently outperforms alternative approaches.
arXiv Detail & Related papers (2024-02-19T17:44:35Z)
- Time-series Generation by Contrastive Imitation [87.51882102248395]
We study a generative framework that seeks to combine the strengths of both paradigms: motivated by a moment-matching objective to mitigate compounding error, we optimize a local (but forward-looking) transition policy.
At inference, the learned policy serves as the generator for iterative sampling, and the learned energy serves as a trajectory-level measure for evaluating sample quality.
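A hedged sketch of that inference loop, with stand-in `policy` and `energy` functions; the real interfaces are learned models, and none of these names come from the paper:

```python
# Sketch of the inference loop described above (interfaces are assumed, not
# the paper's API): a learned one-step policy generates trajectories
# autoregressively, and a learned trajectory-level energy scores candidates.
import numpy as np

rng = np.random.default_rng(1)

def policy(history):
    """Stand-in learned transition policy: next step given the history."""
    return history[-1] + 0.1 * rng.standard_normal(history.shape[1])

def energy(trajectory):
    """Stand-in learned energy: lower = more plausible trajectory."""
    diffs = np.diff(trajectory, axis=0)
    return float(np.sum(diffs ** 2))  # penalize implausibly rough paths

def generate(x0, horizon=50, candidates=8):
    """Sample several rollouts from the policy; keep the lowest-energy one."""
    best, best_e = None, np.inf
    for _ in range(candidates):
        traj = [x0]
        for _ in range(horizon):
            traj.append(policy(np.stack(traj)))
        traj = np.stack(traj)
        e = energy(traj)
        if e < best_e:
            best, best_e = traj, e
    return best, best_e

traj, e = generate(np.zeros(3))
print(f"generated trajectory shape={traj.shape}, energy={e:.3f}")
```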
arXiv Detail & Related papers (2023-11-02T16:45:25Z)
- Safe Multi-agent Learning via Trapping Regions [89.24858306636816]
We apply the concept of trapping regions, known from qualitative theory of dynamical systems, to create safety sets in the joint strategy space for decentralized learning.
We propose a binary partitioning algorithm for verification that candidate sets form trapping regions in systems with known learning dynamics, and a sampling algorithm for scenarios where learning dynamics are not known.
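A rough sketch of the sampled variant, under the assumption that "trapping" is certified by checking that the (possibly black-box) learning dynamics point inward at sampled boundary points of a candidate box; the dynamics and box here are toys, not the paper's systems:

```python
# Sampling-based trapping-region check (illustrative, not the paper's
# algorithm): sample points on each face of a candidate box and test that
# the learning dynamics point inward at every sampled boundary point.
import numpy as np

rng = np.random.default_rng(2)

def dynamics(z):
    """Stand-in learning dynamics; in the unknown-dynamics setting this is
    a black box that can only be queried by sampling."""
    return -z + 0.1 * np.array([z[1], -z[0]])  # contracting spiral toward 0

def is_trapping_box(lo, hi, samples_per_face=200):
    """Empirically check the flow points inward on each face of [lo, hi]."""
    dim = len(lo)
    for axis in range(dim):
        for bound, sign in ((lo[axis], 1.0), (hi[axis], -1.0)):
            pts = rng.uniform(lo, hi, size=(samples_per_face, dim))
            pts[:, axis] = bound
            # Inward means a positive flow component along the inward normal.
            if not all(sign * dynamics(p)[axis] > 0 for p in pts):
                return False
    return True

lo, hi = np.array([-1.0, -1.0]), np.array([1.0, 1.0])
print("candidate box is (empirically) trapping:", is_trapping_box(lo, hi))
```

Sampling gives only probabilistic rather than exact evidence, which matches the entry's split between verification with known dynamics and sampling when dynamics are unknown.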
arXiv Detail & Related papers (2023-02-27T14:47:52Z)
- Drop Edges and Adapt: a Fairness Enforcing Fine-tuning for Graph Neural Networks [9.362130313618797]
Link prediction algorithms tend to disfavor the links between individuals in specific demographic groups.
This paper proposes Drop Edges and Adapt (DEA), a novel way to enforce fairness on graph neural networks with a fine-tuning strategy.
One novelty of DEA is the use of a discrete yet learnable adjacency matrix during fine-tuning.
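A minimal sketch of one way an adjacency matrix can be discrete yet learnable, via a straight-through estimator in PyTorch; this is an assumption about the mechanism, not DEA's actual implementation, and the toy objective is invented:

```python
# Minimal sketch (assumptions, not DEA's code): a discrete yet learnable
# adjacency mask via a straight-through estimator, so edges are dropped with
# hard 0/1 decisions while gradients still flow to the edge logits.
import torch

def masked_adjacency(adj, edge_logits):
    """Hard 0/1 edge mask in the forward pass, sigmoid gradients backward."""
    probs = torch.sigmoid(edge_logits)
    hard = (probs > 0.5).float()
    mask = hard + probs - probs.detach()  # straight-through trick
    return adj * mask

# Toy usage: learn edge logits against an invented fairness-style penalty.
n = 4
adj = torch.ones(n, n)
edge_logits = torch.zeros(n, n, requires_grad=True)
opt = torch.optim.Adam([edge_logits], lr=0.1)

for step in range(100):
    a = masked_adjacency(adj, edge_logits)
    # Stand-in objective: penalize the first (pretend "sensitive") node's
    # excess degree while mildly rewarding overall connectivity.
    fairness_penalty = (a[0].sum() - a[1:].sum(dim=1).mean()) ** 2
    utility = -0.01 * a.sum()
    loss = fairness_penalty + utility
    opt.zero_grad()
    loss.backward()
    opt.step()

print((torch.sigmoid(edge_logits) > 0.5).float())  # learned discrete mask
```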
arXiv Detail & Related papers (2023-02-22T16:28:08Z)
- Finite-Time Consensus Learning for Decentralized Optimization with Nonlinear Gossiping [77.53019031244908]
We present a novel decentralized learning framework based on nonlinear gossiping (NGO), that enjoys an appealing finite-time consensus property to achieve better synchronization.
Our analysis on how communication delay and randomized chats affect learning further enables the derivation of practical variants.
arXiv Detail & Related papers (2021-11-04T15:36:25Z)
- Network Classifiers Based on Social Learning [71.86764107527812]
We propose a new way of combining independently trained classifiers over space and time.
The proposed architecture is able to improve prediction performance over time with unlabeled data.
We show that this strategy results in consistent learning with high probability, and it yields a robust structure against poorly trained classifiers.
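An illustrative sketch of a social-learning combination rule over space (neighbors) and time (streaming unlabeled samples), assuming a geometric averaging update in log-belief space; the weights, agent qualities, and likelihood model are all invented, not the paper's construction:

```python
# Sketch of a social-learning combination rule (illustrative): each agent
# averages neighbors' log-beliefs with doubly-stochastic weights, then adds
# its own classifier's log-likelihoods for the new unlabeled observation.
import numpy as np

rng = np.random.default_rng(3)
agents, classes = 5, 2
# Doubly-stochastic combination weights over a ring of agents.
eye = np.eye(agents)
W = 0.5 * eye + 0.25 * (np.roll(eye, 1, axis=0) + np.roll(eye, -1, axis=0))
log_belief = np.zeros((agents, classes))  # uniform priors

true_class = 0
for t in range(50):
    # Stand-in per-agent classifiers of varying quality emit noisy
    # per-class log-likelihoods that are informative only on average.
    quality = np.linspace(0.2, 1.0, agents)
    log_lik = 0.5 * rng.standard_normal((agents, classes))
    log_lik[:, true_class] += quality
    # Social learning step: combine neighbors' beliefs, add own evidence.
    log_belief = W @ log_belief + log_lik
    log_belief -= log_belief.max(axis=1, keepdims=True)  # stabilize

belief = np.exp(log_belief)
belief /= belief.sum(axis=1, keepdims=True)
print("per-agent belief in the true class:", belief[:, true_class].round(3))
```

Even the weakly informative agents converge on the true class here, which loosely illustrates the claimed robustness to poorly trained classifiers.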
arXiv Detail & Related papers (2020-10-23T11:18:20Z)
- Strictly Batch Imitation Learning by Energy-based Distribution Matching [104.33286163090179]
Consider learning a policy purely on the basis of demonstrated behavior -- that is, with no access to reinforcement signals, no knowledge of transition dynamics, and no further interaction with the environment.
One solution is simply to retrofit existing algorithms for apprenticeship learning to work in the offline setting.
But such an approach leans heavily on off-policy evaluation or offline model estimation, and can be indirect and inefficient.
We argue that a good solution should be able to explicitly parameterize a policy, implicitly learn from rollout dynamics, and operate in an entirely offline fashion.
arXiv Detail & Related papers (2020-06-25T03:27:59Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the information presented and is not responsible for any consequences of its use.