Related papers: Learning Fair Division from Bandit Feedback

URL: http://arxiv.org/abs/2311.09068v1
Date: Wed, 15 Nov 2023 16:10:34 GMT
Title: Learning Fair Division from Bandit Feedback
Authors: Hakuei Yamada, Junpei Komiyama, Kenshi Abe, Atsushi Iwasaki
Abstract summary: This work addresses learning online fair division under uncertainty, where a central planner sequentially allocates items without precise knowledge of agents' values or utilities. We introduce wrapper algorithms utilizing dual averaging, enabling gradual learning of both the type distribution of arriving items and agents' values through bandit feedback.
Score: 13.12913475818328
License: http://creativecommons.org/licenses/by-sa/4.0/
Abstract: This work addresses learning online fair division under uncertainty, where a central planner sequentially allocates items without precise knowledge of agents' values or utilities. Departing from conventional online algorithms, the planner here relies on noisy, estimated values obtained after allocating items. We introduce wrapper algorithms utilizing dual averaging, enabling gradual learning of both the type distribution of arriving items and agents' values through bandit feedback. This approach enables the algorithms to asymptotically achieve optimal Nash social welfare in linear Fisher markets with agents having additive utilities. We establish regret bounds in Nash social welfare and empirically validate the superior performance of our proposed algorithms across synthetic and empirical datasets.
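
Illustration: the objective named in the abstract, Nash social welfare under additive utilities, can be made concrete with a small sketch. This is not the paper's algorithm; it only evaluates the objective for a fixed allocation. The function names, the toy value matrix, and the equal default budgets are assumptions chosen for illustration, and the budget-weighted geometric mean is one common convention for Nash social welfare in Fisher markets.

```python
import math

def additive_utilities(values, allocation):
    """Sum each agent's values over the items it receives.

    values[i][j] is agent i's value for item j (hypothetical toy data);
    allocation[j] is the index of the agent that receives item j.
    """
    utils = [0.0] * len(values)
    for item, agent in enumerate(allocation):
        utils[agent] += values[agent][item]
    return utils

def nash_social_welfare(utils, budgets=None):
    """Budget-weighted geometric mean of utilities.

    Returns 0.0 if any agent has zero utility. Equal budgets are
    assumed when none are given.
    """
    budgets = budgets or [1.0] * len(utils)
    if any(u <= 0 for u in utils):
        return 0.0
    total = sum(budgets)
    log_sum = sum(b * math.log(u) for b, u in zip(budgets, utils))
    return math.exp(log_sum / total)

# Two agents, three items; item 0 goes to agent 0, items 1 and 2 to agent 1.
values = [[4.0, 1.0, 2.0],
          [1.0, 3.0, 2.0]]
allocation = [0, 1, 1]
utils = additive_utilities(values, allocation)
nsw = nash_social_welfare(utils)
print(utils, nsw)
```

In the online setting described in the abstract, the planner does not observe the value matrix directly and only receives noisy feedback after each allocation, so a quantity like this would be evaluated in hindsight when measuring regret.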

Related papers

MAP Estimation with Denoisers: Convergence Rates and Guarantees [37.88502562012743]
We show that a simple algorithm converges to the proximal operator under a log-concavity assumption on the prior $p$. We show that this algorithm can be interpreted as a gradient descent on smoothed proximal objectives.
arXiv Detail & Related papers (2025-07-21T08:59:33Z)
Multi-Agent Best Arm Identification in Stochastic Linear Bandits [0.7673339435080443]
We study the problem of collaborative best-arm identification in linear bandits under a fixed-budget scenario. In our learning model, we consider multiple agents connected through a star network or a generic network, interacting with a linear bandit instance in parallel. We devise the algorithms MaLinBAI-Star and MaLinBAI-Gen for star networks and generic networks respectively.
arXiv Detail & Related papers (2024-11-20T20:09:44Z)
Inference for an Algorithmic Fairness-Accuracy Frontier [0.7743097066308449]
We propose a debiased machine learning estimator for the fairness-accuracy frontier. We derive its distribution and propose inference methods to test key hypotheses in the fairness literature. We show that our approach yields alternative algorithms that lie on the fairness-accuracy frontier, offering improvements along both dimensions.
arXiv Detail & Related papers (2024-02-14T00:56:09Z)
Distributed Multi-Task Learning for Stochastic Bandits with Context Distribution and Stage-wise Constraints [0.0]
We present conservative distributed multi-task learning in linear contextual bandits with heterogeneous agents. The exact context is unknown, and only a context distribution is available to the agents. Our algorithm constructs a pruned action set during each round to ensure the constraints are met. It includes synchronized sharing of estimates among agents via a central server.
arXiv Detail & Related papers (2024-01-21T18:43:55Z)
Stochastic Unrolled Federated Learning [85.6993263983062]
We introduce Stochastic UnRolled Federated learning (SURF), a method that expands algorithm unrolling to federated learning. Our proposed method tackles two challenges of this expansion, namely the need to feed whole datasets to the unrolled optimizers and the decentralized nature of federated learning.
arXiv Detail & Related papers (2023-05-24T17:26:22Z)
Federated Learning for Heterogeneous Bandits with Unobserved Contexts [0.0]
We study the problem of federated multi-armed contextual bandits with unknown contexts. We propose an elimination-based algorithm and prove a regret bound for linearly parametrized reward functions.
arXiv Detail & Related papers (2023-03-29T22:06:24Z)
Neighbour Consistency Guided Pseudo-Label Refinement for Unsupervised Person Re-Identification [80.98291772215154]
Unsupervised person re-identification (ReID) aims at learning discriminative identity features for person retrieval without any annotations. Recent advances accomplish this task by leveraging clustering-based pseudo labels. We propose a Neighbour Consistency guided Pseudo Label Refinement framework.
arXiv Detail & Related papers (2022-11-30T09:39:57Z)
Byzantine-Robust Online and Offline Distributed Reinforcement Learning [60.970950468309056]
We consider a distributed reinforcement learning setting where multiple agents explore the environment and communicate their experiences through a central server. An $\alpha$-fraction of agents are adversarial and can report arbitrary fake information. We seek to identify a near-optimal policy for the underlying Markov decision process in the presence of these adversarial agents.
arXiv Detail & Related papers (2022-06-01T00:44:53Z)
Conjugated Discrete Distributions for Distributional Reinforcement Learning [0.0]
We show that one of the most successful methods may not yield an optimal policy if we have a non-deterministic process. We argue that distributional reinforcement learning lends itself to remedy this situation completely.
arXiv Detail & Related papers (2021-12-14T14:14:49Z)
Asymptotics of Network Embeddings Learned via Subsampling [4.23373349945751]
We bring representation methods that use a subsampling approach, such as node2vec, into a single unifying framework. This provides a theoretical foundation to understand what the embedding vectors represent and how well these methods perform on downstream tasks. Notably, we observe that typically used loss functions may lead to shortcomings, such as a lack of Fisher consistency.
arXiv Detail & Related papers (2021-07-06T02:54:53Z)
Can Active Learning Preemptively Mitigate Fairness Issues? [66.84854430781097]
Dataset bias is one of the prevailing causes of unfairness in machine learning. We study whether models trained with uncertainty-based active learning (AL) heuristics are fairer in their decisions with respect to a protected class. We also explore the interaction of algorithmic fairness methods such as gradient reversal (GRAD) and BALD.
arXiv Detail & Related papers (2021-04-14T14:20:22Z)
Experimental Design for Regret Minimization in Linear Bandits [19.8309784360219]
We propose a novel design-based algorithm to minimize regret in online linear and combinatorial bandits. We provide state-of-the-art finite time regret guarantees and show that our algorithm can be applied in both the bandit and semi-bandit feedback regime.
arXiv Detail & Related papers (2020-11-01T17:59:19Z)
Provably Efficient Reward-Agnostic Navigation with Linear Value Iteration [143.43658264904863]
We show how value iteration under a more standard notion of low inherent Bellman error, typically employed in least-squares value iteration-style algorithms, can provide strong PAC guarantees on learning a near optimal value function. We present a computationally tractable algorithm for the reward-free setting and show how it can be used to learn a near optimal policy for any (linear) reward function.
arXiv Detail & Related papers (2020-08-18T04:34:21Z)
Latent Bandits Revisited [55.88616813182679]
A latent bandit problem is one in which the learning agent knows the arm reward distributions conditioned on an unknown discrete latent state. We propose general algorithms for this setting, based on both upper confidence bounds (UCBs) and Thompson sampling. We provide a unified theoretical analysis of our algorithms, which have lower regret than classic bandit policies when the number of latent states is smaller than the number of actions.
arXiv Detail & Related papers (2020-06-15T19:24:02Z)
A General Method for Robust Learning from Batches [56.59844655107251]
We consider a general framework of robust learning from batches, and determine the limits of both classification and distribution estimation over arbitrary, including continuous, domains. We derive the first robust, computationally efficient learning algorithms for piecewise-interval classification, and for piecewise-polynomial, monotone, log-concave, and Gaussian-mixture distribution estimation.
arXiv Detail & Related papers (2020-02-25T18:53:25Z)
Weakly-Supervised Semantic Segmentation by Iterative Affinity Learning [86.45526827323954]
Weakly-supervised semantic segmentation is a challenging task as no pixel-wise label information is provided for training. We propose an iterative algorithm to learn such pairwise relations. We show that the proposed algorithm performs favorably against the state-of-the-art methods.
arXiv Detail & Related papers (2020-02-19T10:32:03Z)

This list is automatically generated from the titles and abstracts of the papers on this site.