What If I Don't Like Any Of The Choices? The Limits of Preference Elicitation for Participatory Algorithm Design
- URL: http://arxiv.org/abs/2007.06718v1
- Date: Mon, 13 Jul 2020 21:58:30 GMT
- Title: What If I Don't Like Any Of The Choices? The Limits of Preference Elicitation for Participatory Algorithm Design
- Authors: Samantha Robertson and Niloufar Salehi
- Abstract summary: We argue that optimizing for individual preference satisfaction in the distribution of limited resources may actually inhibit progress towards social and distributive justice.
Individual preferences can be a useful signal but should be expanded to support more expressive and inclusive forms of democratic participation.
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Emerging methods for participatory algorithm design have proposed collecting
and aggregating individual stakeholder preferences to create algorithmic
systems that account for those stakeholders' values. Using algorithmic student
assignment as a case study, we argue that optimizing for individual preference
satisfaction in the distribution of limited resources may actually inhibit
progress towards social and distributive justice. Individual preferences can be
a useful signal but should be expanded to support more expressive and inclusive
forms of democratic participation.
Related papers
- PIPA: Preference Alignment as Prior-Informed Statistical Estimation [57.24096291517857]
We introduce Prior-Informed Preference Alignment (PIPA), a unified, RL-free probabilistic framework.
PIPA accommodates both paired and unpaired data, as well as answer and step-level annotations.
By integrating different types of prior information, we developed two variations of PIPA: PIPA-M and PIPA-N.
arXiv Detail & Related papers (2025-02-09T04:31:30Z)
- Social Choice for Heterogeneous Fairness in Recommendation [9.753088666705985]
Algorithmic fairness in recommender systems requires close attention to the needs of a diverse set of stakeholders.
Previous work has often been limited by fixed, single-objective definitions of fairness.
Our work approaches recommendation fairness from the standpoint of computational social choice.
arXiv Detail & Related papers (2024-10-06T17:01:18Z)
- Pareto-Optimal Learning from Preferences with Hidden Context [17.590330740964266]
We propose POPL, which enables pluralistic alignment by framing discrepant group preferences as objectives with potential trade-offs.
Our theoretical and empirical evaluations demonstrate that POPL surpasses baseline methods in learning sets of reward functions and policies.
We illustrate that POPL can also serve as a foundation for techniques optimizing specific notions of group fairness.
arXiv Detail & Related papers (2024-06-21T18:57:38Z)
- Provably Mitigating Overoptimization in RLHF: Your SFT Loss is Implicitly an Adversarial Regularizer [52.09480867526656]
We identify the source of misalignment as a form of distributional shift and uncertainty in learning human preferences.
To mitigate overoptimization, we first propose a theoretical algorithm that chooses the best policy for an adversarially chosen reward model.
Using the equivalence between reward models and the corresponding optimal policy, the algorithm features a simple objective that combines a preference optimization loss and a supervised learning loss.
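The combined objective described above can be sketched as a single scalar loss. The function below is an illustrative simplification: the weights `beta` and `lam`, and the folding of any reference model into the log-probabilities, are assumptions for this sketch, not the paper's exact formulation.

```python
import math

def combined_loss(logp_chosen, logp_rejected, beta=0.1, lam=1.0):
    """Sketch: a DPO-style preference loss plus an SFT regularizer.

    logp_chosen / logp_rejected: policy log-probabilities of the preferred
    and dispreferred responses. `beta` and `lam` are hypothetical weights.
    """
    # Preference loss: -log sigmoid(beta * (logp_chosen - logp_rejected))
    margin = beta * (logp_chosen - logp_rejected)
    pref_loss = math.log(1.0 + math.exp(-margin))
    # SFT term: maximize the likelihood of the chosen response
    sft_loss = -logp_chosen
    return pref_loss + lam * sft_loss
```

A larger `lam` pulls the policy toward the supervised data, which is the regularizing effect the paper attributes to the SFT loss.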
arXiv Detail & Related papers (2024-05-26T05:38:50Z)
- MaxMin-RLHF: Alignment with Diverse Human Preferences [101.57443597426374]
Reinforcement Learning from Human Feedback (RLHF) aligns language models to human preferences by employing a single reward model derived from preference data.
We learn a mixture of preference distributions via an expectation-maximization algorithm to better represent diverse human preferences.
Our algorithm achieves an average improvement of more than 16% in win-rates over conventional RLHF algorithms.
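The expectation-maximization step for a mixture of preference distributions can be illustrated with a toy model in which each annotator belongs to one of K latent groups, each with its own Bernoulli preference rate. This is a simplified stand-in for the paper's RLHF setting, not its actual pipeline.

```python
def em_mixture(counts, n, K=2, iters=100):
    """Toy EM for a mixture of K Bernoulli preference groups.

    counts[i] = number of times annotator i preferred option A out of n
    comparisons. Returns mixing weights and per-group preference rates.
    """
    pi = [1.0 / K] * K
    p = [(k + 1.0) / (K + 1.0) for k in range(K)]  # spread initial rates
    for _ in range(iters):
        # E-step: responsibility of group k for each annotator
        resp = []
        for c in counts:
            lik = [pi[k] * (p[k] ** c) * ((1 - p[k]) ** (n - c)) for k in range(K)]
            s = sum(lik)
            resp.append([l / s for l in lik])
        # M-step: re-estimate mixing weights and preference rates
        for k in range(K):
            rk = sum(r[k] for r in resp)
            pi[k] = rk / len(counts)
            p[k] = sum(r[k] * c for r, c in zip(resp, counts)) / (rk * n)
    return pi, p
```

On clearly separated annotator groups, the recovered rates split toward the two underlying preference profiles rather than averaging them into one, which is the failure mode a single reward model exhibits.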
arXiv Detail & Related papers (2024-02-14T03:56:27Z)
- Personalized Reinforcement Learning with a Budget of Policies [9.846353643883443]
Personalization in machine learning (ML) tailors models' decisions to the individual characteristics of users.
We propose a novel framework, represented Markov Decision Processes (r-MDPs), designed to balance the need for personalization with regulatory constraints.
In an r-MDP, we cater to a diverse user population, each with unique preferences, through interaction with a small set of representative policies.
We develop two deep reinforcement learning algorithms that efficiently solve r-MDPs.
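The core r-MDP idea of serving many users with a small set of representative policies can be sketched with a greedy subset-selection heuristic over a hypothetical user-by-policy utility matrix. This is an illustrative simplification, not the paper's deep reinforcement learning algorithms.

```python
def assign_users(utilities, k):
    """Pick k representative policies and assign each user to their best one.

    utilities[i][j] = value user i gets from candidate policy j (assumed
    known here; in the paper these values are learned).
    """
    n_policies = len(utilities[0])
    chosen = []
    for _ in range(k):
        best_j, best_gain = None, float("-inf")
        for j in range(n_policies):
            if j in chosen:
                continue
            # total welfare if each user picks their best policy in chosen + {j}
            gain = sum(max(u[c] for c in chosen + [j]) for u in utilities)
            if gain > best_gain:
                best_gain, best_j = gain, j
        chosen.append(best_j)
    assignment = [max(chosen, key=lambda c, u=u: u[c]) for u in utilities]
    return chosen, assignment
```

The budget k caps how many distinct policies are deployed, which mirrors the regulatory constraint the paper motivates.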
arXiv Detail & Related papers (2024-01-12T11:27:55Z)
- Reinforcement Learning from Diverse Human Preferences [68.4294547285359]
This paper develops a method for crowd-sourcing preference labels and learning from diverse human preferences.
The proposed method is tested on a variety of tasks in DMcontrol and Meta-world.
It has shown consistent and significant improvements over existing preference-based RL algorithms when learning from diverse feedback.
arXiv Detail & Related papers (2023-01-27T15:18:54Z)
- Incentivizing Combinatorial Bandit Exploration [87.08827496301839]
Consider a bandit algorithm that recommends actions to self-interested users in a recommendation system.
Users are free to choose other actions and need to be incentivized to follow the algorithm's recommendations.
While the users prefer to exploit, the algorithm can incentivize them to explore by leveraging the information collected from the previous users.
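The mechanism by which earlier users' data enables later exploration can be illustrated with a standard UCB index, whose confidence bonus shrinks as observations accumulate. This is a generic sketch of the underlying bandit machinery, not the paper's incentive-compatible construction.

```python
import math

def ucb_recommend(rewards_sum, counts, t):
    """Recommend the arm with the highest optimistic estimate.

    Data gathered from earlier users shrinks each arm's confidence bonus,
    which is what lets the algorithm credibly steer later users toward
    under-explored arms.
    """
    best_arm, best_index = None, float("-inf")
    for arm in range(len(counts)):
        if counts[arm] == 0:
            return arm  # never tried: recommend once to gather information
        mean = rewards_sum[arm] / counts[arm]
        bonus = math.sqrt(2 * math.log(t) / counts[arm])  # shrinks with data
        if mean + bonus > best_index:
            best_index, best_arm = mean + bonus, arm
    return best_arm
```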
arXiv Detail & Related papers (2022-06-01T13:46:25Z)
- Achieving Counterfactual Fairness for Causal Bandit [18.077963117600785]
We study how to recommend an item at each step to maximize the expected reward.
We then propose the fair causal bandit (F-UCB) for achieving counterfactual individual fairness.
arXiv Detail & Related papers (2021-09-21T23:44:48Z)
- Adaptive Sampling for Best Policy Identification in Markov Decision Processes [79.4957965474334]
We investigate the problem of best-policy identification in discounted Markov Decision Processes (MDPs) when the learner has access to a generative model.
The advantages of state-of-the-art algorithms are discussed and illustrated.
arXiv Detail & Related papers (2020-09-28T15:22:24Z)
- Fair and Useful Cohort Selection [12.319543784920304]
Dwork and Ilvento introduce an archetypal problem called the fair-cohort-selection problem.
A single fair classifier is composed with itself to select a group of candidates of a given size.
We give optimal (or approximately optimal) algorithms for this problem in both an offline setting and an online setting.
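The composition idea can be sketched as repeatedly running one probabilistic classifier over the pool until exactly k candidates are accepted, so similar candidates (with similar acceptance probabilities) get similar chances of selection. This toy version, with a hypothetical `accept_probs` interface, is a simplification of the Dwork-Ilvento setting, not their algorithm.

```python
import random

def select_cohort(accept_probs, k, seed=0):
    """Compose one probabilistic classifier with itself to pick k candidates.

    accept_probs[i] is candidate i's acceptance probability under the
    single fair classifier (assumed given and strictly positive).
    """
    rng = random.Random(seed)
    pool = list(range(len(accept_probs)))
    cohort = []
    while len(cohort) < k:
        rng.shuffle(pool)
        for i in list(pool):  # iterate a copy; we remove from pool below
            if len(cohort) == k:
                break
            if rng.random() < accept_probs[i]:
                cohort.append(i)
                pool.remove(i)
    return sorted(cohort)
```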
arXiv Detail & Related papers (2020-09-04T14:06:08Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information presented and is not responsible for any consequences arising from its use.