Policy Aggregation
- URL: http://arxiv.org/abs/2411.03651v1
- Date: Wed, 06 Nov 2024 04:19:50 GMT
- Title: Policy Aggregation
- Authors: Parand A. Alamdari, Soroush Ebadian, Ariel D. Procaccia,
- Abstract summary: We consider the challenge of AI value alignment with multiple individuals who have different reward functions and optimal policies in an underlying Markov decision process.
We formalize this problem as one of policy aggregation, where the goal is to identify a desirable collective policy.
Our key insight is that social choice methods can be reinterpreted by identifying ordinal preferences with volumes of subsets of the state-action occupancy polytope.
- Score: 21.21314301021803
- Abstract: We consider the challenge of AI value alignment with multiple individuals that have different reward functions and optimal policies in an underlying Markov decision process. We formalize this problem as one of policy aggregation, where the goal is to identify a desirable collective policy. We argue that an approach informed by social choice theory is especially suitable. Our key insight is that social choice methods can be reinterpreted by identifying ordinal preferences with volumes of subsets of the state-action occupancy polytope. Building on this insight, we demonstrate that a variety of methods--including approval voting, Borda count, the proportional veto core, and quantile fairness--can be practically applied to policy aggregation.
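The paper's constructions operate on the full state-action occupancy polytope; as a much simpler finite illustration (not the paper's volume-based method), one can rank a handful of candidate policies for each agent by the occupancy value of the policy under that agent's reward and aggregate the resulting ordinal preferences with Borda count. All names below are illustrative.

```python
import numpy as np

def borda_aggregate(occupancies, rewards):
    """Pick a Borda winner among a finite set of candidate policies.

    occupancies: (n_policies, n_sa) array; row k is the state-action
        occupancy measure mu_k of candidate policy k.
    rewards: (n_agents, n_sa) array; row i is agent i's reward vector r_i.
    """
    values = rewards @ occupancies.T                 # value of policy k to agent i: <r_i, mu_k>
    ranks = values.argsort(axis=1).argsort(axis=1)   # per-agent ranks, 0 = worst
    return int(ranks.sum(axis=0).argmax())           # highest total Borda score wins

# Toy example: 3 agents, 4 candidate policies, 6 state-action pairs.
rng = np.random.default_rng(0)
mu = rng.dirichlet(np.ones(6), size=4)   # normalized occupancy measures
r = rng.uniform(size=(3, 6))             # one reward vector per agent
print("Borda winner:", borda_aggregate(mu, r))
```

The identity used here is standard: a policy's value to agent i is the inner product of agent i's reward vector with the policy's occupancy measure, so ordinal comparisons need only these dot products.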
Related papers
- Navigating Trade-offs: Policy Summarization for Multi-Objective Reinforcement Learning [10.848218400641466]
Multi-objective reinforcement learning (MORL) is used to solve problems involving multiple, potentially conflicting, objectives.
We propose an approach for clustering the solution set generated by MORL (see the sketch below).
arXiv Detail & Related papers (2024-11-07T15:26:38Z)
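The summary above gives only the idea; here is a minimal, hypothetical sketch of clustering a MORL solution set with k-means over each policy's vector of per-objective returns, then reporting one representative per cluster. The features, cluster count, and distance are assumptions, not the paper's choices.

```python
import numpy as np
from sklearn.cluster import KMeans

# Hypothetical solution set: 50 Pareto-front policies scored on 3 objectives.
rng = np.random.default_rng(1)
returns = rng.uniform(size=(50, 3))      # row k = per-objective returns of policy k

# Cluster the solution set so a user inspects a few representatives
# instead of all 50 policies.
km = KMeans(n_clusters=4, n_init=10, random_state=0).fit(returns)
for c in range(4):
    members = np.where(km.labels_ == c)[0]
    dists = np.linalg.norm(returns[members] - km.cluster_centers_[c], axis=1)
    rep = members[dists.argmin()]        # member closest to the centroid
    print(f"cluster {c}: {len(members)} policies, representative #{rep}")
```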
- Information Capacity Regret Bounds for Bandits with Mediator Feedback [55.269551124587224]
We introduce the policy set capacity as an information-theoretic measure for the complexity of the policy set.
Adopting the classical EXP4 algorithm, we provide new regret bounds depending on the policy set capacity.
For a selection of policy set families, we prove nearly matching lower bounds that scale similarly with the capacity (see the sketch below).
arXiv Detail & Related papers (2024-02-15T19:18:47Z)
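The paper's contribution is the capacity-dependent regret analysis; the EXP4 algorithm it builds on is standard, and a minimal version over a finite policy set (fixed, context-free expert advice for brevity) looks like this:

```python
import numpy as np

def exp4(expert_probs, reward_fn, T=5000, eta=0.05, seed=0):
    """Minimal EXP4: exponential weights over a finite set of policies.

    expert_probs: (N, K) array; row j is expert j's distribution over K arms
        (held fixed here for brevity; in general it may depend on context).
    reward_fn: arm -> reward in [0, 1].
    """
    N, K = expert_probs.shape
    w = np.ones(N)
    rng = np.random.default_rng(seed)
    for _ in range(T):
        q = w / w.sum()
        p = q @ expert_probs             # mixture distribution over arms
        p /= p.sum()                     # guard against float drift
        arm = rng.choice(K, p=p)
        r_hat = np.zeros(K)
        r_hat[arm] = reward_fn(arm) / p[arm]      # importance-weighted estimate
        w *= np.exp(eta * (expert_probs @ r_hat)) # reward each expert's advice
        w /= w.max()                     # rescale to avoid overflow
    return w / w.sum()

rng = np.random.default_rng(0)
experts = rng.dirichlet(np.ones(3), size=5)       # 5 fixed policies over 3 arms
bernoulli = lambda a: float(rng.random() < [0.2, 0.5, 0.8][a])
print("final expert weights:", np.round(exp4(experts, bernoulli), 3))
```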
- Clustered Policy Decision Ranking [6.338178373376447]
In an episode with n time steps, a policy makes n decisions about which actions to take, many of which may appear non-intuitive to an observer.
It is not clear which of these decisions directly contribute towards achieving the reward and how significant their contribution is.
We propose a black-box method based on statistical covariance estimation that clusters the states of the environment and ranks each cluster according to the importance of decisions made in its states (see the sketch below).
arXiv Detail & Related papers (2023-11-21T20:16:02Z)
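As a hedged stand-in for the covariance-based estimator (whose details the summary does not give), the toy below clusters states and scores each cluster by the return lost when the policy's decisions inside it are randomized; the environment and interface are invented for illustration.

```python
import numpy as np
from sklearn.cluster import KMeans

rng = np.random.default_rng(0)

# Invented task: states in R^2; reward accrues only when action 1 is taken
# in the half-plane x0 > 0, so only decisions there should matter.
def rollout(act, n_steps=200):
    s = rng.normal(size=(n_steps, 2))
    a = np.array([act(x) for x in s])
    return np.mean((s[:, 0] > 0) & (a == 1))

policy = lambda x: int(x[0] > 0)                  # the policy being explained

km = KMeans(n_clusters=4, n_init=10, random_state=0).fit(rng.normal(size=(500, 2)))
base = np.mean([rollout(policy) for _ in range(30)])

# Score each state cluster by the return lost when decisions there are random.
for c in range(4):
    def perturbed(x, c=c):
        if km.predict(x.reshape(1, -1))[0] == c:
            return int(rng.integers(2))           # randomize inside cluster c
        return policy(x)
    drop = base - np.mean([rollout(perturbed) for _ in range(30)])
    print(f"cluster {c}: estimated decision importance {drop:+.3f}")
```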
- Off-Policy Evaluation for Large Action Spaces via Policy Convolution [60.6953713877886]
The Policy Convolution (PC) family of estimators uses latent structure within actions to strategically convolve the logging and target policies.
Experiments on synthetic and benchmark datasets demonstrate remarkable mean squared error (MSE) improvements when using PC (see the sketch below).
arXiv Detail & Related papers (2023-10-24T01:00:01Z)
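A guess at the shape of such an estimator, not the paper's exact construction: kernel-smooth ("convolve") the target policy over latent action embeddings before standard importance weighting. The Gaussian kernel and all names are assumptions.

```python
import numpy as np

def convolved_ips(actions, rewards, log_probs, target_probs, emb, tau=0.5):
    """IPS with a kernel-smoothed ("convolved") target policy.

    actions, rewards: (n,) logged actions and rewards.
    log_probs, target_probs: (n, K) per-context action probabilities.
    emb: (K, d) latent action embeddings used for smoothing.
    """
    d2 = ((emb[:, None, :] - emb[None, :, :]) ** 2).sum(-1)
    kernel = np.exp(-d2 / (2 * tau ** 2))          # Gaussian kernel (assumed)
    smoothed = target_probs @ kernel               # spread mass to similar actions
    smoothed /= smoothed.sum(axis=1, keepdims=True)
    idx = np.arange(len(actions))
    w = smoothed[idx, actions] / log_probs[idx, actions]
    return float(np.mean(w * rewards))
```

Shrinking tau recovers plain IPS; widening it trades variance for bias by sharing probability mass among actions with nearby embeddings.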
- Well-being policy evaluation methodology based on WE pluralism [0.0]
This study shifts from pluralism based on objective indicators to conceptual pluralism that emphasizes subjective context.
The policy evaluation method is formulated by combining well-being with joint fact-finding based on the narrow-wide WE consensus.
arXiv Detail & Related papers (2023-05-08T06:51:43Z)
- Policy Dispersion in Non-Markovian Environment [53.05904889617441]
This paper aims to learn diverse policies from the history of state-action pairs in a non-Markovian environment.
We first adopt a transformer-based method to learn policy embeddings.
Then, we stack the policy embeddings to construct a dispersion matrix and induce a set of diverse policies (see the sketch below).
arXiv Detail & Related papers (2023-02-28T11:58:39Z)
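The summary mentions building a dispersion matrix from stacked policy embeddings to induce diverse policies; one hypothetical concrete reading is a greedy max-min selection over pairwise embedding distances, sketched below (a determinant-based DPP objective would be a natural alternative).

```python
import numpy as np

def select_diverse(embeddings, k):
    """Greedy max-min selection of k mutually distant policy embeddings."""
    D = np.linalg.norm(embeddings[:, None] - embeddings[None, :], axis=-1)
    chosen = [int(D.sum(axis=1).argmax())]   # seed with the most "extreme" policy
    while len(chosen) < k:
        min_dist = D[:, chosen].min(axis=1)  # distance to the chosen set
        min_dist[chosen] = -np.inf
        chosen.append(int(min_dist.argmax()))
    return chosen

emb = np.random.default_rng(2).normal(size=(20, 8))  # 20 learned policy embeddings
print("diverse subset:", select_diverse(emb, 5))
```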
- Identification of Subgroups With Similar Benefits in Off-Policy Policy Evaluation [60.71312668265873]
We develop a method to balance the need for personalization with confident predictions.
We show that our method can be used to form accurate predictions of heterogeneous treatment effects.
arXiv Detail & Related papers (2021-11-28T23:19:12Z)
- Fair Set Selection: Meritocracy and Social Welfare [6.205308371824033]
We formulate the problem of selecting a set of individuals from a candidate population as a utility maximisation problem.
From the decision maker's perspective, it is equivalent to finding a selection policy that maximises expected utility.
Our framework leads to the notion of the expected marginal contribution (EMC) of an individual with respect to a selection policy, as a measure of deviation from meritocracy (see the sketch below).
arXiv Detail & Related papers (2021-02-23T20:36:36Z)
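The EMC definition suggests a direct Monte Carlo estimate: average the change in utility from adding individual i to a set drawn from the selection policy. The additive utility and independent-inclusion policy below are toy assumptions.

```python
import numpy as np

rng = np.random.default_rng(3)
skill = rng.uniform(size=10)                  # toy candidate population

def utility(selected):
    return skill[list(selected)].sum() if selected else 0.0

def sample_selection(probs):
    # Toy selection policy: include candidate j independently w.p. probs[j].
    return {j for j in range(len(probs)) if rng.random() < probs[j]}

def emc(i, probs, n_samples=2000):
    """Monte Carlo expected marginal contribution of candidate i."""
    total = 0.0
    for _ in range(n_samples):
        s = sample_selection(probs) - {i}
        total += utility(s | {i}) - utility(s)
    return total / n_samples

probs = np.full(10, 0.3)
print(f"EMC of candidate 0: {emc(0, probs):.3f}  (skill: {skill[0]:.3f})")
```

With an additive utility, a candidate's EMC reduces to their own skill, so deviations from meritocracy can only arise once the utility has interaction terms.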
- Offline Policy Selection under Uncertainty [113.57441913299868]
We consider offline policy selection as learning preferences over a set of policy prospects given a fixed experience dataset.
Access to the full distribution over one's belief of the policy value enables more flexible selection algorithms under a wider range of downstream evaluation metrics.
We show how BayesDICE may be used to rank policies with respect to any downstream policy selection metric (see the sketch below).
arXiv Detail & Related papers (2020-12-12T23:09:21Z)
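Given posterior samples of each policy's value (the kind of belief distribution BayesDICE provides), ranking under any downstream metric is a small computation; the samples and metrics below are simulated stand-ins.

```python
import numpy as np

rng = np.random.default_rng(4)
# Simulated posterior samples of each policy's value, shape (n_policies, n_samples);
# a BayesDICE-style posterior would supply the real thing.
samples = rng.normal(loc=[0.50, 0.60, 0.55, 0.40],
                     scale=[0.02, 0.15, 0.05, 0.01], size=(1000, 4)).T

def rank_by(metric, samples):
    return np.argsort([metric(s) for s in samples])[::-1]

print("by mean:       ", rank_by(np.mean, samples))
print("by 10th pctile:", rank_by(lambda s: np.quantile(s, 0.10), samples))
p_best = np.bincount(samples.argmax(axis=0), minlength=4) / samples.shape[1]
print("by P(best):    ", np.argsort(-p_best))
```

Risk-sensitive metrics such as the 10th percentile can demote a high-mean but uncertain policy, which is the motivation for keeping the full belief distribution rather than a point estimate.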
- Robust Batch Policy Learning in Markov Decision Processes [0.0]
We study the offline, data-driven sequential decision-making problem in the framework of a Markov decision process (MDP).
We propose to evaluate each policy by a set of average rewards with respect to distributions centered at the policy-induced stationary distribution (see the sketch below).
arXiv Detail & Related papers (2020-11-09T04:41:21Z)
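One way to make "a set of average rewards around the policy-induced stationary distribution" concrete, as an assumption rather than the paper's exact construction, is the worst-case average reward over an L1-ball around the estimated stationary distribution, which has a simple greedy solution.

```python
import numpy as np

def worst_case_avg_reward(d_hat, r, eps):
    """Min of d @ r over distributions d with ||d - d_hat||_1 <= eps.

    Greedy solution: shift up to eps/2 of probability mass from the
    highest-reward states onto the lowest-reward state.
    """
    d = d_hat.copy()
    budget = eps / 2
    worst = int(np.argmin(r))
    for i in np.argsort(r)[::-1]:        # strip mass from the best states first
        if i == worst or budget <= 0:
            continue
        move = min(d[i], budget)
        d[i] -= move
        d[worst] += move
        budget -= move
    return float(d @ r)

d_hat = np.array([0.4, 0.3, 0.2, 0.1])   # estimated stationary distribution
r = np.array([1.0, 0.5, 0.2, 0.8])       # per-state average reward under the policy
print(worst_case_avg_reward(d_hat, r, eps=0.2))   # lower than d_hat @ r = 0.67
```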
- Variational Policy Propagation for Multi-agent Reinforcement Learning [68.26579560607597]
We propose a collaborative multi-agent reinforcement learning algorithm named variational policy propagation (VPP) to learn a joint policy through interactions among agents.
We prove that the joint policy is a Markov Random Field under some mild conditions, which in turn reduces the policy space effectively.
We integrate variational inference as special differentiable layers in the policy, such that actions can be efficiently sampled from the Markov Random Field and the overall policy is differentiable (see the sketch below).
arXiv Detail & Related papers (2020-04-19T15:42:55Z)
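VPP's training with variational layers is not reproducible from the summary, but its central object, a joint policy that is a Markov Random Field over agents, can be illustrated by Gibbs-sampling a joint action from a pairwise MRF; the shared coupling potential and interaction graph below are assumptions.

```python
import numpy as np

def gibbs_joint_action(unary, pair, edges, n_sweeps=50, seed=0):
    """Gibbs-sample a joint action from a pairwise MRF over agents.

    unary: (n_agents, K) per-agent action scores.
    pair: (K, K) coupling potential shared by every edge (an assumption).
    edges: list of (i, j) pairs of interacting agents.
    """
    rng = np.random.default_rng(seed)
    n, K = unary.shape
    a = rng.integers(K, size=n)
    nbrs = {i: [] for i in range(n)}
    for i, j in edges:
        nbrs[i].append(j)
        nbrs[j].append(i)
    for _ in range(n_sweeps):
        for i in range(n):               # resample agent i given its neighbors
            logits = unary[i] + sum(pair[:, a[j]] for j in nbrs[i])
            p = np.exp(logits - logits.max())
            a[i] = rng.choice(K, p=p / p.sum())
    return a

rng = np.random.default_rng(5)
unary = rng.normal(size=(4, 3))          # 4 agents, 3 actions each
pair = 0.5 * np.eye(3)                   # mild preference for matching neighbors
print("joint action:", gibbs_joint_action(unary, pair, edges=[(0, 1), (1, 2), (2, 3)]))
```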
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of its content (including all information) and is not responsible for any consequences of its use.