Distributional Multi-Objective Decision Making
- URL: http://arxiv.org/abs/2305.05560v3
- Date: Tue, 18 Jul 2023 10:59:46 GMT
- Title: Distributional Multi-Objective Decision Making
- Authors: Willem Röpke, Conor F. Hayes, Patrick Mannion, Enda Howley, Ann Nowé and Diederik M. Roijers
- Abstract summary: We take a distributional approach and introduce a novel dominance criterion relating return distributions of policies directly.
We propose a novel algorithm to learn the distributional undominated set and further contribute pruning operators to reduce the set to the convex distributional undominated set.
- Score: 2.185694185279913
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: For effective decision support in scenarios with conflicting objectives, sets
of potentially optimal solutions can be presented to the decision maker. We
explore both what policies these sets should contain and how such sets can be
computed efficiently. With this in mind, we take a distributional approach and
introduce a novel dominance criterion relating return distributions of policies
directly. Based on this criterion, we present the distributional undominated
set and show that it contains optimal policies otherwise ignored by the Pareto
front. In addition, we propose the convex distributional undominated set and
prove that it comprises all policies that maximise expected utility for
multivariate risk-averse decision makers. We propose a novel algorithm to learn
the distributional undominated set and further contribute pruning operators to
reduce the set to the convex distributional undominated set. Through
experiments, we demonstrate the feasibility and effectiveness of these methods,
making this a valuable new approach for decision support in real-world
problems.
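The dominance criterion above compares policies by their full return distributions rather than by expected values. As a hedged illustration only (the paper's criterion is multivariate and its learning algorithm is not reproduced here; all names below are hypothetical), the following sketches first-order stochastic dominance between univariate empirical return samples and a brute-force filter that keeps the undominated policies:

```python
import numpy as np

def stochastically_dominates(returns_a, returns_b):
    """First-order stochastic dominance of A's empirical return
    distribution over B's (univariate sketch): A dominates B when
    F_A(x) <= F_B(x) everywhere, strictly somewhere, i.e. A puts at
    least as much probability mass on high returns at every level."""
    grid = np.union1d(returns_a, returns_b)
    # Empirical CDFs evaluated on a shared grid of observed returns.
    cdf_a = np.searchsorted(np.sort(returns_a), grid, side="right") / len(returns_a)
    cdf_b = np.searchsorted(np.sort(returns_b), grid, side="right") / len(returns_b)
    return bool(np.all(cdf_a <= cdf_b) and np.any(cdf_a < cdf_b))

def undominated_set(policy_returns):
    """Keep policies whose return samples no other policy dominates."""
    return [
        name
        for name, r in policy_returns.items()
        if not any(
            stochastically_dominates(other, r)
            for other_name, other in policy_returns.items()
            if other_name != name
        )
    ]
```

A risk-averse decision maker may prefer a policy with a tighter return distribution even when its mean is not maximal, which is why a distribution-level criterion can retain policies the Pareto front over expectations would discard.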
Related papers
- Exploration-free Algorithms for Multi-group Mean Estimation [7.480522058240762]
We address the problem of multi-group mean estimation, which seeks to allocate a finite sampling budget across multiple groups to obtain uniformly accurate estimates of their means.
Unlike classical multi-armed bandits, whose objective is to minimize regret by identifying and exploiting the best arm, the optimal allocation in this setting requires sampling every group on the order of $\Theta(T)$ times.
arXiv Detail & Related papers (2025-10-12T00:20:30Z)
- Generalizing Behavior via Inverse Reinforcement Learning with Closed-Form Reward Centroids [37.79354987519793]
We study the problem of generalizing an expert agent's behavior, provided through demonstrations, to new environments and/or additional constraints.
We propose a novel, principled criterion that selects the "average" policy among those induced by the rewards in a certain bounded subset of the feasible set.
arXiv Detail & Related papers (2025-09-15T14:53:54Z)
- Risk-Averse Best Arm Set Identification with Fixed Budget and Fixed Confidence [0.562479170374811]
We introduce a novel problem setting in bandit optimization that addresses maximizing expected reward and minimizing associated uncertainty.
We propose a unified meta-algorithmic framework capable of operating under both fixed-confidence and fixed-budget regimes.
Our approach outperforms existing methods in terms of both accuracy and sample efficiency.
arXiv Detail & Related papers (2025-06-27T14:21:03Z)
- Fair Resource Allocation in Weakly Coupled Markov Decision Processes [3.824858358548714]
We consider fair resource allocation in sequential decision-making environments modeled as weakly coupled Markov decision processes.
We adopt a fairness definition using the generalized Gini function instead of the traditional utilitarian (total-sum) objective.
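The generalized Gini function scores a utility vector by sorting it and weighting the worst-off components most heavily, so improving the least-advantaged group raises welfare the most. A minimal sketch (the default weights below are the classical Gini choice, used here only as an illustrative assumption, not the paper's exact formulation):

```python
import numpy as np

def generalized_gini_welfare(utilities, weights=None):
    """Generalized Gini social welfare: utilities sorted ascending,
    combined with a non-increasing weight vector."""
    u = np.sort(np.asarray(utilities, dtype=float))  # worst-off first
    n = len(u)
    if weights is None:
        # Classical Gini weights (2(n-i)-1)/n^2 for the i-th smallest utility.
        weights = np.array([2.0 * (n - i) - 1.0 for i in range(n)]) / n**2
    return float(np.dot(weights, u))
```

Unlike the utilitarian total-sum objective, this scores the even allocation `[1, 1]` strictly above the unequal allocation `[0, 2]` even though both sum to 2.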
arXiv Detail & Related papers (2024-11-14T20:40:55Z)
- Information Capacity Regret Bounds for Bandits with Mediator Feedback [55.269551124587224]
We introduce the policy set capacity as an information-theoretic measure for the complexity of the policy set.
Adopting the classical EXP4 algorithm, we provide new regret bounds depending on the policy set capacity.
For a selection of policy set families, we prove nearly-matching lower bounds, scaling similarly with the capacity.
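Classical EXP4 maintains exponential weights over a set of policies (experts), each of which recommends an action distribution; the paper's contribution is the capacity-dependent regret analysis, not the algorithm itself. A minimal one-round sketch under illustrative assumptions (fixed learning rate, rewards in [0, 1]):

```python
import numpy as np

rng = np.random.default_rng(0)

def exp4_round(weights, expert_advice, eta, reward_fn):
    """One round of EXP4: mix the experts' action distributions by
    weight, sample an action, observe its reward, and apply an
    importance-weighted exponential update to the expert weights."""
    # expert_advice: (n_experts, n_actions), each row a probability vector.
    p_experts = weights / weights.sum()
    p_actions = p_experts @ expert_advice          # mixture over actions
    a = rng.choice(len(p_actions), p=p_actions)    # sample an action
    r = reward_fn(a)                               # bandit feedback in [0, 1]
    r_hat = np.zeros(len(p_actions))
    r_hat[a] = r / p_actions[a]                    # importance-weighted estimate
    expert_rewards = expert_advice @ r_hat         # each expert's estimated reward
    return weights * np.exp(eta * expert_rewards), a, r
```

Over repeated rounds, weight concentrates on experts whose recommended distributions earn higher reward.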
arXiv Detail & Related papers (2024-02-15T19:18:47Z)
- Learning Fair Policies for Multi-stage Selection Problems from Observational Data [4.282745020665833]
We consider the problem of learning fair policies for multi-stage selection problems from observational data.
This problem arises in several high-stakes domains such as company hiring, loan approval, or bail decisions where outcomes are only observed for those selected.
We propose a multi-stage framework that can be augmented with various fairness constraints, such as demographic parity or equal opportunity.
arXiv Detail & Related papers (2023-12-20T16:33:15Z)
- Likelihood Ratio Confidence Sets for Sequential Decision Making [51.66638486226482]
We revisit the likelihood-based inference principle and propose to use likelihood ratios to construct valid confidence sequences.
Our method is especially suitable for problems with well-specified likelihoods.
We show how to provably choose the best sequence of estimators and shed light on connections to online convex optimization.
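As a loose illustration of the likelihood-ratio idea only (a simplified fixed-sample Bernoulli version, not the paper's sequential confidence-sequence construction), a confidence set can be formed by keeping every candidate parameter whose likelihood is within a 1/alpha factor of the maximum-likelihood value:

```python
import math

def lr_confidence_set(xs, grid, alpha=0.05):
    """Keep each candidate Bernoulli mean p whose log-likelihood is
    within log(1/alpha) of the best value on the grid."""
    def loglik(p):
        eps = 1e-12  # guard against log(0) at the grid edges
        return sum(math.log(p + eps) if x else math.log(1 - p + eps) for x in xs)
    best = max(loglik(p) for p in grid)
    return [p for p in grid if best - loglik(p) <= math.log(1 / alpha)]
```

Parameters far from the empirical mean have likelihood ratios far above the threshold and are excluded, while values near the maximum-likelihood estimate are retained.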
arXiv Detail & Related papers (2023-11-08T00:10:21Z)
- Statistical Inference Under Constrained Selection Bias [20.862583584531322]
We propose a framework that enables statistical inference in the presence of selection bias.
The output is high-probability bounds on the value of an estimand for the target distribution.
We analyze the computational and statistical properties of methods to estimate these bounds and show that our method can produce informative bounds on a variety of simulated and semisynthetic tasks.
arXiv Detail & Related papers (2023-06-05T23:05:26Z)
- Provable Offline Preference-Based Reinforcement Learning [95.00042541409901]
We investigate the problem of offline Preference-based Reinforcement Learning (PbRL) with human feedback.
We consider the general reward setting where the reward can be defined over the whole trajectory.
We introduce a new single-policy concentrability coefficient, which can be upper bounded by the per-trajectory concentrability.
arXiv Detail & Related papers (2023-05-24T07:11:26Z)
- Local Optimization Achieves Global Optimality in Multi-Agent Reinforcement Learning [139.53668999720605]
We present a multi-agent PPO algorithm in which the local policy of each agent is updated similarly to vanilla PPO.
We prove that with standard regularity conditions on the Markov game and problem-dependent quantities, our algorithm converges to the globally optimal policy at a sublinear rate.
arXiv Detail & Related papers (2023-05-08T16:20:03Z)
- Bi-objective Ranking and Selection Using Stochastic Kriging [0.0]
We consider bi-objective ranking and selection problems in which the two objective outcomes have been observed with uncertainty.
We propose a novel Bayesian bi-objective ranking and selection method that sequentially allocates extra samples to competitive solutions.
Experimental results show that the proposed method outperforms the standard allocation method, as well as a well-known state-of-the-art algorithm.
arXiv Detail & Related papers (2022-09-05T23:51:07Z)
- Off-Policy Evaluation with Policy-Dependent Optimization Response [90.28758112893054]
We develop a new framework for off-policy evaluation with a policy-dependent linear optimization response.
We construct unbiased estimators for the policy-dependent estimand by a perturbation method.
We provide a general algorithm for optimizing causal interventions.
arXiv Detail & Related papers (2022-02-25T20:25:37Z)
- Implicit Distributional Reinforcement Learning [61.166030238490634]
We propose the implicit distributional actor-critic (IDAC), built on two deep generator networks (DGNs) and a semi-implicit actor (SIA) powered by a flexible policy distribution.
We observe that IDAC outperforms state-of-the-art algorithms on representative OpenAI Gym environments.
arXiv Detail & Related papers (2020-07-13T02:52:18Z)
- Decisions, Counterfactual Explanations and Strategic Behavior [16.980621769406923]
We find policies and counterfactual explanations that are optimal in terms of utility in a strategic setting.
We show that, given a pre-defined policy, the problem of finding the optimal set of counterfactual explanations is NP-hard.
We demonstrate that, by incorporating a matroid constraint into the problem formulation, we can increase the diversity of the optimal set of counterfactual explanations.
arXiv Detail & Related papers (2020-02-11T12:04:41Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed content (including all information) and is not responsible for any consequences of its use.