Expert Selection in High-Dimensional Markov Decision Processes
- URL: http://arxiv.org/abs/2010.15599v1
- Date: Mon, 26 Oct 2020 03:57:25 GMT
- Title: Expert Selection in High-Dimensional Markov Decision Processes
- Authors: Vicenc Rubies-Royo, Eric Mazumdar, Roy Dong, Claire Tomlin, and S. Shankar Sastry
- Abstract summary: Our method takes a set of candidate expert policies and switches between them to rapidly identify the best performing expert.
This is useful in applications where several expert policies may be available, and one needs to be selected at run-time for the underlying environment.
- Score: 5.52481973699219
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: In this work we present a multi-armed bandit framework for online expert
selection in Markov decision processes and demonstrate its use in
high-dimensional settings. Our method takes a set of candidate expert policies
and switches between them to rapidly identify the best performing expert using
a variant of the classical upper confidence bound algorithm, thus ensuring low
regret in the overall performance of the system. This is useful in applications
where several expert policies may be available, and one needs to be selected at
run-time for the underlying environment.
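As a rough illustration of the approach described above, the sketch below treats each candidate expert policy as a bandit arm and switches between experts using a plain UCB1 rule over episode returns. The paper uses a variant of the classical upper confidence bound algorithm; the specific variant, as well as the `experts` list and `run_episode` callback used here, are assumptions made for illustration only.

```python
import math

def select_best_expert(experts, run_episode, num_rounds=500, c=2.0):
    """UCB1 over a finite set of candidate expert policies.

    `experts` is a list of policies and `run_episode(expert)` rolls out one
    episode with that policy and returns its total reward; both are
    placeholders for whatever environment interface is actually available.
    """
    counts = [0] * len(experts)    # episodes played per expert
    means = [0.0] * len(experts)   # running mean return per expert

    for t in range(1, num_rounds + 1):
        if t <= len(experts):
            arm = t - 1            # play every expert once first
        else:
            # pick the expert with the largest upper confidence bound
            arm = max(
                range(len(experts)),
                key=lambda i: means[i] + math.sqrt(c * math.log(t) / counts[i]),
            )
        ret = run_episode(experts[arm])
        counts[arm] += 1
        means[arm] += (ret - means[arm]) / counts[arm]   # incremental mean

    # after the sampling budget is spent, return the best empirical expert
    return max(range(len(experts)), key=lambda i: means[i])
```

With bounded episode returns (rescaled to [0, 1]), the standard UCB1 analysis bounds the regret incurred while identifying the best-performing expert.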
Related papers
- Stochastic Bilevel Optimization with Lower-Level Contextual Markov Decision Processes [42.22085862132403]
We introduce Bilevel Optimization with Contextual Markov Decision Processes (BO-CMDP), a bilevel decision-making model.
BO-CMDP can be viewed as a Stackelberg Game where the leader and a random context beyond the leader's control together decide the setup of (many) MDPs.
We propose a Hyper Policy Gradient Descent (HPGD) algorithm to solve BO-CMDP, and demonstrate its convergence.
arXiv Detail & Related papers (2024-06-03T17:54:39Z) - Human-Algorithm Collaborative Bayesian Optimization for Engineering Systems [0.0]
We reintroduce the human into the data-driven decision-making loop by outlining an approach for collaborative Bayesian optimization.
Our methodology exploits the hypothesis that humans are more efficient at making discrete choices than continuous ones.
We demonstrate our approach across a number of applied and numerical case studies including bioprocess optimization and reactor geometry design.
arXiv Detail & Related papers (2024-04-16T23:17:04Z) - Inverse Reinforcement Learning with Sub-optimal Experts [56.553106680769474]
We study the theoretical properties of the class of reward functions that are compatible with a given set of experts.
Our results show that the presence of multiple sub-optimal experts can significantly shrink the set of compatible rewards.
We analyze a uniform sampling algorithm that is minimax optimal whenever the sub-optimal experts' performance is sufficiently close to that of the optimal agent.
arXiv Detail & Related papers (2024-01-08T12:39:25Z) - Expert-guided Bayesian Optimisation for Human-in-the-loop Experimental
Design of Known Systems [0.0]
We apply high-throughput (batch) Bayesian optimisation alongside anthropological decision theory to enable domain experts to influence the selection of optimal experiments.
Our methodology exploits the hypothesis that humans are better at making discrete choices than continuous ones and enables experts to influence critical early decisions.
arXiv Detail & Related papers (2023-12-05T16:09:31Z) - Online Decision Mediation [72.80902932543474]
Consider learning a decision support assistant to serve as an intermediary between (oracle) expert behavior and (imperfect) human behavior.
In clinical diagnosis, fully-autonomous machine behavior is often beyond ethical affordances.
arXiv Detail & Related papers (2023-10-28T05:59:43Z) - Improving Recommendation System Serendipity Through Lexicase Selection [53.57498970940369]
We propose a new serendipity metric to measure the presence of echo chambers and homophily in recommendation systems.
We then attempt to improve the diversity-preservation qualities of well-known recommendation techniques by adopting a parent selection algorithm known as lexicase selection.
Our results show that lexicase selection, or a mixture of lexicase selection and ranking, outperforms its purely ranked counterparts in terms of personalization, coverage, and our specifically designed serendipity benchmark.
arXiv Detail & Related papers (2023-05-18T15:37:38Z) - In Search of Insights, Not Magic Bullets: Towards Demystification of the
Model Selection Dilemma in Heterogeneous Treatment Effect Estimation [92.51773744318119]
This paper empirically investigates the strengths and weaknesses of different model selection criteria.
We highlight that there is a complex interplay between selection strategies, candidate estimators and the data used for comparing them.
arXiv Detail & Related papers (2023-02-06T16:55:37Z) - Dealing with Expert Bias in Collective Decision-Making [4.588028371034406]
We propose a new algorithmic approach based on contextual multi-armed bandit problems (CMAB) to identify and counteract biased expertise.
Our novel CMAB-inspired approach achieves a higher final performance and does so while converging more rapidly than previous adaptive algorithms.
arXiv Detail & Related papers (2021-06-25T10:17:37Z) - Adversarial Option-Aware Hierarchical Imitation Learning [89.92994158193237]
We propose Option-GAIL, a novel method to learn skills at long horizon.
The key idea of Option-GAIL is to model the task hierarchy with options and to train the policy via generative adversarial optimization.
Experiments show that Option-GAIL outperforms other counterparts consistently across a variety of tasks.
arXiv Detail & Related papers (2021-06-10T06:42:05Z) - Extreme Algorithm Selection With Dyadic Feature Representation [78.13985819417974]
We propose the setting of extreme algorithm selection (XAS) where we consider fixed sets of thousands of candidate algorithms.
We assess the applicability of state-of-the-art AS techniques to the XAS setting and propose approaches leveraging a dyadic feature representation.
arXiv Detail & Related papers (2020-01-29T09:40:58Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the information provided and is not responsible for any consequences of its use.