Blind Exploration and Exploitation of Stochastic Experts
- URL: http://arxiv.org/abs/2104.01078v1
- Date: Fri, 2 Apr 2021 15:02:02 GMT
- Title: Blind Exploration and Exploitation of Stochastic Experts
- Authors: Noyan C. Sevuktekin and Andrew C. Singer
- Abstract summary: We present blind exploration and exploitation (BEE) algorithms for identifying the most reliable expert based on formulations that employ posterior sampling, upper-confidence bounds, empirical Kullback-Leibler divergence, and minmax methods for the multi-armed bandit problem.
We propose an empirically realizable measure of expert competence that can be inferred instantaneously using only the opinions of other experts.
- Score: 7.106986689736826
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: We present blind exploration and exploitation (BEE) algorithms for
identifying the most reliable stochastic expert based on formulations that
employ posterior sampling, upper-confidence bounds, empirical Kullback-Leibler
divergence, and minmax methods for the stochastic multi-armed bandit problem.
Joint sampling and consultation of experts whose opinions depend on the hidden
and random state of the world becomes challenging in the unsupervised, or
blind, framework as feedback from the true state is not available. We propose
an empirically realizable measure of expert competence that can be inferred
instantaneously using only the opinions of other experts. This measure
preserves the ordering of true competences and thus enables joint sampling and
consultation of stochastic experts based on their opinions on dynamically
changing tasks. Statistics derived from the proposed measure are instantaneously
available, allowing both blind exploration-exploitation and unsupervised opinion
aggregation. We discuss how the lack of supervision affects the asymptotic
regret of BEE architectures that rely on UCB1, KL-UCB, MOSS, IMED, and Thompson
sampling. We demonstrate the performance of different BEE algorithms
empirically and compare them to their standard, or supervised, counterparts.
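The abstract describes running standard bandit index policies on a competence proxy computed purely from other experts' opinions. As a minimal illustrative sketch (not the paper's exact BEE construction), the code below uses an assumed proxy, the fraction of peers agreeing with the consulted expert, as the reward inside UCB1, and includes an analytic check that this proxy's mean preserves the ordering of the true accuracies, mirroring the ordering-preservation claim in the abstract.

```python
import math
import random

def expected_agreement(ps, k):
    # Analytic mean of the peer-agreement proxy for expert k: the average
    # probability that another expert voices the same opinion (both right
    # or both wrong on the hidden binary state).
    others = [p for j, p in enumerate(ps) if j != k]
    return sum(ps[k] * p + (1 - ps[k]) * (1 - p) for p in others) / len(others)

def blind_ucb1(ps, horizon, seed=0):
    """UCB1 driven by peer agreement instead of true-state feedback.

    ps[k] is expert k's accuracy (unknown to the learner) on a hidden
    binary state. The proxy reward for the consulted expert is the
    fraction of the other experts that agree with it; the true state
    is never revealed to the learner.
    """
    rng = random.Random(seed)
    K = len(ps)
    counts, sums = [0] * K, [0.0] * K
    for t in range(horizon):
        s = rng.random() < 0.5  # hidden state of the world
        opinions = [s if rng.random() < p else (not s) for p in ps]
        if t < K:
            k = t  # initialization: consult each expert once
        else:
            k = max(range(K), key=lambda a: sums[a] / counts[a]
                    + math.sqrt(2 * math.log(t) / counts[a]))
        agree = sum(opinions[j] == opinions[k] for j in range(K) if j != k)
        counts[k] += 1
        sums[k] += agree / (K - 1)
    return counts
```

Because the proxy means preserve the ordering of the true competences, the index policy concentrates consultations on the most reliable expert without ever observing ground truth.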
Related papers
- A Bayesian Solution To The Imitation Gap [34.16107600758348]
An agent must learn to act in environments where no reward signal can be specified.
In some cases, differences in observability between the expert and the agent can give rise to an imitation gap.
arXiv Detail & Related papers (2024-06-29T17:13:37Z)
- Sigmoid Gating is More Sample Efficient than Softmax Gating in Mixture of Experts [78.3687645289918]
We show that the sigmoid gating function enjoys a higher sample efficiency than the softmax gating for the statistical task of expert estimation.
We find that experts formulated as feed-forward networks with commonly used activation such as ReLU and GELU enjoy faster convergence rates under the sigmoid gating.
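The structural difference between the two gating schemes can be made concrete. The sketch below is a minimal illustration (one common normalization convention for sigmoid gating, assumed here): softmax weights depend only on logit differences, while independent sigmoid gates retain sensitivity to the absolute logit scale.

```python
import math

def softmax_gate(logits):
    # Standard softmax gating: weights depend only on logit differences.
    mx = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(z - mx) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]

def sigmoid_gate(logits):
    # Each expert gets an independent gate in (0, 1); normalizing so the
    # mixture weights sum to one is an assumption made for comparability.
    gates = [1.0 / (1.0 + math.exp(-z)) for z in logits]
    total = sum(gates)
    return [g / total for g in gates]
```

Shifting all logits by a constant leaves the softmax weights unchanged but moves the normalized sigmoid weights, which is one way the two gates impose different statistical structure on expert estimation.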
arXiv Detail & Related papers (2024-05-22T21:12:34Z)
- Inverse Reinforcement Learning with Sub-optimal Experts [56.553106680769474]
We study the theoretical properties of the class of reward functions that are compatible with a given set of experts.
Our results show that the presence of multiple sub-optimal experts can significantly shrink the set of compatible rewards.
We analyze a uniform sampling algorithm that results in being minimax optimal whenever the sub-optimal experts' performance level is sufficiently close to the one of the optimal agent.
arXiv Detail & Related papers (2024-01-08T12:39:25Z)
- Online Decision Mediation [72.80902932543474]
Consider learning a decision support assistant to serve as an intermediary between (oracle) expert behavior and (imperfect) human behavior.
In clinical diagnosis, fully-autonomous machine behavior is often beyond ethical affordances.
arXiv Detail & Related papers (2023-10-28T05:59:43Z)
- A Double Machine Learning Approach to Combining Experimental and Observational Data [59.29868677652324]
We propose a double machine learning approach to combine experimental and observational studies.
Our framework tests for violations of external validity and ignorability under milder assumptions.
arXiv Detail & Related papers (2023-07-04T02:53:11Z)
- Learning to Defer to Multiple Experts: Consistent Surrogate Losses, Confidence Calibration, and Conformal Ensembles [0.966840768820136]
We study the statistical properties of learning to defer (L2D) to multiple experts.
We address the open problems of deriving a consistent surrogate loss, confidence calibration, and principled ensembling of experts.
arXiv Detail & Related papers (2022-10-30T21:27:29Z)
- Deconfounding Legal Judgment Prediction for European Court of Human Rights Cases Towards Better Alignment with Experts [1.252149409594807]
This work demonstrates that Legal Judgement Prediction systems without expert-informed adjustments can be vulnerable to shallow, distracting surface signals.
To mitigate this, we use domain expertise to strategically identify statistically predictive but legally irrelevant information.
arXiv Detail & Related papers (2022-10-25T08:37:25Z)
- Trustworthy Long-Tailed Classification [41.45744960383575]
We propose a Trustworthy Long-tailed Classification (TLC) method to jointly conduct classification and uncertainty estimation.
Our TLC obtains the evidence-based uncertainty (EvU) and evidence for each expert, and then combines these uncertainties and evidence under Dempster-Shafer Evidence Theory (DST).
The experimental results show that the proposed TLC outperforms the state-of-the-art methods and is trustworthy with reliable uncertainty.
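The DST combination step mentioned above can be sketched in a few lines. This is a generic implementation of Dempster's rule of combination over a small frame of discernment, not the TLC paper's specific pipeline; the two-class example values are assumptions chosen for illustration.

```python
def dempster_combine(m1, m2):
    """Combine two basic mass assignments via Dempster's rule.

    Keys are frozensets of hypotheses; values are masses summing to one.
    Mass on intersecting focal sets is kept, mass on conflicting
    (disjoint) pairs is discarded, and the result is renormalized.
    """
    combined = {}
    conflict = 0.0
    for a, wa in m1.items():
        for b, wb in m2.items():
            inter = a & b
            if inter:
                combined[inter] = combined.get(inter, 0.0) + wa * wb
            else:
                conflict += wa * wb
    if conflict >= 1.0:
        raise ValueError("total conflict: Dempster's rule is undefined")
    return {k: v / (1.0 - conflict) for k, v in combined.items()}
```

When two experts both lean toward the same class, the combined mass on that class exceeds either individual mass, which is how DST reinforces agreeing evidence while renormalizing away conflict.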
arXiv Detail & Related papers (2021-11-17T10:52:36Z)
- Are You Smarter Than a Random Expert? The Robust Aggregation of Substitutable Signals [14.03122229316614]
This paper initiates the study of forecast aggregation in a context where experts' knowledge is chosen adversarially from a broad class of information structures.
Under the projective substitutes condition, taking the average of the experts' forecasts improves substantially upon the strategy of trusting a random expert.
We show that by averaging the experts' forecasts and then extremizing the average by moving it away from the prior by a constant factor, the aggregator's performance guarantee is substantially better than is possible without knowledge of the prior.
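The averaging-then-extremizing aggregator described above has a one-line form. The sketch below assumes probability forecasts in [0, 1] and a constant extremization factor greater than one; the specific factor value is an illustrative assumption, not the paper's optimal constant.

```python
def extremize(forecasts, prior, factor):
    # Average the experts' probability forecasts, then push the average
    # away from the prior by a constant factor, clipping to [0, 1].
    avg = sum(forecasts) / len(forecasts)
    return min(1.0, max(0.0, prior + factor * (avg - prior)))
```

With a factor of one this reduces to plain averaging; factors above one move the aggregate further from the prior, exploiting the substitutability of the experts' signals.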
arXiv Detail & Related papers (2021-11-04T20:50:30Z)
- On Covariate Shift of Latent Confounders in Imitation and Reinforcement Learning [69.48387059607387]
We consider the problem of using expert data with unobserved confounders for imitation and reinforcement learning.
We analyze the limitations of learning from confounded expert data with and without external reward.
We validate our claims empirically on challenging assistive healthcare and recommender system simulation tasks.
arXiv Detail & Related papers (2021-10-13T07:31:31Z)
- Prediction with Corrupted Expert Advice [67.67399390910381]
We prove that a variant of the classical Multiplicative Weights algorithm with decreasing step sizes achieves constant regret in a benign environment.
Our results reveal a surprising disparity between the often comparable Follow the Regularized Leader (FTRL) and Online Mirror Descent (OMD) frameworks.
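The Multiplicative Weights variant referenced above can be sketched as follows. This is a generic hedged illustration of the classical algorithm with a time-decreasing step size on losses in [0, 1]; the specific schedule eta_t = 1/sqrt(t) is an assumption, and the constant-regret guarantee the summary mentions applies to the benign (uncorrupted) setting.

```python
import math

def mw_decreasing(losses):
    """Multiplicative Weights with step size eta_t = 1/sqrt(t).

    losses is a list of rounds; each round is a list of expert losses
    in [0, 1]. Returns the learner's cumulative expected loss and the
    final normalized weight vector.
    """
    K = len(losses[0])
    w = [1.0] * K
    total_loss = 0.0
    for t, loss in enumerate(losses, start=1):
        eta = 1.0 / math.sqrt(t)
        s = sum(w)
        probs = [wi / s for wi in w]          # play the normalized weights
        total_loss += sum(p * l for p, l in zip(probs, loss))
        # Exponential down-weighting of experts in proportion to their loss.
        w = [wi * math.exp(-eta * li) for wi, li in zip(w, loss)]
    s = sum(w)
    return total_loss, [wi / s for wi in w]
```

Against a consistently better expert, the weight mass concentrates on it quickly, so the learner's cumulative loss stays bounded even as the horizon grows.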
arXiv Detail & Related papers (2020-02-24T14:39:55Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
The site does not guarantee the quality of the information presented and is not responsible for any consequences of its use.