Using Subjective Logic to Estimate Uncertainty in Multi-Armed Bandit Problems
- URL: http://arxiv.org/abs/2008.07386v1
- Date: Mon, 17 Aug 2020 14:53:17 GMT
- Title: Using Subjective Logic to Estimate Uncertainty in Multi-Armed Bandit Problems
- Authors: Fabio Massimo Zennaro, Audun Jøsang
- Abstract summary: We consider the formalism of subjective logic, a concise and expressive framework to express Dirichlet-multinomial models as subjective opinions.
We propose new algorithms grounded in subjective logic to tackle the multi-armed bandit problem.
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: The multi-armed bandit problem is a classical decision-making problem where
an agent has to learn an optimal action balancing exploration and exploitation.
Properly managing this trade-off requires a correct assessment of uncertainty;
in multi-armed bandits, as in other machine learning applications, it is
important to distinguish between stochasticity that is inherent to the system
(aleatoric uncertainty) and stochasticity that derives from the limited
knowledge of the agent (epistemic uncertainty). In this paper we consider the
formalism of subjective logic, a concise and expressive framework to express
Dirichlet-multinomial models as subjective opinions, and we apply it to the
problem of multi-armed bandits. We propose new algorithms grounded in
subjective logic to tackle the multi-armed bandit problem, we compare them
against classical algorithms from the literature, and we analyze the insights
they provide in evaluating the dynamics of uncertainty. Our preliminary results
suggest that subjective logic quantities enable useful assessment of
uncertainty that may be exploited by more refined agents.
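To make the abstract concrete: in binomial subjective logic, an opinion about a Bernoulli arm can be derived from evidence counts r (successes) and s (failures), with belief b = r/(r+s+W), disbelief d = s/(r+s+W), and uncertainty mass u = W/(r+s+W), where W is a prior weight (customarily W = 2). The sketch below is a minimal, hypothetical bandit agent built on these quantities; the uncertainty-weighted exploration bonus is an illustrative assumption, not the authors' exact algorithm.

```python
import random

W = 2.0  # non-informative prior weight, the customary default in subjective logic

class SubjectiveLogicArm:
    """Binomial subjective opinion backed by evidence counts (r, s)."""
    def __init__(self, base_rate=0.5):
        self.r = 0.0          # positive evidence (successes)
        self.s = 0.0          # negative evidence (failures)
        self.a = base_rate    # prior base rate

    def update(self, reward):
        if reward:
            self.r += 1.0
        else:
            self.s += 1.0

    @property
    def uncertainty(self):            # epistemic uncertainty mass u
        return W / (self.r + self.s + W)

    @property
    def projected_probability(self):  # P = b + a * u
        b = self.r / (self.r + self.s + W)
        return b + self.a * self.uncertainty

def choose_arm(arms, explore_weight=1.0):
    # Hypothetical rule: optimism proportional to remaining uncertainty mass.
    return max(range(len(arms)),
               key=lambda i: arms[i].projected_probability
                             + explore_weight * arms[i].uncertainty)

# Toy run on three Bernoulli arms.
true_means = [0.3, 0.5, 0.7]
arms = [SubjectiveLogicArm() for _ in true_means]
for t in range(1000):
    i = choose_arm(arms)
    arms[i].update(random.random() < true_means[i])
print([round(a.projected_probability, 3) for a in arms])
```

As the counts r + s grow, u shrinks toward zero and the projected probability converges to the empirical mean, which is exactly the epistemic-to-aleatoric transition the paper tracks.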
Related papers
- Pure Exploration under Mediators' Feedback [63.56002444692792]
Multi-armed bandits are a sequential-decision-making framework, where, at each interaction step, the learner selects an arm and observes a reward.
We consider the scenario in which the learner has access to a set of mediators, each of which selects the arms on the agent's behalf according to its own, and possibly unknown, policy.
We propose a sequential decision-making strategy for discovering the best arm under the assumption that the mediators' policies are known to the learner.
arXiv Detail & Related papers (2023-08-29T18:18:21Z)
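A minimal sketch of the mediator setting above, under the known-policies assumption; the selection rule (pick the mediator most likely to play the current empirical-best arm) is a hypothetical illustration, not the paper's strategy.

```python
import random

def best_arm_via_mediators(policies, true_means, horizon=5000):
    """policies[m][a]: known probability that mediator m pulls arm a."""
    n_arms = len(true_means)
    counts, sums = [0] * n_arms, [0.0] * n_arms
    for t in range(horizon):
        means = [sums[a] / counts[a] if counts[a] else float("inf")
                 for a in range(n_arms)]
        target = max(range(n_arms), key=lambda a: means[a])  # empirical best
        # Illustrative rule: mediator most likely to sample the target arm.
        m = max(range(len(policies)), key=lambda j: policies[j][target])
        arm = random.choices(range(n_arms), weights=policies[m])[0]
        reward = random.random() < true_means[arm]
        counts[arm] += 1
        sums[arm] += reward
    return max(range(n_arms), key=lambda a: sums[a] / max(counts[a], 1))

policies = [[0.8, 0.1, 0.1], [0.1, 0.8, 0.1], [0.1, 0.1, 0.8]]
print(best_arm_via_mediators(policies, [0.2, 0.5, 0.7]))
```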
- Concise and interpretable multi-label rule sets [13.416159628299779]
We develop a multi-label classifier that can be represented as a concise set of simple "if-then" rules.
Our method is able to find a small set of relevant patterns that lead to accurate multi-label classification.
arXiv Detail & Related papers (2022-10-04T11:23:50Z)
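To make the rule-set representation concrete, here is a minimal, hypothetical multi-label rule classifier of the kind the abstract describes; the rules and feature names are invented for illustration.

```python
# Each rule maps a boolean condition over a feature dict to a set of labels.
RULES = [
    (lambda x: x["has_wings"] and x["lays_eggs"],      {"bird"}),
    (lambda x: x["lives_in_water"],                    {"aquatic"}),
    (lambda x: x["has_wings"] and x["lives_in_water"], {"bird", "aquatic"}),
]

def predict(x, rules=RULES):
    """Union of labels over all matching rules (empty set if none fire)."""
    labels = set()
    for condition, rule_labels in rules:
        if condition(x):
            labels |= rule_labels
    return labels

print(predict({"has_wings": True, "lays_eggs": True, "lives_in_water": False}))
# {'bird'}
```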
- On the Complexity of Adversarial Decision Making [101.14158787665252]
We show that the Decision-Estimation Coefficient is necessary and sufficient to obtain low regret for adversarial decision making.
We provide new structural results that connect the Decision-Estimation Coefficient to variants of other well-known complexity measures.
arXiv Detail & Related papers (2022-06-27T06:20:37Z)
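For reference, the stochastic Decision-Estimation Coefficient of Foster et al. (2021), on which the adversarial variants in this paper build, takes the form below; here $\Pi$ is the decision space, $f^{M}(\pi)$ the mean reward of decision $\pi$ under model $M$, $\pi_{M}$ the optimal decision for $M$, and $D^{2}_{\mathrm{H}}$ the squared Hellinger distance. The adversarial definition studied in the paper itself differs in detail.

```latex
\mathrm{dec}_{\gamma}(\mathcal{M}, \bar{M})
  \;=\; \inf_{p \in \Delta(\Pi)} \, \sup_{M \in \mathcal{M}} \,
  \mathbb{E}_{\pi \sim p}\!\left[\,
    f^{M}(\pi_{M}) - f^{M}(\pi)
    \;-\; \gamma \, D^{2}_{\mathrm{H}}\!\big(M(\pi), \bar{M}(\pi)\big)
  \right]
```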
- Adversarial Robustness with Semi-Infinite Constrained Learning [177.42714838799924]
The sensitivity of deep learning to input perturbations has raised serious questions about its use in safety-critical domains.
We propose a hybrid Langevin Monte Carlo training approach to mitigate this issue.
We show that our approach can mitigate the trade-off between state-of-the-art performance and robustness.
arXiv Detail & Related papers (2021-10-29T13:30:42Z)
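As a rough illustration of the Langevin Monte Carlo idea above: the inner maximization over perturbations can be explored with noisy gradient ascent on the loss rather than deterministic projected gradient steps. The numpy sketch below, on a toy logistic model, is a hypothetical simplification, not the paper's algorithm.

```python
import numpy as np

rng = np.random.default_rng(0)

def logistic_loss_grad_x(w, x, y):
    """Gradient of log(1 + exp(-y * w.x)) with respect to the input x."""
    margin = y * (w @ x)
    return -y * w / (1.0 + np.exp(margin))

def langevin_perturbation(w, x, y, eps=0.3, eta=0.05, beta=10.0, steps=20):
    """Noisy (Langevin) ascent on the loss inside an L2 ball of radius eps."""
    delta = np.zeros_like(x)
    for _ in range(steps):
        grad = logistic_loss_grad_x(w, x + delta, y)
        delta += eta * grad + np.sqrt(2.0 * eta / beta) * rng.normal(size=x.shape)
        norm = np.linalg.norm(delta)
        if norm > eps:                      # project back onto the ball
            delta *= eps / norm
    return x + delta

w = np.array([1.0, -2.0]); x = np.array([0.5, 0.2]); y = 1.0
print(langevin_perturbation(w, x, y))
```

The injected Gaussian noise lets the search visit many high-loss perturbations instead of a single worst case, which is the motivation for the hybrid approach the abstract mentions.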
- Ensemble-based Uncertainty Quantification: Bayesian versus Credal Inference [0.0]
We consider ensemble-based approaches to uncertainty quantification.
We specifically focus on Bayesian methods and approaches based on so-called credal sets.
The effectiveness of corresponding measures is evaluated and compared in an empirical study on classification with a reject option.
arXiv Detail & Related papers (2021-07-21T22:47:24Z)
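The two families of measures compared above can be illustrated on an ensemble of class-probability vectors: the Bayesian entropy decomposition (total = aleatoric + epistemic, the latter being the mutual information) versus a simple credal-set statistic such as the width of the per-class probability intervals. Both measures in this numpy sketch are standard choices, not necessarily the exact ones used in the paper.

```python
import numpy as np

def entropy(p, axis=-1):
    p = np.clip(p, 1e-12, 1.0)
    return -(p * np.log(p)).sum(axis=axis)

def bayesian_decomposition(preds):
    """preds: (n_members, n_classes) ensemble of predictive distributions."""
    total = entropy(preds.mean(axis=0))          # entropy of the mixture
    aleatoric = entropy(preds, axis=-1).mean()   # expected member entropy
    return total, aleatoric, total - aleatoric   # epistemic = mutual information

def credal_interval_width(preds):
    """Treat the ensemble as a credal set; report the mean upper-lower gap."""
    return (preds.max(axis=0) - preds.min(axis=0)).mean()

preds = np.array([[0.7, 0.2, 0.1],
                  [0.5, 0.3, 0.2],
                  [0.9, 0.05, 0.05]])
print(bayesian_decomposition(preds))
print(credal_interval_width(preds))
```

Either quantity can then drive a reject option: abstain whenever the epistemic term (or interval width) exceeds a threshold, which is the evaluation protocol the abstract describes.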
- Confidence-Budget Matching for Sequential Budgeted Learning [69.77435313099366]
We formalize decision-making problems with a querying budget.
We consider multi-armed bandits, linear bandits, and reinforcement learning problems.
We show that CBM-based algorithms perform well in the presence of adversity.
arXiv Detail & Related papers (2021-02-05T19:56:31Z)
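A loose sketch of the budgeted-query bandit setting above: the agent pays from a query budget to observe rewards, and only queries while its confidence intervals are still wide relative to the budget spent. The querying rule below is a hypothetical simplification, not the paper's exact confidence-budget matching scheme.

```python
import math, random

def budgeted_ucb(true_means, horizon=2000, budget=300):
    n = len(true_means)
    counts, sums = [0] * n, [0.0] * n
    for t in range(1, horizon + 1):
        ucb = [sums[a] / counts[a] + math.sqrt(2 * math.log(t) / counts[a])
               if counts[a] else float("inf") for a in range(n)]
        arm = max(range(n), key=lambda a: ucb[a])
        reward = random.random() < true_means[arm]
        # Illustrative rule: spend a query only while budget remains and the
        # chosen arm's confidence radius is still large.
        radius = (math.sqrt(2 * math.log(t) / counts[arm])
                  if counts[arm] else float("inf"))
        if budget > 0 and radius > 0.05:
            budget -= 1
            counts[arm] += 1
            sums[arm] += reward
    return [sums[a] / max(counts[a], 1) for a in range(n)]

print(budgeted_ucb([0.3, 0.6, 0.5]))
```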
- An empirical evaluation of active inference in multi-armed bandits [0.0]
The active inference framework is distinguished by its sophisticated strategy for resolving the exploration-exploitation trade-off.
We derive an efficient and scalable approximate active inference algorithm and compare it to two state-of-the-art bandit algorithms.
arXiv Detail & Related papers (2021-01-21T16:20:06Z)
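In active inference, arms are typically scored by negative expected free energy, which combines extrinsic value (expected reward) with epistemic value (expected information gain). The sketch below uses the standard mutual-information gain of a Beta-Bernoulli arm as the epistemic term; the closed form and the additive weighting are common textbook choices, not necessarily those of the paper.

```python
import math
from scipy.special import digamma

def binary_entropy(p):
    if p in (0.0, 1.0):
        return 0.0
    return -p * math.log(p) - (1 - p) * math.log(1 - p)

def info_gain(alpha, beta):
    """Mutual information between the next reward and theta ~ Beta(alpha, beta)."""
    p = alpha / (alpha + beta)
    # E[H(Bernoulli(theta))] under the Beta posterior, via digamma identities.
    expected_h = -(p * (digamma(alpha + 1) - digamma(alpha + beta + 1))
                   + (1 - p) * (digamma(beta + 1) - digamma(alpha + beta + 1)))
    return binary_entropy(p) - expected_h

def neg_expected_free_energy(alpha, beta, epistemic_weight=1.0):
    extrinsic = alpha / (alpha + beta)              # expected reward
    return extrinsic + epistemic_weight * info_gain(alpha, beta)

# A well-explored arm versus an unexplored one with the same posterior mean:
# the unexplored arm scores higher through its epistemic term alone.
print(neg_expected_free_energy(50, 50), neg_expected_free_energy(1, 1))
```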
- Latent Bandits Revisited [55.88616813182679]
A latent bandit problem is one in which the learning agent knows the arm reward distributions conditioned on an unknown discrete latent state.
We propose general algorithms for this setting, based on both upper confidence bounds (UCBs) and Thompson sampling.
We provide a unified theoretical analysis of our algorithms, which have lower regret than classic bandit policies when the number of latent states is smaller than the number of actions.
arXiv Detail & Related papers (2020-06-15T19:24:02Z)
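A compact sketch of Thompson sampling in the latent setting above: the agent knows the Bernoulli reward means mu[s][a] for each latent state s, keeps a posterior over s, samples a state, and pulls that state's best arm. This minimal version is an illustration; the paper's algorithms and analysis are more general.

```python
import random

def latent_thompson(mu, true_state, horizon=2000):
    """mu[s][a]: known Bernoulli mean of arm a under latent state s."""
    n_states, n_arms = len(mu), len(mu[0])
    posterior = [1.0 / n_states] * n_states
    for t in range(horizon):
        s = random.choices(range(n_states), weights=posterior)[0]
        arm = max(range(n_arms), key=lambda a: mu[s][a])
        reward = random.random() < mu[true_state][arm]
        # Exact Bayesian update: P(s) proportional to P(reward | arm, s).
        lik = [mu[s2][arm] if reward else 1 - mu[s2][arm]
               for s2 in range(n_states)]
        posterior = [p * l for p, l in zip(posterior, lik)]
        z = sum(posterior)
        posterior = [p / z for p in posterior]
    return posterior

mu = [[0.8, 0.2, 0.3], [0.2, 0.7, 0.4]]
print(latent_thompson(mu, true_state=1))
```

Because the posterior lives over states rather than arms, regret scales with the number of latent states, which is the advantage the abstract claims over classic per-arm policies.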
- Robustness Guarantees for Mode Estimation with an Application to Bandits [131.21717367564963]
We introduce a theory for multi-armed bandits where the values are the modes of the reward distributions instead of the means.
We show in simulations that our algorithms are robust to perturbation of the arms by adversarial noise sequences.
arXiv Detail & Related papers (2020-03-05T21:29:27Z)
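For discrete rewards, a mode-based bandit can keep a histogram per arm and treat the most frequent reward value as the arm's value. The UCB-style exploration bonus in the sketch below is a hypothetical illustration of such an agent, not the paper's algorithm.

```python
import math, random
from collections import Counter

def mode_ucb(reward_dists, horizon=3000):
    """reward_dists[a]: (values, probs) of a discrete reward distribution."""
    n = len(reward_dists)
    hist = [Counter() for _ in range(n)]
    counts = [0] * n
    for t in range(1, horizon + 1):
        def score(a):
            if counts[a] == 0:
                return float("inf")
            mode_value = hist[a].most_common(1)[0][0]  # empirical mode
            return mode_value + math.sqrt(2 * math.log(t) / counts[a])
        a = max(range(n), key=score)
        values, probs = reward_dists[a]
        r = random.choices(values, weights=probs)[0]
        hist[a][r] += 1
        counts[a] += 1
    return [h.most_common(1)[0][0] for h in hist]

# Arm 0 has the higher mode (1.0) while arm 1 has the higher mean (0.72):
# a mode-based agent should prefer arm 0.
dists = [([0.0, 1.0], [0.45, 0.55]), ([0.0, 0.9], [0.2, 0.8])]
print(mode_ucb(dists))
```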