An empirical evaluation of active inference in multi-armed bandits
- URL: http://arxiv.org/abs/2101.08699v2
- Date: Sat, 23 Jan 2021 13:20:34 GMT
- Title: An empirical evaluation of active inference in multi-armed bandits
- Authors: Dimitrije Markovic, Hrvoje Stojic, Sarah Schwoebel, and Stefan J.
Kiebel
- Abstract summary: The active inference framework is distinguished by its sophisticated strategy for resolving the exploration-exploitation trade-off.
We derive an efficient and scalable approximate active inference algorithm and compare it to two state-of-the-art bandit algorithms.
- Score: 0.0
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: A key feature of sequential decision making under uncertainty is a need to
balance between exploiting--choosing the best action according to the current
knowledge, and exploring--obtaining information about values of other actions.
The multi-armed bandit problem, a classical task that captures this trade-off,
served as a vehicle in machine learning for developing bandit algorithms that
proved to be useful in numerous industrial applications. The active inference
framework, an approach to sequential decision making recently developed in
neuroscience for understanding human and animal behaviour, is distinguished by
its sophisticated strategy for resolving the exploration-exploitation
trade-off. This makes active inference an exciting alternative to already
established bandit algorithms. Here we derive an efficient and scalable
approximate active inference algorithm and compare it to two state-of-the-art
bandit algorithms: Bayesian upper confidence bound and optimistic Thompson
sampling. This comparison is done on two types of bandit problems: a stationary
and a dynamic switching bandit. Our empirical evaluation shows that the active
inference algorithm does not produce efficient long-term behaviour in
stationary bandits. However, in the more challenging switching bandit problem
active inference performs substantially better than the two state-of-the-art
bandit algorithms. The results open exciting avenues for further research in
theoretical and applied machine learning, as well as lend additional
credibility to active inference as a general framework for studying human and
animal behaviour.
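As a rough illustration of the two baselines named in the abstract, the following is a minimal sketch of Bayesian UCB and optimistic Thompson sampling with Beta-Bernoulli posteriors on a bandit whose best arm switches halfway through the run. The arm probabilities, horizon, switch schedule, and the simplified 1 - 1/t quantile schedule are illustrative assumptions; the paper's active inference algorithm is not reproduced here.

```python
import numpy as np
from scipy.stats import beta as beta_dist

rng = np.random.default_rng(0)

def run(policy, probs_before, probs_after, horizon=2000, switch_at=1000):
    """Cumulative regret of one policy on a two-phase Bernoulli bandit."""
    n_arms = len(probs_before)
    alpha = np.ones(n_arms)   # Beta posterior: 1 + observed successes
    beta = np.ones(n_arms)    # Beta posterior: 1 + observed failures
    regret = 0.0
    for t in range(1, horizon + 1):
        probs = probs_before if t <= switch_at else probs_after
        if policy == "bayes_ucb":
            # Score each arm by an upper quantile of its Beta posterior
            scores = beta_dist.ppf(1.0 - 1.0 / t, alpha, beta)
        else:  # "optimistic_ts"
            # Thompson sample, floored at the posterior mean ("optimistic")
            samples = rng.beta(alpha, beta)
            scores = np.maximum(samples, alpha / (alpha + beta))
        arm = int(np.argmax(scores))
        reward = float(rng.random() < probs[arm])
        alpha[arm] += reward
        beta[arm] += 1.0 - reward
        regret += max(probs) - probs[arm]
    return regret

probs_a = [0.7, 0.5, 0.3]   # before the switch, arm 0 is best
probs_b = [0.3, 0.5, 0.7]   # after the switch, arm 2 is best
for policy in ("bayes_ucb", "optimistic_ts"):
    print(policy, round(run(policy, probs_a, probs_b), 1))
```

Because neither baseline discards old evidence, both posteriors stay concentrated on the pre-switch best arm after the change point, which is the kind of dynamic setting where the paper reports active inference performing substantially better.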
Related papers
- Neural Dueling Bandits [58.90189511247936]
We use a neural network to estimate the reward function using preference feedback for the previously selected arms.
We then extend our theoretical results to contextual bandit problems with binary feedback, which is in itself a non-trivial contribution.
arXiv Detail & Related papers (2024-07-24T09:23:22Z)
- Neural Active Learning Beyond Bandits [69.99592173038903]
We study both stream-based and pool-based active learning with neural network approximations.
We propose two algorithms based on the newly designed exploitation and exploration neural networks for stream-based and pool-based active learning.
arXiv Detail & Related papers (2024-04-18T21:52:14Z)
- Feel-Good Thompson Sampling for Contextual Dueling Bandits [49.450050682705026]
We propose a Thompson sampling algorithm, named FGTS.CDB, for linear contextual dueling bandits.
At the core of our algorithm is a new Feel-Good exploration term specifically tailored for dueling bandits.
Our algorithm achieves nearly minimax-optimal regret, i.e., $\tilde{\mathcal{O}}(d\sqrt{T})$, where $d$ is the model dimension and $T$ is the time horizon.
arXiv Detail & Related papers (2024-04-09T04:45:18Z)
- Bandit Social Learning: Exploration under Myopic Behavior [58.75758600464338]
We study social learning dynamics motivated by reviews on online platforms.
Agents collectively follow a simple multi-armed bandit protocol, but each agent acts myopically, without regard to exploration.
We derive stark learning failures for any such behavior, and provide matching positive results.
arXiv Detail & Related papers (2023-02-15T01:57:57Z)
- Thompson Sampling with Virtual Helping Agents [0.0]
We address the problem of online sequential decision making, i.e., balancing the trade-off between exploiting current knowledge to maximize immediate performance and exploring new information to gain long-term benefits.
We propose two algorithms for the multi-armed bandit problem and provide theoretical bounds on the cumulative regret.
arXiv Detail & Related papers (2022-09-16T23:34:44Z)
- Dual Instrumental Method for Confounded Kernelized Bandits [0.0]
The contextual bandit problem is a framework with wide applications in various fields.
We propose a confounded bandit problem where the noise becomes a latent confounder that affects both contexts and rewards.
We show that a dual instrumental variable regression can correctly identify the true reward function.
arXiv Detail & Related papers (2022-09-07T15:25:57Z)
- Statistical Consequences of Dueling Bandits [0.0]
Multi-Armed-Bandit frameworks have often been used to assess educational interventions.
Recent work has shown that it is more beneficial for a student to provide qualitative feedback through preference elicitation.
We compare traditional uniform sampling to a dueling bandit algorithm and find that dueling bandit algorithms perform well at cumulative regret minimisation, but lead to inflated Type-I error rates and reduced power under certain circumstances.
arXiv Detail & Related papers (2021-10-16T23:48:43Z)
- Syndicated Bandits: A Framework for Auto Tuning Hyper-parameters in Contextual Bandit Algorithms [74.55200180156906]
The contextual bandit problem models the trade-off between exploration and exploitation.
We show our Syndicated Bandits framework can achieve the optimal regret upper bounds.
arXiv Detail & Related papers (2021-06-05T22:30:21Z)
- Using Subjective Logic to Estimate Uncertainty in Multi-Armed Bandit Problems [0.0]
We consider the formalism of subjective logic, a concise and expressive framework to express Dirichlet-multinomial models as subjective opinions.
We propose new algorithms grounded in subjective logic to tackle the multi-armed bandit problem (a sketch of the evidence-to-opinion mapping appears after this list).
arXiv Detail & Related papers (2020-08-17T14:53:17Z)
- Latent Bandits Revisited [55.88616813182679]
A latent bandit problem is one in which the learning agent knows the arm reward distributions conditioned on an unknown discrete latent state.
We propose general algorithms for this setting, based on both upper confidence bounds (UCBs) and Thompson sampling.
We provide a unified theoretical analysis of our algorithms, which have lower regret than classic bandit policies when the number of latent states is smaller than the number of actions.
arXiv Detail & Related papers (2020-06-15T19:24:02Z)
- Hyper-parameter Tuning for the Contextual Bandit [22.721128745617076]
We study the problem of learning the exploration-exploitation trade-off in the contextual bandit problem with a linear reward function.
Our proposed algorithm learns to choose the right exploration parameters in an online manner based on the observed context.
We present two algorithms that use a bandit to find the optimal exploration parameters of the contextual bandit algorithm.
arXiv Detail & Related papers (2020-05-04T17:20:19Z)
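For the subjective-logic entry above, the following is a minimal sketch (my reading of Jøsang-style subjective logic, not code from that paper) of how Dirichlet evidence counts map to a multinomial opinion with explicit belief and uncertainty masses. The prior weight W = 2 and uniform base rates are conventional defaults assumed here.

```python
import numpy as np

def dirichlet_to_opinion(evidence, base_rates=None, prior_weight=2.0):
    """Map per-outcome evidence counts to (belief, uncertainty, projected prob.)."""
    evidence = np.asarray(evidence, dtype=float)
    total = prior_weight + evidence.sum()
    belief = evidence / total           # b_i = r_i / (W + sum_j r_j)
    uncertainty = prior_weight / total  # u   = W   / (W + sum_j r_j)
    if base_rates is None:
        base_rates = np.full(len(evidence), 1.0 / len(evidence))
    # Projected probability used for decision making: P_i = b_i + a_i * u
    projected = belief + np.asarray(base_rates) * uncertainty
    return belief, uncertainty, projected

# Example: an arm pulled 10 times with 7 successes and 3 failures
b, u, p = dirichlet_to_opinion([7, 3])
print(b, u, p)
```

The explicit uncertainty mass u shrinks as evidence accumulates, which is what such opinion-based bandit algorithms exploit when deciding whether an arm still needs exploring.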
This list is automatically generated from the titles and abstracts of the papers on this site.