Active Inference for Autonomous Decision-Making with Contextual
Multi-Armed Bandits
- URL: http://arxiv.org/abs/2209.09185v1
- Date: Mon, 19 Sep 2022 17:11:21 GMT
- Title: Active Inference for Autonomous Decision-Making with Contextual
Multi-Armed Bandits
- Authors: Shohei Wakayama and Nisar Ahmed
- Abstract summary: In autonomous robotic decision-making under uncertainty, the tradeoff between exploitation and exploration of available options must be considered.
In this study, we apply active inference, which has been actively studied in the field of neuroscience in recent years, as an alternative action selection strategy for CMABs.
- Score: 1.3670071336891754
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: In autonomous robotic decision-making under uncertainty, the tradeoff between
exploitation and exploration of available options must be considered. If
secondary information associated with options can be utilized, such
decision-making problems can often be formulated as contextual multi-armed
bandits (CMABs). In this study, we apply active inference, which has been
actively studied in the field of neuroscience in recent years, as an
alternative action selection strategy for CMABs. Unlike conventional action
selection strategies, active inference makes it possible to rigorously evaluate the uncertainty of
each option when calculating the expected free energy (EFE) associated with the
decision agent's probabilistic model, as derived from the free-energy
principle. We specifically address the case where a categorical observation
likelihood function is used, such that EFE values are analytically intractable.
We introduce new approximation methods for computing the EFE based on
variational and Laplace approximations. Extensive simulation study results
demonstrate that, compared to other strategies, active inference generally
requires far fewer iterations to identify optimal options and generally
achieves superior cumulative regret, at relatively low extra computational
cost.
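The EFE-based selection rule described in the abstract can be illustrated with a simplified sketch. This is not the paper's variational or Laplace approximation; it uses a Dirichlet-categorical model per arm (context omitted) and approximates the epistemic term by the expected KL divergence between the updated and current posterior predictive distributions. The preference vector `log_pref` and all variable names are illustrative assumptions.

```python
import numpy as np

n_arms, n_outcomes = 3, 2                  # binary outcomes: 0 = failure, 1 = success
alpha = np.ones((n_arms, n_outcomes))      # Dirichlet counts per arm (uniform prior)
log_pref = np.log(np.array([0.2, 0.8]))    # assumed prior preference over outcomes

def expected_free_energy(alpha_a):
    """Simplified EFE: negative (pragmatic value + epistemic value)."""
    q = alpha_a / alpha_a.sum()            # posterior predictive Q(o|a)
    pragmatic = q @ log_pref               # expected log-preference (utility)
    # Epistemic value: expected information gain about the arm's parameters,
    # approximated as the expected KL between updated and current predictives.
    info_gain = 0.0
    for o in range(len(alpha_a)):
        alpha_new = alpha_a.copy()
        alpha_new[o] += 1                  # pseudo-update after observing outcome o
        q_new = alpha_new / alpha_new.sum()
        info_gain += q[o] * np.sum(q_new * (np.log(q_new) - np.log(q)))
    return -(pragmatic + info_gain)        # lower EFE = more attractive arm

# Select the arm minimizing EFE; after observing outcome o, set alpha[a, o] += 1.
g = np.array([expected_free_energy(alpha[a]) for a in range(n_arms)])
a = int(np.argmin(g))
```

With uniform counts all arms score equally; as observations accumulate, arms with poor expected outcomes and low remaining uncertainty acquire higher EFE and are selected less often, which is the exploration-exploitation balance the abstract refers to.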
Related papers
- An Efficient Approach for Solving Expensive Constrained Multiobjective Optimization Problems [0.0]
An efficient probabilistic selection based constrained multi-objective EA is proposed, referred to as PSCMOEA.
It comprises novel elements such as (a) an adaptive search bound identification scheme based on the feasibility and convergence status of evaluated solutions.
Numerical experiments are conducted on an extensive range of challenging constrained problems using low evaluation budgets to simulate ECMOPs.
arXiv Detail & Related papers (2024-05-22T02:32:58Z)
- Globally-Optimal Greedy Experiment Selection for Active Sequential Estimation [1.1530723302736279]
We study the problem of active sequential estimation, which involves adaptively selecting experiments for sequentially collected data.
The goal is to design experiment selection rules for more accurate model estimation.
We propose a class of greedy experiment selection methods and provide statistical analysis for the maximum likelihood.
arXiv Detail & Related papers (2024-02-13T17:09:29Z)
- Observation-Augmented Contextual Multi-Armed Bandits for Robotic Exploration with Uncertain Semantic Data [7.795929277007235]
We introduce a new variant of contextual multi-armed bandits called observation-augmented CMABs (OA-CMABs).
OA-CMABs model the expected option outcomes as a function of context features and hidden parameters.
We propose a robust Bayesian inference process for OA-CMABs that is based on the concept of probabilistic data validation.
arXiv Detail & Related papers (2023-12-19T20:28:42Z)
- Model-Based Epistemic Variance of Values for Risk-Aware Policy Optimization [59.758009422067]
We consider the problem of quantifying uncertainty over expected cumulative rewards in model-based reinforcement learning.
We propose a new uncertainty Bellman equation (UBE) whose solution converges to the true posterior variance over values.
We introduce a general-purpose policy optimization algorithm, Q-Uncertainty Soft Actor-Critic (QU-SAC) that can be applied for either risk-seeking or risk-averse policy optimization.
arXiv Detail & Related papers (2023-12-07T15:55:58Z)
- In Search of Insights, Not Magic Bullets: Towards Demystification of the Model Selection Dilemma in Heterogeneous Treatment Effect Estimation [92.51773744318119]
This paper empirically investigates the strengths and weaknesses of different model selection criteria.
We highlight that there is a complex interplay between selection strategies, candidate estimators and the data used for comparing them.
arXiv Detail & Related papers (2023-02-06T16:55:37Z)
- Improved Policy Evaluation for Randomized Trials of Algorithmic Resource Allocation [54.72195809248172]
We present a new estimator leveraging our proposed novel concept, that involves retrospective reshuffling of participants across experimental arms at the end of an RCT.
We prove theoretically that such an estimator is more accurate than common estimators based on sample means.
arXiv Detail & Related papers (2023-02-06T05:17:22Z)
- The Statistical Complexity of Interactive Decision Making [126.04974881555094]
We provide a complexity measure, the Decision-Estimation Coefficient, that is proven to be both necessary and sufficient for sample-efficient interactive learning.
A unified algorithm design principle, Estimation-to-Decisions (E2D), transforms any algorithm for supervised estimation into an online algorithm for decision making.
arXiv Detail & Related papers (2021-12-27T02:53:44Z)
- Pseudo-Spherical Contrastive Divergence [119.28384561517292]
We propose pseudo-spherical contrastive divergence (PS-CD) to generalize maximum likelihood learning of energy-based models.
PS-CD avoids the intractable partition function and provides a generalized family of learning objectives.
arXiv Detail & Related papers (2021-11-01T09:17:15Z)
- Variance-Aware Off-Policy Evaluation with Linear Function Approximation [85.75516599931632]
We study the off-policy evaluation problem in reinforcement learning with linear function approximation.
We propose an algorithm, VA-OPE, which uses the estimated variance of the value function to reweight the Bellman residual in Fitted Q-Iteration.
arXiv Detail & Related papers (2021-06-22T17:58:46Z)
- Continuous Mean-Covariance Bandits [39.820490484375156]
We propose a novel Continuous Mean-Covariance Bandit model to take into account option correlation.
In CMCB, there is a learner who sequentially chooses weight vectors on given options and observes random feedback according to the decisions.
We propose novel algorithms with optimal regrets (within logarithmic factors) and provide matching lower bounds to validate their optimalities.
arXiv Detail & Related papers (2021-02-24T06:37:05Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of this information and is not responsible for any consequences of its use.