Active Inference for Autonomous Decision-Making with Contextual
Multi-Armed Bandits
- URL: http://arxiv.org/abs/2209.09185v1
- Date: Mon, 19 Sep 2022 17:11:21 GMT
- Title: Active Inference for Autonomous Decision-Making with Contextual
Multi-Armed Bandits
- Authors: Shohei Wakayama and Nisar Ahmed
- Abstract summary: In autonomous robotic decision-making under uncertainty, the tradeoff between exploitation and exploration of available options must be considered.
In this study, we apply active inference, which has been actively studied in the field of neuroscience in recent years, as an alternative action selection strategy for CMABs.
- Score: 1.3670071336891754
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: In autonomous robotic decision-making under uncertainty, the tradeoff between
exploitation and exploration of available options must be considered. If
secondary information associated with options can be utilized, such
decision-making problems can often be formulated as contextual multi-armed
bandits (CMABs). In this study, we apply active inference, which has been
actively studied in the field of neuroscience in recent years, as an
alternative action selection strategy for CMABs. Unlike conventional action
selection strategies, active inference makes it possible to rigorously evaluate the uncertainty of
each option when calculating the expected free energy (EFE) associated with the
decision agent's probabilistic model, as derived from the free-energy
principle. We specifically address the case where a categorical observation
likelihood function is used, such that EFE values are analytically intractable.
We introduce new approximation methods for computing the EFE based on
variational and Laplace approximations. Extensive simulation study results
demonstrate that, compared to other strategies, active inference generally
requires far fewer iterations to identify optimal options and generally
achieves superior cumulative regret, at relatively low extra computational
cost.
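The EFE-based selection rule described in the abstract can be illustrated with a simplified sketch. This is not the paper's variational or Laplace approximation; it uses a Dirichlet-categorical model per arm (context omitted) and approximates the epistemic term by the expected KL divergence between the updated and current posterior predictive distributions. The preference vector `log_pref` and all variable names are illustrative assumptions.

```python
import numpy as np

n_arms, n_outcomes = 3, 2                  # binary outcomes: 0 = failure, 1 = success
alpha = np.ones((n_arms, n_outcomes))      # Dirichlet counts per arm (uniform prior)
log_pref = np.log(np.array([0.2, 0.8]))    # assumed prior preference over outcomes

def expected_free_energy(alpha_a):
    """Simplified EFE: negative (pragmatic value + epistemic value)."""
    q = alpha_a / alpha_a.sum()            # posterior predictive Q(o|a)
    pragmatic = q @ log_pref               # expected log-preference (utility)
    # Epistemic value: expected information gain about the arm's parameters,
    # approximated as the expected KL between updated and current predictives.
    info_gain = 0.0
    for o in range(len(alpha_a)):
        alpha_new = alpha_a.copy()
        alpha_new[o] += 1                  # pseudo-update after observing outcome o
        q_new = alpha_new / alpha_new.sum()
        info_gain += q[o] * np.sum(q_new * (np.log(q_new) - np.log(q)))
    return -(pragmatic + info_gain)        # lower EFE = more attractive arm

# Select the arm minimizing EFE; after observing outcome o, set alpha[a, o] += 1.
g = np.array([expected_free_energy(alpha[a]) for a in range(n_arms)])
a = int(np.argmin(g))
```

With uniform counts all arms score equally; as observations accumulate, arms with poor expected outcomes and low remaining uncertainty acquire higher EFE and are selected less often, which is the exploration-exploitation balance the abstract refers to.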
Related papers
- An Efficient Approach for Solving Expensive Constrained Multiobjective Optimization Problems [0.0]
An efficient probabilistic selection based constrained multi-objective EA is proposed, referred to as PSCMOEA.
It comprises novel elements such as (a) an adaptive search bound identification scheme based on the feasibility and convergence status of evaluated solutions.
Numerical experiments are conducted on an extensive range of challenging constrained problems using low evaluation budgets to simulate ECMOPs.
arXiv Detail & Related papers (2024-05-22T02:32:58Z)
- Globally-Optimal Greedy Experiment Selection for Active Sequential Estimation [1.1530723302736279]
We study the problem of active sequential estimation, which involves adaptively selecting experiments for sequentially collected data.
The goal is to design experiment selection rules for more accurate model estimation.
We propose a class of greedy experiment selection methods and provide statistical analysis for the maximum likelihood.
arXiv Detail & Related papers (2024-02-13T17:09:29Z)
- Observation-Augmented Contextual Multi-Armed Bandits for Robotic Exploration with Uncertain Semantic Data [7.795929277007235]
We introduce a new variant of contextual multi-armed bandits called observation-augmented CMABs (OA-CMABs).
OA-CMABs model the expected option outcomes as a function of context features and hidden parameters.
We propose a robust Bayesian inference process for OA-CMABs that is based on the concept of probabilistic data validation.
arXiv Detail & Related papers (2023-12-19T20:28:42Z)
- Model-Based Epistemic Variance of Values for Risk-Aware Policy Optimization [59.758009422067]
We consider the problem of quantifying uncertainty over expected cumulative rewards in model-based reinforcement learning.
We propose a new uncertainty Bellman equation (UBE) whose solution converges to the true posterior variance over values.
We introduce a general-purpose policy optimization algorithm, Q-Uncertainty Soft Actor-Critic (QU-SAC) that can be applied for either risk-seeking or risk-averse policy optimization.
arXiv Detail & Related papers (2023-12-07T15:55:58Z)
- In Search of Insights, Not Magic Bullets: Towards Demystification of the Model Selection Dilemma in Heterogeneous Treatment Effect Estimation [92.51773744318119]
This paper empirically investigates the strengths and weaknesses of different model selection criteria.
We highlight that there is a complex interplay between selection strategies, candidate estimators and the data used for comparing them.
arXiv Detail & Related papers (2023-02-06T16:55:37Z)
- Improved Policy Evaluation for Randomized Trials of Algorithmic Resource Allocation [54.72195809248172]
We present a new estimator leveraging our proposed novel concept, that involves retrospective reshuffling of participants across experimental arms at the end of an RCT.
We prove theoretically that such an estimator is more accurate than common estimators based on sample means.
arXiv Detail & Related papers (2023-02-06T05:17:22Z)
- The Statistical Complexity of Interactive Decision Making [126.04974881555094]
We provide a complexity measure, the Decision-Estimation Coefficient, that is proven to be both necessary and sufficient for sample-efficient interactive learning.
A unified algorithm design principle, Estimation-to-Decisions (E2D), transforms any algorithm for supervised estimation into an online algorithm for decision making.
arXiv Detail & Related papers (2021-12-27T02:53:44Z)
- Pseudo-Spherical Contrastive Divergence [119.28384561517292]
We propose pseudo-spherical contrastive divergence (PS-CD) to generalize maximum likelihood learning of energy-based models.
PS-CD avoids the intractable partition function and provides a generalized family of learning objectives.
arXiv Detail & Related papers (2021-11-01T09:17:15Z)
- Variance-Aware Off-Policy Evaluation with Linear Function Approximation [85.75516599931632]
We study the off-policy evaluation problem in reinforcement learning with linear function approximation.
We propose an algorithm, VA-OPE, which uses the estimated variance of the value function to reweight the Bellman residual in Fitted Q-Iteration.
arXiv Detail & Related papers (2021-06-22T17:58:46Z)
- Continuous Mean-Covariance Bandits [39.820490484375156]
We propose a novel Continuous Mean-Covariance Bandit model to take into account option correlation.
In CMCB, there is a learner who sequentially chooses weight vectors on given options and observes random feedback according to the decisions.
We propose novel algorithms with optimal regrets (within logarithmic factors) and provide matching lower bounds to validate their optimalities.
arXiv Detail & Related papers (2021-02-24T06:37:05Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of this information and is not responsible for any consequences of its use.