Active Inference for Autonomous Decision-Making with Contextual
Multi-Armed Bandits
- URL: http://arxiv.org/abs/2209.09185v1
- Date: Mon, 19 Sep 2022 17:11:21 GMT
- Title: Active Inference for Autonomous Decision-Making with Contextual
Multi-Armed Bandits
- Authors: Shohei Wakayama and Nisar Ahmed
- Abstract summary: In autonomous robotic decision-making under uncertainty, the tradeoff between exploitation and exploration of available options must be considered.
In this study, we apply active inference, which has been actively studied in the field of neuroscience in recent years, as an alternative action selection strategy for CMABs.
- Score: 1.3670071336891754
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: In autonomous robotic decision-making under uncertainty, the tradeoff between
exploitation and exploration of available options must be considered. If
secondary information associated with options can be utilized, such
decision-making problems can often be formulated as contextual multi-armed
bandits (CMABs). In this study, we apply active inference, which has been
actively studied in the field of neuroscience in recent years, as an
alternative action selection strategy for CMABs. Unlike conventional action
selection strategies, it is possible to rigorously evaluate the uncertainty of
each option when calculating the expected free energy (EFE) associated with the
decision agent's probabilistic model, as derived from the free-energy
principle. We specifically address the case where a categorical observation
likelihood function is used, such that EFE values are analytically intractable.
We introduce new approximation methods for computing the EFE based on
variational and Laplace approximations. Extensive simulation study results
demonstrate that, compared to other strategies, active inference generally
requires far fewer iterations to identify optimal options and generally
achieves superior cumulative regret, for relatively low extra computational
cost.
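To make the role of the expected free energy (EFE) concrete, the following is a minimal sketch under simplifying assumptions, not the authors' method. It assumes each option has a binary outcome with an integer-count Beta posterior and no context features, so the EFE has a closed form; the abstract's setting (a categorical likelihood with context) is the case where it is intractable and must be approximated. The score uses the commonly cited decomposition of EFE into a risk term minus an information-gain term. All function names and the `pref_success` parameter (the agent's assumed preferred probability of success) are hypothetical.

```python
import math


def harmonic(n: int) -> float:
    """n-th harmonic number H_n = 1 + 1/2 + ... + 1/n (H_0 = 0)."""
    return sum(1.0 / k for k in range(1, n + 1))


def binary_entropy(p: float) -> float:
    """Entropy (in nats) of a Bernoulli(p) outcome."""
    if p <= 0.0 or p >= 1.0:
        return 0.0
    return -p * math.log(p) - (1.0 - p) * math.log(1.0 - p)


def expected_entropy(a: int, b: int) -> float:
    """E[H(theta)] for theta ~ Beta(a, b), with positive integers a, b.

    Uses E[theta ln theta] = a/(a+b) * (H_a - H_{a+b}), which follows from
    the digamma identity psi(n + 1) = H_n - gamma.
    """
    n = a + b
    return -((a / n) * (harmonic(a) - harmonic(n))
             + (b / n) * (harmonic(b) - harmonic(n)))


def information_gain(a: int, b: int) -> float:
    """Mutual information between the next outcome and theta."""
    return binary_entropy(a / (a + b)) - expected_entropy(a, b)


def expected_free_energy(a: int, b: int, pref_success: float = 0.9) -> float:
    """Risk (negative expected log-preference) minus information gain."""
    p = a / (a + b)  # posterior predictive probability of success
    risk = -(p * math.log(pref_success)
             + (1.0 - p) * math.log(1.0 - pref_success))
    return risk - information_gain(a, b)


def select_arm(posteriors, pref_success: float = 0.9) -> int:
    """Pick the index of the option with the lowest EFE."""
    scores = [expected_free_energy(a, b, pref_success) for a, b in posteriors]
    return min(range(len(scores)), key=scores.__getitem__)
```

With two options that have the same posterior mean, such as `select_arm([(1, 1), (50, 50)])`, the sketch picks the first, less-explored option, because its information-gain term is larger while the risk terms are equal. This is the sense in which the EFE accounts for the uncertainty of each option in addition to its expected payoff.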
Related papers
- A Principled Approach to Randomized Selection under Uncertainty: Applications to Peer Review and Grant Funding [68.43987626137512]
We propose a principled framework for randomized decision-making based on interval estimates of the quality of each item.
We introduce MERIT, an optimization-based method that maximizes the worst-case expected number of top candidates selected.
We prove that MERIT satisfies desirable axiomatic properties not guaranteed by existing approaches.
arXiv Detail & Related papers (2025-06-23T19:59:30Z)
- Conformalized Decision Risk Assessment [5.391713612899277]
We introduce CREDO, a novel framework that quantifies, for any candidate decision, a distribution-free upper bound on the probability that the decision is suboptimal.
By combining inverse optimization geometry with conformal prediction and generative modeling, CREDO produces risk certificates that are both statistically rigorous and practically interpretable.
arXiv Detail & Related papers (2025-05-19T15:24:38Z)
- Treatment Effect Estimation for Optimal Decision-Making [65.30942348196443]
We study optimal decision-making based on two-stage CATE estimators.
We propose a novel two-stage learning objective that retargets the CATE to balance CATE estimation error and decision performance.
arXiv Detail & Related papers (2025-05-19T13:24:57Z)
- Wait, that's not an option: LLMs Robustness with Incorrect Multiple-Choice Options [2.1184929769291294]
This work introduces a novel framework for evaluating LLMs' capacity to balance instruction-following with critical reasoning.
We show that post-training aligned models often default to selecting invalid options, while base models exhibit improved refusal capabilities that scale with model size.
We additionally conduct a parallel human study showing similar instruction-following biases, with implications for how these biases may propagate through human feedback datasets used in alignment.
arXiv Detail & Related papers (2024-08-27T19:27:43Z)
- An Efficient Approach for Solving Expensive Constrained Multiobjective Optimization Problems [0.0]
An efficient probabilistic selection based constrained multi-objective EA is proposed, referred to as PSCMOEA.
It comprises novel elements such as (a) an adaptive search bound identification scheme based on the feasibility and convergence status of evaluated solutions.
Numerical experiments are conducted on an extensive range of challenging constrained problems using low evaluation budgets to simulate ECMOPs.
arXiv Detail & Related papers (2024-05-22T02:32:58Z)
- Globally-Optimal Greedy Experiment Selection for Active Sequential Estimation [1.1530723302736279]
We study the problem of active sequential estimation, which involves adaptively selecting experiments for sequentially collected data.
The goal is to design experiment selection rules for more accurate model estimation.
We propose a class of greedy experiment selection methods and provide statistical analysis for the maximum likelihood.
arXiv Detail & Related papers (2024-02-13T17:09:29Z)
- Observation-Augmented Contextual Multi-Armed Bandits for Robotic Exploration with Uncertain Semantic Data [7.795929277007235]
We introduce a new variant of contextual multi-armed bandits called observation-augmented CMABs (OA-CMABs).
OA-CMABs model the expected option outcomes as a function of context features and hidden parameters.
We propose a robust Bayesian inference process for OA-CMABs that is based on the concept of probabilistic data validation.
arXiv Detail & Related papers (2023-12-19T20:28:42Z)
- Model-Based Epistemic Variance of Values for Risk-Aware Policy Optimization [59.758009422067]
We consider the problem of quantifying uncertainty over expected cumulative rewards in model-based reinforcement learning.
We propose a new uncertainty Bellman equation (UBE) whose solution converges to the true posterior variance over values.
We introduce a general-purpose policy optimization algorithm, Q-Uncertainty Soft Actor-Critic (QU-SAC) that can be applied for either risk-seeking or risk-averse policy optimization.
arXiv Detail & Related papers (2023-12-07T15:55:58Z)
- Multiple Independent DE Optimizations to Tackle Uncertainty and Variability in Demand in Inventory Management [0.0]
This study aims to discern the most effective strategy for minimizing inventory costs within the context of uncertain demand patterns.
To find the optimal solution, the study focuses on meta-heuristic approaches and compares multiple algorithms.
arXiv Detail & Related papers (2023-09-22T13:15:02Z)
- In Search of Insights, Not Magic Bullets: Towards Demystification of the Model Selection Dilemma in Heterogeneous Treatment Effect Estimation [92.51773744318119]
This paper empirically investigates the strengths and weaknesses of different model selection criteria.
We highlight that there is a complex interplay between selection strategies, candidate estimators and the data used for comparing them.
arXiv Detail & Related papers (2023-02-06T16:55:37Z)
- Improved Policy Evaluation for Randomized Trials of Algorithmic Resource Allocation [54.72195809248172]
We present a new estimator leveraging our proposed novel concept, which involves retrospective reshuffling of participants across experimental arms at the end of an RCT.
We prove theoretically that such an estimator is more accurate than common estimators based on sample means.
arXiv Detail & Related papers (2023-02-06T05:17:22Z)
- Off-Policy Evaluation with Policy-Dependent Optimization Response [90.28758112893054]
We develop a new framework for off-policy evaluation with a policy-dependent linear optimization response.
We construct unbiased estimators for the policy-dependent estimand by a perturbation method.
We provide a general algorithm for optimizing causal interventions.
arXiv Detail & Related papers (2022-02-25T20:25:37Z)
- The Statistical Complexity of Interactive Decision Making [126.04974881555094]
We provide a complexity measure, the Decision-Estimation Coefficient, that is proven to be both necessary and sufficient for sample-efficient interactive learning.
A unified algorithm design principle, Estimation-to-Decisions (E2D), transforms any algorithm for supervised estimation into an online algorithm for decision making.
arXiv Detail & Related papers (2021-12-27T02:53:44Z)
- Pseudo-Spherical Contrastive Divergence [119.28384561517292]
We propose pseudo-spherical contrastive divergence (PS-CD) to generalize maximum likelihood learning of energy-based models.
PS-CD avoids the intractable partition function and provides a generalized family of learning objectives.
arXiv Detail & Related papers (2021-11-01T09:17:15Z)
- Variance-Aware Off-Policy Evaluation with Linear Function Approximation [85.75516599931632]
We study the off-policy evaluation problem in reinforcement learning with linear function approximation.
We propose an algorithm, VA-OPE, which uses the estimated variance of the value function to reweight the Bellman residual in Fitted Q-Iteration.
arXiv Detail & Related papers (2021-06-22T17:58:46Z)
- Continuous Mean-Covariance Bandits [39.820490484375156]
We propose a novel Continuous Mean-Covariance Bandit model to take into account option correlation.
In CMCB, there is a learner who sequentially chooses weight vectors on given options and observes random feedback according to the decisions.
We propose novel algorithms with optimal regrets (within logarithmic factors) and provide matching lower bounds to validate their optimalities.
arXiv Detail & Related papers (2021-02-24T06:37:05Z)
This list is automatically generated from the titles and abstracts of the papers in this site.