Mean Field Equilibrium in Multi-Armed Bandit Game with Continuous Reward
- URL: http://arxiv.org/abs/2105.00767v1
- Date: Mon, 3 May 2021 11:50:06 GMT
- Title: Mean Field Equilibrium in Multi-Armed Bandit Game with Continuous Reward
- Authors: Xiong Wang, Riheng Jia
- Abstract summary: Mean field games facilitate analyzing the multi-armed bandit (MAB) problem for a large number of agents by approximating their interactions with an average effect.
Existing mean field models for multi-agent MAB mostly assume a binary reward function, which leads to tractable analysis.
In this paper, we study the mean field bandit game with a continuous reward function.
- Score: 4.2710814397148
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Mean field games facilitate analyzing the multi-armed bandit (MAB) problem for a
large number of agents by approximating their interactions with an average effect.
Existing mean field models for multi-agent MAB mostly assume a binary reward
function, which leads to tractable analysis but is usually not applicable in
practical scenarios. In this paper, we study the mean field bandit game with a
continuous reward function. Specifically, we focus on deriving the existence
and uniqueness of mean field equilibrium (MFE), thereby guaranteeing the
asymptotic stability of the multi-agent system. To accommodate the continuous
reward function, we encode the learned reward into an agent state, which is in
turn mapped to its stochastic arm playing policy and updated using realized
observations. We show that the state evolution is upper semi-continuous, based
on which the existence of MFE is obtained. As Markov analysis mainly applies to
the discrete-state case, we transform the stochastic continuous state
evolution into a deterministic ordinary differential equation (ODE). On this
basis, we can characterize a contraction mapping for the ODE to ensure a unique
MFE for the bandit game. Extensive evaluations validate our MFE
characterization, and exhibit tight empirical regret of the MAB problem.
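The fixed-point structure described in the abstract can be illustrated with a toy sketch: each agent's state encodes its learned reward, the state maps to a softmax arm-playing policy, and the deterministic ODE-style update contracts to a unique fixed point (the MFE). Note this is a minimal illustration under assumed ingredients — the congestion-style reward, the softmax policy map, and all parameters are hypothetical choices for demonstration, not the paper's actual model.

```python
import numpy as np

def mfe_fixed_point(K=3, beta=1.0, eta=0.2, tol=1e-8, max_iter=10_000):
    """Toy mean-field bandit: each arm's reward decreases in the
    fraction of the population playing it (assumed reward model)."""
    base = np.linspace(1.0, 2.0, K)          # intrinsic arm qualities (assumed)
    reward = lambda m: base - m              # continuous reward with congestion effect

    s = np.zeros(K)                          # agent state: learned reward estimate
    for _ in range(max_iter):
        p = np.exp(beta * s)                 # state -> stochastic arm-playing policy
        p /= p.sum()                         # (softmax)
        m = p                                # mean field: population arm distribution
        s_next = s + eta * (reward(m) - s)   # deterministic ODE-style state update
        if np.max(np.abs(s_next - s)) < tol: # contraction => unique fixed point (MFE)
            return s_next, p
        s = s_next
    return s, p

s_star, p_star = mfe_fixed_point()
```

At the fixed point the state matches the mean-field reward, s* = r(p*), which is the discrete-time analogue of a stationary point of the ODE the paper analyzes.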
Related papers
- Confidence-aware multi-modality learning for eye disease screening [58.861421804458395]
We propose a novel multi-modality evidential fusion pipeline for eye disease screening.
It provides a measure of confidence for each modality and elegantly integrates the multi-modality information.
Experimental results on both public and internal datasets demonstrate that our model excels in robustness.
arXiv Detail & Related papers (2024-05-28T13:27:30Z) - A Single Online Agent Can Efficiently Learn Mean Field Games [16.00164239349632]
Mean field games (MFGs) are a promising framework for modeling the behavior of large-population systems.
This paper introduces a novel online single-agent model-free learning scheme, which enables a single agent to learn the mean field Nash equilibrium (MFNE) using online samples.
arXiv Detail & Related papers (2024-05-05T16:38:04Z) - Mimicking Better by Matching the Approximate Action Distribution [48.81067017094468]
We introduce MAAD, a novel, sample-efficient on-policy algorithm for Imitation Learning from Observations.
We show that it requires considerably fewer interactions to achieve expert performance, outperforming current state-of-the-art on-policy methods.
arXiv Detail & Related papers (2023-06-16T12:43:47Z) - Regularization of the policy updates for stabilizing Mean Field Games [0.2348805691644085]
This work studies non-cooperative Multi-Agent Reinforcement Learning (MARL), where multiple agents interact in the same environment and each aims to maximize its individual return.
We name our algorithm Mean Field Proximal Policy Optimization (MF-PPO), and we empirically show the effectiveness of our method in the OpenSpiel framework.
arXiv Detail & Related papers (2023-04-04T05:45:42Z) - Latent State Marginalization as a Low-cost Approach for Improving Exploration [79.12247903178934]
We propose the adoption of latent variable policies within the MaxEnt framework.
We show that latent variable policies naturally emerge under the use of world models with a latent belief state.
We experimentally validate our method on continuous control tasks, showing that effective marginalization can lead to better exploration and more robust training.
arXiv Detail & Related papers (2022-10-03T15:09:12Z) - Understanding the stochastic dynamics of sequential decision-making processes: A path-integral analysis of multi-armed bandits [7.05949591248206]
The multi-armed bandit (MAB) model is one of the most popular models to study decision-making in an uncertain environment.
In this paper, we employ techniques in statistical physics to analyze the MAB model.
arXiv Detail & Related papers (2022-08-11T09:32:03Z) - Trustworthy Multimodal Regression with Mixture of Normal-inverse Gamma Distributions [91.63716984911278]
We introduce a novel Mixture of Normal-Inverse Gamma distributions (MoNIG) algorithm, which efficiently estimates uncertainty in a principled manner for adaptive integration of different modalities and produces trustworthy regression results.
Experimental results on both synthetic and different real-world data demonstrate the effectiveness and trustworthiness of our method on various multimodal regression tasks.
arXiv Detail & Related papers (2021-11-11T14:28:12Z) - Model Free Reinforcement Learning Algorithm for Stationary Mean Field Equilibrium for Multiple Types of Agents [43.21120427632336]
We consider a multi-agent strategic interaction over an infinite horizon where agents can be of multiple types.
Each agent has a private state; the state evolves depending on the distribution of the state of the agents of different types and the action of the agent.
We show how this kind of interaction can model cyber attacks between defenders and adversaries.
arXiv Detail & Related papers (2020-12-31T00:12:46Z) - Global Convergence of Policy Gradient for Linear-Quadratic Mean-Field Control/Game in Continuous Time [109.06623773924737]
We study the policy gradient method for the linear-quadratic mean-field control and game.
We show that it converges to the optimal solution at a linear rate, which is verified by a synthetic simulation.
arXiv Detail & Related papers (2020-08-16T06:34:11Z) - Robustness Guarantees for Mode Estimation with an Application to Bandits [131.21717367564963]
We introduce a theory for multi-armed bandits where the values are the modes of the reward distributions instead of the mean.
We show in simulations that our algorithms are robust to perturbation of the arms by adversarial noise sequences.
arXiv Detail & Related papers (2020-03-05T21:29:27Z) - Sequential Monte Carlo Bandits [1.9205272414658485]
We extend Bayesian multi-armed bandit (MAB) algorithms beyond their original setting by making use of sequential Monte Carlo (SMC) methods.
An MAB is a sequential decision-making problem where the goal is to learn a policy that maximizes long-term payoff.
We showcase how non-stationary bandits, where time dynamics are modeled via linear dynamical systems, can be successfully addressed.
arXiv Detail & Related papers (2018-08-08T20:40:42Z)
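The SMC-bandit idea summarized above — tracking a non-stationary arm whose latent mean follows a linear dynamical system with a particle filter, and acting via Thompson sampling — can be sketched as follows. This is an illustrative assumption-laden toy (random-walk means, Gaussian rewards, chosen parameters), not the paper's algorithm.

```python
import numpy as np

def smc_thompson_bandit(T=2000, K=2, n_particles=500, seed=0):
    """Particle-filter Thompson sampling for a drifting-mean Gaussian
    bandit (all dynamics and parameters are assumed for illustration)."""
    rng = np.random.default_rng(seed)
    true_mu = np.array([0.0, 1.0])               # latent arm means (assumed)
    drift, obs_sd = 0.005, 0.5                   # random-walk and reward noise scales
    # one particle cloud per arm, approximating p(mu_k | observations)
    particles = rng.normal(0.0, 1.0, size=(K, n_particles))
    total = 0.0
    for _ in range(T):
        true_mu = true_mu + rng.normal(0.0, drift, K)   # means drift over time
        # Thompson step: draw one posterior sample per arm, play the argmax
        idx = rng.integers(n_particles, size=K)
        arm = int(np.argmax(particles[np.arange(K), idx]))
        r = rng.normal(true_mu[arm], obs_sd)
        total += r
        # SMC update for the played arm: propagate, weight, resample
        particles[arm] += rng.normal(0.0, drift, n_particles)
        w = np.exp(-0.5 * ((r - particles[arm]) / obs_sd) ** 2)
        w /= w.sum()
        particles[arm] = particles[arm][rng.choice(n_particles, n_particles, p=w)]
    return total / T

avg_reward = smc_thompson_bandit()
```

The propagate-weight-resample loop is what lets the posterior follow the drifting mean, which a fixed conjugate posterior for a stationary bandit would not.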
This list is automatically generated from the titles and abstracts of the papers in this site.