Multi-agent Reinforcement Learning Accelerated MCMC on Multiscale
Inversion Problem
- URL: http://arxiv.org/abs/2011.08954v1
- Date: Tue, 17 Nov 2020 21:25:29 GMT
- Title: Multi-agent Reinforcement Learning Accelerated MCMC on Multiscale
Inversion Problem
- Authors: Eric Chung, Yalchin Efendiev, Wing Tat Leung, Sai-Mang Pun, Zecheng
Zhang
- Abstract summary: We propose a multi-agent actor-critic reinforcement learning (RL) algorithm to accelerate the multi-level Monte Carlo Markov Chain (MCMC) sampling algorithms.
The policies (actors) of the agents are used to generate the proposal in the MCMC steps; and the critic, which is centralized, is in charge of estimating the long term reward.
Our experiments show that the proposed method significantly improves the sampling process.
- Score: 0.0
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: In this work, we propose a multi-agent actor-critic reinforcement learning
(RL) algorithm to accelerate the multi-level Monte Carlo Markov Chain (MCMC)
sampling algorithms. The policies (actors) of the agents are used to generate
the proposal in the MCMC steps; and the critic, which is centralized, is in
charge of estimating the long term reward. We verify our proposed algorithm by
solving an inverse problem with multiple scales. There are several difficulties
in applying traditional MCMC sampling to this problem.
Firstly, the computation of the posterior distribution involves evaluating the
forward solver, which is very time consuming for a problem with heterogeneous media.
We hence propose to use the multi-level algorithm. More precisely, we use the
generalized multiscale finite element method (GMsFEM) as the forward solver in
evaluating a posterior distribution in the multi-level rejection procedure.
Secondly, it is hard to hand-design a proposal distribution that generates
meaningful samples. To address this issue, we learn an RL policy as the proposal
generator. Our experiments show that the proposed method significantly improves
the sampling process.
Related papers
- Why do we regularise in every iteration for imaging inverse problems? [0.29792392019703945]
Regularisation is commonly used in iterative methods for solving imaging inverse problems.
ProxSkip randomly skips regularisation steps, reducing the computational time of an iterative algorithm without affecting its convergence.
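As a rough illustration of the skipping idea only (the actual ProxSkip method also carries a control-variate term so that skipping does not break convergence), a proximal-gradient loop for a sparse imaging problem might apply the regularisation step only with some probability:

```python
import numpy as np

def soft_threshold(x, tau):
    """Proximal operator of tau * ||x||_1, a common imaging regulariser."""
    return np.sign(x) * np.maximum(np.abs(x) - tau, 0.0)

def skipping_proximal_gradient(A, y, lam, step, p_skip, n_iters, rng):
    """Minimise 0.5*||A x - y||^2 + lam*||x||_1, applying the regularisation
    (prox) step only occasionally.  Simplified illustration: the real ProxSkip
    also maintains a control variate so that skipping stays unbiased."""
    x = np.zeros(A.shape[1])
    for _ in range(n_iters):
        # Cheap gradient step on the data-fidelity term, taken every iteration.
        x = x - step * A.T @ (A @ x - y)
        # Possibly expensive regularisation step, taken with probability 1 - p_skip.
        if rng.uniform() > p_skip:
            x = soft_threshold(x, step * lam)
    return x
```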
arXiv Detail & Related papers (2024-11-01T15:50:05Z) - Think Twice Before You Act: Improving Inverse Problem Solving With MCMC [40.5682961122897]
We propose Diffusion Posterior MCMC (DPMC) to solve inverse problems with pretrained diffusion models.
Our algorithm outperforms DPS with fewer evaluations across nearly all tasks, and is competitive among existing approaches.
arXiv Detail & Related papers (2024-09-13T06:10:54Z) - An Efficient Algorithm for Clustered Multi-Task Compressive Sensing [60.70532293880842]
Clustered multi-task compressive sensing is a hierarchical model that solves multiple compressive sensing tasks.
The existing inference algorithm for this model is computationally expensive and does not scale well in high dimensions.
We propose a new algorithm that substantially accelerates model inference by avoiding the need to explicitly compute these covariance matrices.
arXiv Detail & Related papers (2023-09-30T15:57:14Z) - Distributed Consensus Algorithm for Decision-Making in Multi-agent
Multi-armed Bandit [7.708904950194129]
We study a structured multi-agent multi-armed bandit (MAMAB) problem in a dynamic environment.
A graph reflects the information-sharing structure among agents, and the arms' reward distributions are piecewise-stationary with several unknown change points.
The goal is to develop a decision-making policy for the agents that minimizes the regret, which is the expected total loss of not playing the optimal arm at each time step.
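In generic bandit notation the regret being minimised can be written as follows (a standard form; the paper's exact definition additionally accounts for the graph-structured agents and the piecewise-stationary rewards):

```latex
R(T) \;=\; \mathbb{E}\!\left[\sum_{t=1}^{T}\bigl(\mu^{*}_{t}-\mu_{a_t,t}\bigr)\right],
\qquad
\mu^{*}_{t} \;=\; \max_{k}\,\mu_{k,t}
```

Here \mu_{k,t} is the mean reward of arm k at time t and a_t is the arm played; because the reward distributions are piecewise-stationary, the optimal arm can change at the unknown change points.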
arXiv Detail & Related papers (2023-06-09T16:10:26Z) - Provable and Practical: Efficient Exploration in Reinforcement Learning via Langevin Monte Carlo [104.9535542833054]
We present a scalable and effective exploration strategy based on Thompson sampling for reinforcement learning (RL).
We instead directly sample the Q function from its posterior distribution by using Langevin Monte Carlo.
Our approach achieves better or similar results compared with state-of-the-art deep RL algorithms on several challenging exploration tasks from the Atari57 suite.
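A minimal sketch of the Langevin Monte Carlo update behind this kind of posterior sampling, assuming a hypothetical `grad_log_posterior` function for the Q-function parameters (the paper's algorithm adds RL-specific machinery that is not reproduced here):

```python
import numpy as np

def langevin_sample(theta0, grad_log_posterior, step, n_steps, rng):
    """Unadjusted Langevin algorithm: after enough steps, theta is an
    approximate draw from the posterior over Q-function parameters."""
    theta = np.asarray(theta0, dtype=float).copy()
    for _ in range(n_steps):
        noise = rng.standard_normal(theta.shape)
        theta = theta + 0.5 * step * grad_log_posterior(theta) + np.sqrt(step) * noise
    return theta
```

Thompson sampling then acts greedily with respect to the Q function induced by the sampled parameters, which is what provides the exploration.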
arXiv Detail & Related papers (2023-05-29T17:11:28Z) - Beyond Exponentially Fast Mixing in Average-Reward Reinforcement
Learning via Multi-Level Monte Carlo Actor-Critic [61.968469104271676]
We propose an RL methodology attuned to the mixing time by employing a multi-level Monte Carlo estimator for the critic, the actor, and the average reward embedded within an actor-critic (AC) algorithm.
We experimentally show that these alleviated restrictions on the technical conditions required for stability translate to superior performance in practice for RL problems with sparse rewards.
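The multi-level Monte Carlo idea referenced here rests on the telescoping identity E[P_L] = E[P_0] + sum_{l=1..L} E[P_l - P_{l-1}]; a generic sketch of such an estimator (not the paper's specific critic, actor, and average-reward estimators) looks like:

```python
import numpy as np

def mlmc_estimate(sample_pair, n_samples_per_level, rng):
    """Generic multi-level Monte Carlo estimator of E[P_L].

    sample_pair(level, rng) -> (P_level, P_level_minus_1), computed from the
    same underlying randomness (for level 0 the second entry should be 0).
    Relies on E[P_L] = E[P_0] + sum_l E[P_l - P_{l-1}], so most samples can be
    spent on the cheap coarse levels."""
    estimate = 0.0
    for level, n in enumerate(n_samples_per_level):
        diffs = []
        for _ in range(n):
            fine, coarse = sample_pair(level, rng)
            diffs.append(fine - coarse)
        estimate += float(np.mean(diffs))
    return estimate
```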
arXiv Detail & Related papers (2023-01-28T04:12:56Z) - Off-Policy Correction For Multi-Agent Reinforcement Learning [9.599347559588216]
Multi-agent reinforcement learning (MARL) provides a framework for problems involving multiple interacting agents.
Despite apparent similarity to the single-agent case, multi-agent problems are often harder to train and analyze theoretically.
We propose MA-Trace, a new on-policy actor-critic algorithm, which extends V-Trace to the MARL setting.
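For context, the single-agent V-Trace target that MA-Trace extends corrects off-policy trajectories with truncated importance weights (the multi-agent extension itself is not reproduced here):

```latex
v_s \;=\; V(x_s) \;+\; \sum_{t=s}^{s+n-1}\gamma^{\,t-s}
          \Bigl(\prod_{i=s}^{t-1} c_i\Bigr)\,\delta_t V,
\qquad
\delta_t V \;=\; \rho_t\bigl(r_t+\gamma V(x_{t+1})-V(x_t)\bigr)
```

with truncated importance weights \rho_t = min(\bar{\rho}, \pi(a_t|x_t)/\mu(a_t|x_t)) and c_i = min(\bar{c}, \pi(a_i|x_i)/\mu(a_i|x_i)), where \mu is the behaviour policy that generated the data and \pi is the policy being learned.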
arXiv Detail & Related papers (2021-11-22T14:23:13Z) - Machine Learning for Online Algorithm Selection under Censored Feedback [71.6879432974126]
In online algorithm selection (OAS), instances of an algorithmic problem class are presented to an agent one after another, and the agent has to quickly select a presumably best algorithm from a fixed set of candidate algorithms.
For decision problems such as satisfiability (SAT), quality typically refers to the algorithm's runtime.
In this work, we revisit multi-armed bandit algorithms for OAS and discuss their capability of dealing with the problem.
We adapt them towards runtime-oriented losses, allowing for partially censored data while keeping a space- and time-complexity independent of the time horizon.
arXiv Detail & Related papers (2021-09-13T18:10:52Z) - Parallel Stochastic Mirror Descent for MDPs [72.75921150912556]
We consider the problem of learning the optimal policy for infinite-horizon Markov decision processes (MDPs).
A variant of Mirror Descent is proposed for convex programming problems with Lipschitz-continuous functionals.
We analyze this algorithm in a general case and obtain an estimate of the convergence rate that does not accumulate errors during the operation of the method.
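The generic mirror descent step underlying such methods replaces the Euclidean projection of gradient descent by a Bregman divergence V generated by a distance-generating function d (a standard textbook form; the paper's parallel, stochastic variant adds further structure):

```latex
x_{k+1} \;=\; \arg\min_{x\in X}\Bigl\{\gamma_k\,\bigl\langle\nabla f(x_k),\,x\bigr\rangle
              \;+\; V(x_k,x)\Bigr\},
\qquad
V(y,x) \;=\; d(x)-d(y)-\bigl\langle\nabla d(y),\,x-y\bigr\rangle
```

For d(x) = (1/2)||x||_2^2 this reduces to projected gradient descent; for the negative-entropy choice d(x) = sum_i x_i log x_i on the simplex it gives the multiplicative-weights style update commonly used for policies.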
arXiv Detail & Related papers (2021-02-27T19:28:39Z) - On Thompson Sampling with Langevin Algorithms [106.78254564840844]
Thompson sampling for multi-armed bandit problems enjoys favorable performance in both theory and practice.
It suffers from a significant limitation computationally, arising from the need for samples from posterior distributions at every iteration.
We propose two Markov Chain Monte Carlo (MCMC) methods tailored to Thompson sampling to address this issue.
arXiv Detail & Related papers (2020-02-23T22:35:29Z)