Critic Algorithms using Cooperative Networks
- URL: http://arxiv.org/abs/2201.07839v1
- Date: Wed, 19 Jan 2022 19:47:18 GMT
- Title: Critic Algorithms using Cooperative Networks
- Authors: Debangshu Banerjee and Kavita Wagh
- Abstract summary: An algorithm is proposed for policy evaluation in Markov Decision Processes.
The algorithm tracks the Projected Bellman Error and is implemented as a true gradient based algorithm.
- Score: 0.0
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: An algorithm is proposed for policy evaluation in Markov Decision Processes
which gives good empirical results with respect to convergence rates. The
algorithm tracks the Projected Bellman Error and is implemented as a true
gradient based algorithm. In this respect this algorithm differs from
TD($\lambda$) class of algorithms. This algorithm tracks the Projected Bellman
Algorithm and is therefore different from the class of residual algorithms.
Further the convergence of this algorithm is empirically much faster than GTD2
class of algorithms which aim at tracking the Projected Bellman Error. We
implemented proposed algorithm in DQN and DDPG framework and found that our
algorithm achieves comparable results in both of these experiments
Related papers
- On Universally Optimal Algorithms for A/B Testing [49.429419538826444]
We study the problem of best-arm identification with fixed budget in multi-armed bandits with Bernoulli rewards.
For the problem with two arms, also known as the A/B testing problem, we prove that there is no algorithm that performs as well as the algorithm sampling each arm equally.
arXiv Detail & Related papers (2023-08-23T08:38:53Z) - Dual Algorithmic Reasoning [9.701208207491879]
We propose to learn algorithms by exploiting duality of the underlying algorithmic problem.
We demonstrate that simultaneously learning the dual definition of these optimisation problems in algorithmic learning allows for better learning.
We then validate the real-world utility of our dual algorithmic reasoner by deploying it on a challenging brain vessel classification task.
arXiv Detail & Related papers (2023-02-09T08:46:23Z) - Amortized Implicit Differentiation for Stochastic Bilevel Optimization [53.12363770169761]
We study a class of algorithms for solving bilevel optimization problems in both deterministic and deterministic settings.
We exploit a warm-start strategy to amortize the estimation of the exact gradient.
By using this framework, our analysis shows these algorithms to match the computational complexity of methods that have access to an unbiased estimate of the gradient.
arXiv Detail & Related papers (2021-11-29T15:10:09Z) - Quantum algorithms for anomaly detection using amplitude estimation [5.20363732303968]
Anomaly detection algorithm based on density estimation (called ADDE algorithm) is one of widely used algorithms.
In this paper, we propose a new quantum ADDE algorithm based on amplitude estimation.
arXiv Detail & Related papers (2021-09-28T15:47:56Z) - Machine Learning for Online Algorithm Selection under Censored Feedback [71.6879432974126]
In online algorithm selection (OAS), instances of an algorithmic problem class are presented to an agent one after another, and the agent has to quickly select a presumably best algorithm from a fixed set of candidate algorithms.
For decision problems such as satisfiability (SAT), quality typically refers to the algorithm's runtime.
In this work, we revisit multi-armed bandit algorithms for OAS and discuss their capability of dealing with the problem.
We adapt them towards runtime-oriented losses, allowing for partially censored data while keeping a space- and time-complexity independent of the time horizon.
arXiv Detail & Related papers (2021-09-13T18:10:52Z) - Provably Faster Algorithms for Bilevel Optimization [54.83583213812667]
Bilevel optimization has been widely applied in many important machine learning applications.
We propose two new algorithms for bilevel optimization.
We show that both algorithms achieve the complexity of $mathcalO(epsilon-1.5)$, which outperforms all existing algorithms by the order of magnitude.
arXiv Detail & Related papers (2021-06-08T21:05:30Z) - Critical Analysis: Bat Algorithm based Investigation and Application on
Several Domains [1.1802674324027231]
The idea of the algorithm was taken from the echolocation ability of bats.
Bat Algorithm is given in-depth in terms of backgrounds, characteristics, limitations.
arXiv Detail & Related papers (2021-01-18T19:25:12Z) - Efficient Computation of Expectations under Spanning Tree Distributions [67.71280539312536]
We propose unified algorithms for the important cases of first-order expectations and second-order expectations in edge-factored, non-projective spanning-tree models.
Our algorithms exploit a fundamental connection between gradients and expectations, which allows us to derive efficient algorithms.
arXiv Detail & Related papers (2020-08-29T14:58:26Z) - Optimal and Practical Algorithms for Smooth and Strongly Convex
Decentralized Optimization [21.555331273873175]
We consider the task of decentralized minimization of the sum of smooth strongly convex functions stored across the nodes of a network.
We propose two new algorithms for this decentralized optimization problem and equip them with complexity guarantees.
arXiv Detail & Related papers (2020-06-21T11:23:20Z) - Corralling Stochastic Bandit Algorithms [54.10645564702416]
We show that the regret of the corralling algorithms is no worse than that of the best algorithm containing the arm with the highest reward.
We show that the gap between the highest reward and other rewards depends on the gap between the highest reward and other rewards.
arXiv Detail & Related papers (2020-06-16T15:33:12Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.