A Tutorial Introduction to Reinforcement Learning
- URL: http://arxiv.org/abs/2304.00803v1
- Date: Mon, 3 Apr 2023 08:50:58 GMT
- Title: A Tutorial Introduction to Reinforcement Learning
- Authors: Mathukumalli Vidyasagar
- Abstract summary: We present a brief survey of Reinforcement Learning (RL), with particular emphasis on Stochastic Approximation (SA) as a unifying theme.
The scope of the paper includes Markov Reward Processes, Markov Decision Processes, Stochastic Approximation algorithms, and widely used algorithms such as Temporal Difference Learning and $Q$-learning.
- Score: 1.9544213396776275
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: In this paper, we present a brief survey of Reinforcement Learning (RL), with
particular emphasis on Stochastic Approximation (SA) as a unifying theme. The
scope of the paper includes Markov Reward Processes, Markov Decision Processes,
Stochastic Approximation algorithms, and widely used algorithms such as
Temporal Difference Learning and $Q$-learning.
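To make the stochastic-approximation viewpoint concrete, the sketch below shows tabular $Q$-learning written as a Robbins-Monro iteration: each visit to a state-action pair nudges $Q(s,a)$ toward the sampled Bellman target $r + \gamma \max_{a'} Q(s',a')$ with a diminishing step size. This is a minimal illustration, not code from the paper; the Gymnasium-style `env.reset()`/`env.step()` interface and all hyperparameter values are assumptions made for the example.
```python
# Minimal sketch (not from the paper): tabular Q-learning viewed as a
# stochastic approximation (SA) iteration. The environment interface
# (`env.reset`, `env.step`) follows the common Gymnasium-style API and
# is an assumption, as are all hyperparameter values.
import numpy as np

def q_learning(env, n_states, n_actions, episodes=500,
               gamma=0.99, epsilon=0.1, alpha0=0.5):
    """Tabular Q-learning: Q(s, a) is moved by a noisy fixed-point (SA)
    step toward the sampled target r + gamma * max_a' Q(s', a')."""
    Q = np.zeros((n_states, n_actions))
    visits = np.zeros((n_states, n_actions))  # for diminishing step sizes
    rng = np.random.default_rng(0)

    for _ in range(episodes):
        s, _ = env.reset()
        done = False
        while not done:
            # epsilon-greedy behaviour policy
            if rng.random() < epsilon:
                a = int(rng.integers(n_actions))
            else:
                a = int(np.argmax(Q[s]))
            s_next, r, terminated, truncated, _ = env.step(a)
            done = terminated or truncated

            # Robbins-Monro step size: decays with the visit count so the
            # usual SA conditions (sum alpha = inf, sum alpha^2 < inf) hold.
            visits[s, a] += 1
            alpha = alpha0 / visits[s, a]

            # SA update: move Q(s, a) toward the sampled Bellman target.
            target = r + (0.0 if terminated else gamma * np.max(Q[s_next]))
            Q[s, a] += alpha * (target - Q[s, a])
            s = s_next
    return Q
```
TD(0) for policy evaluation is the analogous SA iteration on the state-value function $V$, using the sampled target $r + \gamma V(s')$ under a fixed policy.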
Related papers
- PBES: PCA Based Exemplar Sampling Algorithm for Continual Learning [0.0]
We propose a novel exemplar selection approach based on Principal Component Analysis (PCA) and median sampling, and a neural network training regime in the setting of class-incremental learning.
This approach avoids the pitfalls due to outliers in the data and is both simple to implement and use across various incremental machine learning models.
arXiv Detail & Related papers (2023-12-14T21:27:38Z)
- Distributional Bellman Operators over Mean Embeddings [37.5480897544168]
We propose a novel framework for distributional reinforcement learning, based on learning finite-dimensional mean embeddings of return distributions.
We derive several new algorithms for dynamic programming and temporal-difference learning based on this framework.
arXiv Detail & Related papers (2023-12-09T11:36:14Z)
- Provably Efficient Representation Learning with Tractable Planning in Low-Rank POMDP [81.00800920928621]
We study representation learning in partially observable Markov Decision Processes (POMDPs).
We first present an algorithm for decodable POMDPs that combines maximum likelihood estimation (MLE) and optimism in the face of uncertainty (OFU).
We then show how to adapt this algorithm to also work in the broader class of $\gamma$-observable POMDPs.
arXiv Detail & Related papers (2023-06-21T16:04:03Z)
- Polynomial-Time Algorithms for Counting and Sampling Markov Equivalent DAGs with Applications [6.03124479597323]
Counting and sampling directed acyclic graphs from a Markov equivalence class are fundamental tasks in causal analysis.
We show that these tasks can be performed in polynomial time, solving a long-standing open problem in this area.
Our algorithms are effective and easily implementable.
arXiv Detail & Related papers (2022-05-05T13:56:13Z)
- Markov Abstractions for PAC Reinforcement Learning in Non-Markov Decision Processes [90.53326983143644]
We show that Markov abstractions can be learned during reinforcement learning.
We show that our approach has PAC guarantees when the employed algorithms have PAC guarantees.
arXiv Detail & Related papers (2022-04-29T16:53:00Z)
- Average-Reward Learning and Planning with Options [9.258958295945467]
We extend the options framework for temporal abstraction in reinforcement learning from discounted Markov decision processes (MDPs) to average-reward MDPs.
Our contributions include general convergent off-policy inter-option learning algorithms, intra-option algorithms for learning values and models, as well as sample-based planning variants of our learning algorithms.
arXiv Detail & Related papers (2021-10-26T16:58:05Z)
- MCDAL: Maximum Classifier Discrepancy for Active Learning [74.73133545019877]
Recent state-of-the-art active learning methods have mostly leveraged Generative Adversarial Networks (GAN) for sample acquisition.
In this paper, we propose a novel active learning framework that we call Maximum Classifier Discrepancy for Active Learning (MCDAL).
In particular, we utilize two auxiliary classification layers that learn tighter decision boundaries by maximizing the discrepancies among them.
arXiv Detail & Related papers (2021-07-23T06:57:08Z)
- Information Theoretic Meta Learning with Gaussian Processes [74.54485310507336]
We formulate meta learning using information theoretic concepts; namely, mutual information and the information bottleneck.
By making use of variational approximations to the mutual information, we derive a general and tractable framework for meta learning.
arXiv Detail & Related papers (2020-09-07T16:47:30Z)
- A Hybrid PAC Reinforcement Learning Algorithm [5.279475826661642]
This paper offers a new hybrid probably approximately correct (PAC) reinforcement learning (RL) algorithm for Markov decision processes (MDPs).
The designed algorithm, referred to as the Dyna-Delayed Q-learning (DDQ) algorithm, combines model-free and model-based learning approaches while outperforming both in most cases.
arXiv Detail & Related papers (2020-09-05T21:32:42Z)
- Reinforcement Learning as Iterative and Amortised Inference [62.997667081978825]
We use the control as inference framework to outline a novel classification scheme based on amortised and iterative inference.
We show that taking this perspective allows us to identify parts of the algorithmic design space which have been relatively unexplored.
arXiv Detail & Related papers (2020-06-13T16:10:03Z)
- Meta-learning with Stochastic Linear Bandits [120.43000970418939]
We consider a class of bandit algorithms that implement a regularized version of the well-known OFUL algorithm, where the regularization is the squared Euclidean distance to a bias vector (a sketch of this regularized estimate is given after the list below).
We show both theoretically and experimentally, that when the number of tasks grows and the variance of the task-distribution is small, our strategies have a significant advantage over learning the tasks in isolation.
arXiv Detail & Related papers (2020-05-18T08:41:39Z)
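To illustrate the biased regularization mentioned in the last entry above, the following sketch shows only the regularized least-squares step: the estimate minimizes $\|X\theta - y\|^2 + \lambda\|\theta - b\|^2$, which gives $\hat\theta = (X^\top X + \lambda I)^{-1}(X^\top y + \lambda b)$. This is an assumption-level illustration, not code from the paper, and it omits OFUL's confidence-ellipsoid exploration entirely.
```python
# Minimal sketch (illustrative, not from the paper): least squares
# regularized toward a bias vector b, i.e.
#   argmin_theta ||X @ theta - y||^2 + lam * ||theta - b||^2
#             =  (X^T X + lam * I)^(-1) (X^T y + lam * b).
# OFUL's bandit/exploration machinery is intentionally left out.
import numpy as np

def biased_ridge(X, y, b, lam=1.0):
    d = X.shape[1]
    A = X.T @ X + lam * np.eye(d)
    return np.linalg.solve(A, X.T @ y + lam * b)

# Toy usage: with a bias vector close to the true parameter, a handful of
# samples already yields a reasonable estimate of theta_star.
rng = np.random.default_rng(0)
theta_star = np.array([1.0, -0.5, 0.3])
X = rng.normal(size=(10, 3))
y = X @ theta_star + 0.1 * rng.normal(size=10)
print(biased_ridge(X, y, b=theta_star + 0.05))  # close to theta_star
```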
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information presented and is not responsible for any consequences of its use.