Reinforcement Learning as Iterative and Amortised Inference
- URL: http://arxiv.org/abs/2006.10524v3
- Date: Sun, 5 Jul 2020 18:37:20 GMT
- Title: Reinforcement Learning as Iterative and Amortised Inference
- Authors: Beren Millidge, Alexander Tschantz, Anil K Seth, Christopher L Buckley
- Abstract summary: We use the control as inference framework to outline a novel classification scheme based on amortised and iterative inference.
We show that taking this perspective allows us to identify parts of the algorithmic design space which have been relatively unexplored.
- Score: 62.997667081978825
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: There are several ways to categorise reinforcement learning (RL) algorithms,
such as either model-based or model-free, policy-based or planning-based,
on-policy or off-policy, and online or offline. Broad classification schemes
such as these help provide a unified perspective on disparate techniques and
can contextualise and guide the development of new algorithms. In this paper,
we utilise the control as inference framework to outline a novel classification
scheme based on amortised and iterative inference. We demonstrate that a wide
range of algorithms can be classified in this manner, providing a fresh
perspective and highlighting a range of existing similarities. Moreover, we
show that taking this perspective allows us to identify parts of the
algorithmic design space which have been relatively unexplored, suggesting new
routes to innovative RL algorithms.
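To make the distinction concrete, below is a minimal sketch (our illustration, not the paper's code) on a toy one-step decision problem with a quadratic reward: iterative inference re-optimises an action distribution at decision time for each state (a CEM-style planner), whereas amortised inference trains a reusable state-to-action mapping once and then acts with a single fast forward pass. All names, the reward, and hyperparameters are illustrative assumptions.

```python
# Minimal sketch: iterative vs amortised inference on a one-step problem.
# Everything here (reward, names, hyperparameters) is an illustrative assumption.
import numpy as np

rng = np.random.default_rng(0)

def reward(state, action):
    # Toy quadratic reward; the optimal action is -state.
    return -(action + state) ** 2

# Iterative inference: re-optimise an action distribution at decision time
# (CEM-style planning: refine a Gaussian over actions for this one state).
def iterative_action(state, iters=20, pop=64, n_elite=8):
    mu, sigma = 0.0, 1.0
    for _ in range(iters):
        samples = rng.normal(mu, sigma, pop)
        elites = samples[np.argsort(reward(state, samples))[-n_elite:]]
        mu, sigma = elites.mean(), elites.std() + 1e-3
    return mu

# Amortised inference: learn a reusable state -> action mapping up front
# (a stand-in for a policy network; here a single linear weight).
def train_amortised(steps=2000, lr=0.05):
    w = 0.0
    for _ in range(steps):
        s = rng.normal()
        # Gradient ascent on reward(s, w * s) with respect to w.
        w += lr * (-2.0 * (w * s + s) * s)
    return w

w = train_amortised()
s = 1.5
print("iterative:", iterative_action(s))  # optimised per state, approx -1.5
print("amortised:", w * s)                # one cheap forward pass, approx -1.5
```

The trade-off is visible even in this toy: the iterative planner spends compute on every decision but needs no training, while the amortised policy pays a one-off training cost and is then cheap to query.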
Related papers
- A Unified Framework for Neural Computation and Learning Over Time [56.44910327178975]
Hamiltonian Learning is a novel unified framework for learning with neural networks "over time".
It is based on differential equations that: (i) can be integrated without the need for external software solvers; (ii) generalize the well-established notion of gradient-based learning in feed-forward and recurrent networks; (iii) open up novel perspectives.
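As a loose illustration of point (i), here is a toy sketch (assumed dynamics, not the paper's actual equations) that integrates coupled state and weight differential equations with a hand-rolled explicit Euler step, i.e. without an external ODE solver:

```python
# Toy sketch: integrating coupled state/weight ODEs with an explicit Euler
# step, without an external ODE solver. The dynamics are assumed, not the
# paper's actual Hamiltonian Learning equations.
import numpy as np

def state_dot(z, w, x):
    # Toy continuous-time neuron: dz/dt = -z + tanh(w * x)
    return -z + np.tanh(w * x)

def weight_grad(z, target, w, x):
    # d/dw of 0.5 * (z - target)^2 at the equilibrium z = tanh(w * x)
    return (z - target) * (1.0 - np.tanh(w * x) ** 2) * x

z, w = 0.0, 0.5           # neuron state and weight
x, target = 1.0, 0.8      # input and desired output
dt, eta = 0.1, 0.5        # integration step and learning rate

for _ in range(2000):
    z += dt * state_dot(z, w, x)                  # fast state dynamics
    w -= dt * eta * weight_grad(z, target, w, x)  # slow learning dynamics

print(z, w)  # z should settle close to the target as w adapts
```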
arXiv Detail & Related papers (2024-09-18T14:57:13Z) - Towards a Unified View of Preference Learning for Large Language Models: A Survey [88.66719962576005]
Large Language Models (LLMs) exhibit remarkably powerful capabilities.
A crucial factor in achieving this success is aligning the LLM's output with human preferences.
We decompose all the strategies in preference learning into four components: model, data, feedback, and algorithm.
arXiv Detail & Related papers (2024-09-04T15:11:55Z) - Towards a Systematic Approach to Design New Ensemble Learning Algorithms [0.0]
This study revisits the foundational work on ensemble error decomposition.
Recent advancements introduced a "unified theory of diversity".
Our research systematically explores the application of this decomposition to guide the creation of new ensemble learning algorithms.
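For reference, the foundational decomposition in question is, for averaging ensembles under squared loss, the exact ambiguity identity of Krogh & Vedelsby: ensemble error equals average member error minus diversity (the spread of the members around the ensemble prediction). A minimal numeric check on synthetic data (our example, not the paper's):

```python
# Numeric check of the ambiguity decomposition for averaging ensembles
# under squared loss:  ensemble error = average member error - diversity.
# The data here is synthetic and purely illustrative.
import numpy as np

rng = np.random.default_rng(0)
y = rng.normal(size=100)                            # targets
preds = y + rng.normal(scale=0.5, size=(5, 100))    # 5 noisy ensemble members

ensemble = preds.mean(axis=0)
ensemble_err = np.mean((ensemble - y) ** 2)
avg_member_err = np.mean((preds - y) ** 2)
diversity = np.mean((preds - ensemble) ** 2)        # spread around the mean

# The identity holds exactly for averaging ensembles under squared loss.
assert np.isclose(ensemble_err, avg_member_err - diversity)
print(ensemble_err, avg_member_err, diversity)
```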
arXiv Detail & Related papers (2024-02-09T22:59:20Z) - Distributional Bellman Operators over Mean Embeddings [37.5480897544168]
We propose a novel framework for distributional reinforcement learning, based on learning finite-dimensional mean embeddings of return distributions.
We derive several new algorithms for dynamic programming and temporal-difference learning based on this framework.
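As a rough illustration of the underlying object (assumed toy setup, not the paper's operators): a mean embedding represents a return distribution by the expectations of a fixed, finite set of feature functions, which can be estimated from sampled returns; the paper's contribution is dynamic-programming and temporal-difference analogues that act directly on such embeddings.

```python
# Toy sketch (assumed setup): a finite-dimensional mean embedding of a return
# distribution, psi = E[phi(G)], estimated from sampled returns. The paper's
# algorithms instead update such embeddings via Bellman-style operators.
import numpy as np

rng = np.random.default_rng(0)
gamma, horizon, n = 0.9, 50, 10_000
centers = np.linspace(-5.0, 20.0, 16)

def phi(g):
    # Gaussian-bump features: the embedding sketches the return density.
    return np.exp(-0.5 * ((g[:, None] - centers) / 1.0) ** 2)

# Toy single-state MRP: reward ~ N(1, 1) at every step.
rewards = rng.normal(1.0, 1.0, size=(n, horizon))
G = rewards @ (gamma ** np.arange(horizon))   # sampled discounted returns

psi = phi(G).mean(axis=0)                     # Monte Carlo mean embedding
print(psi.round(3))
print(G.mean())  # about (1 - gamma**horizon) / (1 - gamma), i.e. ~9.95
```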
arXiv Detail & Related papers (2023-12-09T11:36:14Z) - The Cascaded Forward Algorithm for Neural Network Training [61.06444586991505]
We propose a new learning framework for neural networks, the Cascaded Forward (CaFo) algorithm, which, like the Forward-Forward (FF) algorithm, does not rely on backpropagation (BP) for optimization.
Unlike FF, our framework directly outputs label distributions at each cascaded block and does not require the generation of additional negative samples.
Since each block can be trained independently, the framework is easily deployed on parallel acceleration systems.
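A minimal sketch of the general idea (assumed architecture with assumed within-block gradient steps, not the paper's exact model): each block carries its own classifier head and local loss, and features are detached before being passed on, so no gradient flows between blocks and each block can be trained in isolation:

```python
# Toy sketch (assumed architecture, with assumed within-block gradient steps,
# not the paper's exact model): a cascade where each block has its own head
# and loss, and inter-block gradients are cut by detaching features.
import torch
import torch.nn as nn

class Block(nn.Module):
    def __init__(self, d_in, d_hidden, n_classes):
        super().__init__()
        self.body = nn.Sequential(nn.Linear(d_in, d_hidden), nn.ReLU())
        self.head = nn.Linear(d_hidden, n_classes)  # local label distribution

    def forward(self, x):
        h = self.body(x)
        return h, self.head(h)

blocks = [Block(32, 64, 10), Block(64, 64, 10), Block(64, 64, 10)]
opts = [torch.optim.Adam(b.parameters(), lr=1e-3) for b in blocks]
loss_fn = nn.CrossEntropyLoss()

x = torch.randn(16, 32)                 # a dummy mini-batch
y = torch.randint(0, 10, (16,))

for block, opt in zip(blocks, opts):
    h, logits = block(x)
    loss = loss_fn(logits, y)           # each block optimises a local loss
    opt.zero_grad()
    loss.backward()                     # gradients stay inside this block
    opt.step()
    x = h.detach()                      # pass features on, cut the graph
```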
arXiv Detail & Related papers (2023-03-17T02:01:11Z) - Understanding A Class of Decentralized and Federated Optimization Algorithms: A Multi-Rate Feedback Control Perspective [41.05789078207364]
We provide a fresh perspective to understand, analyze, and design distributed optimization algorithms.
We show that a wide class of distributed algorithms, including popular decentralized/federated schemes, can be viewed as discretizing a certain continuous-time feedback control system.
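As a toy instance of this viewpoint (our example, not the paper's): decentralized gradient descent with a mixing matrix W can be read as an explicit Euler discretization of the continuous-time feedback system x_dot = -(I - W) x - alpha * grad F(x):

```python
# Toy sketch (assumed example): decentralized gradient descent as an Euler
# discretization of x_dot = -(I - W) x - alpha * grad F(x), with W a doubly
# stochastic mixing (gossip) matrix over the agents.
import numpy as np

n = 4
W = np.full((n, n), 1.0 / n)             # uniform averaging as mixing matrix
targets = np.array([0.0, 1.0, 2.0, 3.0])

def grad(x):
    # Agent i holds f_i(x) = 0.5 * (x - targets[i])**2; the consensus
    # optimum of the sum is targets.mean() = 1.5.
    return x - targets

x = np.zeros(n)
alpha, dt = 0.5, 1.0                     # dt = 1 recovers the discrete scheme
for _ in range(200):
    x = x + dt * (-(np.eye(n) - W) @ x - alpha * grad(x))

print(x)  # agents cluster around 1.5 (constant steps leave a residual bias)
```

With a constant step size the agents settle in a neighbourhood of the consensus optimum rather than reaching it exactly, which is the known bias of this discretization.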
arXiv Detail & Related papers (2022-04-27T01:53:57Z) - On Multi-objective Policy Optimization as a Tool for Reinforcement Learning: Case Studies in Offline RL and Finetuning [24.264618706734012]
We show how multi-objective policy optimization can be used to develop novel and more effective deep reinforcement learning algorithms.
We focus on offline RL and finetuning as case studies.
We introduce Distillation of a Mixture of Experts (DiME).
We demonstrate that for offline RL, DiME leads to a simple new algorithm that outperforms the state of the art.
arXiv Detail & Related papers (2021-06-15T14:59:14Z) - A Survey on Deep Semi-supervised Learning [51.26862262550445]
We first present a taxonomy for deep semi-supervised learning that categorizes existing methods.
We then offer a detailed comparison of these methods in terms of the type of losses, contributions, and architecture differences.
arXiv Detail & Related papers (2021-02-28T16:22:58Z) - Safe Learning and Optimization Techniques: Towards a Survey of the State of the Art [3.6954802719347413]
Safe learning and optimization deals with learning and optimization problems that avoid, as much as possible, the evaluation of non-safe input points.
A comprehensive survey of safe reinforcement learning algorithms was published in 2015, but related works in active learning and in optimization were not considered.
This paper reviews those algorithms from a number of domains including reinforcement learning, Gaussian process regression and classification, evolutionary algorithms, and active learning.
arXiv Detail & Related papers (2021-01-23T13:58:09Z) - Evolving Reinforcement Learning Algorithms [186.62294652057062]
We propose a method for meta-learning reinforcement learning algorithms.
The learned algorithms are domain-agnostic and can generalize to new environments not seen during training.
We highlight two learned algorithms which obtain good generalization performance on classical control tasks, gridworld-type tasks, and Atari games.
arXiv Detail & Related papers (2021-01-08T18:55:07Z)