Blockwise Sequential Model Learning for Partially Observable
Reinforcement Learning
- URL: http://arxiv.org/abs/2112.05343v1
- Date: Fri, 10 Dec 2021 05:38:24 GMT
- Authors: Giseung Park, Sungho Choi, Youngchul Sung
- Abstract summary: This paper proposes a new sequential model learning architecture to solve partially observable Markov decision problems.
The proposed architecture generates a latent variable in each data block with a length of multiple timesteps and passes the most relevant information to the next block for policy optimization.
Numerical results show that the proposed method significantly outperforms previous methods in various partially observable environments.
- Score: 14.642266310020505
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: This paper proposes a new sequential model learning architecture to solve
partially observable Markov decision problems. Rather than compressing
sequential information at every timestep as in conventional recurrent neural
network-based methods, the proposed architecture generates a latent variable in
each data block with a length of multiple timesteps and passes the most
relevant information to the next block for policy optimization. The proposed
blockwise sequential model is implemented based on self-attention, making the
model capable of detailed sequential learning in partially observable settings.
The proposed model builds an additional learning network to efficiently
implement gradient estimation using self-normalized importance sampling, which
avoids complex blockwise reconstruction of the input data during model
learning. Numerical results show that the proposed method significantly
outperforms previous methods in various partially observable environments.
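The abstract names two concrete ingredients: a self-attention encoder that compresses each multi-timestep block into a single latent variable chained to the next block, and self-normalized importance sampling (SNIS) for gradient estimation. A minimal NumPy sketch of both ideas follows; the single-head attention form, the mean-pooling, and all function names and dimensions are illustrative assumptions, not the authors' implementation:

```python
import numpy as np

def self_attention(X):
    # Single-head scaled dot-product self-attention over one block.
    # X: (block_len, d) features within the block.
    d = X.shape[1]
    scores = X @ X.T / np.sqrt(d)
    scores -= scores.max(axis=1, keepdims=True)  # numerical stability
    w = np.exp(scores)
    w /= w.sum(axis=1, keepdims=True)            # row-wise softmax
    return w @ X                                 # (block_len, d) attended features

def block_latent(X, z_prev):
    # Compress a block plus the previous block's latent into one latent z,
    # so only the most relevant information is passed to the next block.
    H = self_attention(np.vstack([z_prev[None, :], X]))
    return H.mean(axis=0)                        # (d,) latent for the next block

def snis_weights(log_p, log_q):
    # Self-normalized importance sampling: the weights sum to 1, so the
    # normalizing constant of the target density never has to be computed.
    log_w = log_p - log_q
    log_w -= log_w.max()                         # avoid overflow in exp
    w = np.exp(log_w)
    return w / w.sum()

rng = np.random.default_rng(0)
blocks = rng.normal(size=(3, 4, 8))  # 3 blocks of 4 timesteps, feature dim 8
z = np.zeros(8)
for X in blocks:                     # latent is chained block to block
    z = block_latent(X, z)
print(z.shape)                       # (8,)

w = snis_weights(rng.normal(size=5), rng.normal(size=5))
print(round(w.sum(), 6))             # 1.0
```

In an actual agent the per-block latent would condition the policy, and the SNIS weights would reweight sampled trajectories inside the gradient estimator; both are elided here.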
Related papers
- Learning of networked spreading models from noisy and incomplete data [7.669018800404791]
We introduce a universal learning method based on a scalable dynamic message-passing technique.
The algorithm leverages available prior knowledge on the model and on the data, and reconstructs both network structure and parameters of a spreading model.
We show that the method's computational complexity is linear in the key model parameters, making the algorithm scalable to large network instances.
arXiv Detail & Related papers (2023-12-20T13:12:47Z)
- OpenSTL: A Comprehensive Benchmark of Spatio-Temporal Predictive
Learning [67.07363529640784]
We propose OpenSTL to categorize prevalent approaches into recurrent-based and recurrent-free models.
We conduct standard evaluations on datasets across various domains, including synthetic moving object trajectories, human motion, driving scenes, traffic flow, and weather forecasting.
We find that recurrent-free models strike a better balance between efficiency and performance than recurrent models.
arXiv Detail & Related papers (2023-06-20T03:02:14Z)
- Deep networks for system identification: a Survey [56.34005280792013]
System identification learns mathematical descriptions of dynamic systems from input-output data.
The main aim of the identified model is to predict new data from previous observations.
We discuss architectures commonly adopted in the literature, like feedforward, convolutional, and recurrent networks.
arXiv Detail & Related papers (2023-01-30T12:38:31Z)
- When to Update Your Model: Constrained Model-based Reinforcement
Learning [50.74369835934703]
We propose a novel and general theoretical scheme for a non-decreasing performance guarantee of model-based RL (MBRL).
Our follow-up derived bounds reveal the relationship between model shifts and performance improvement.
A further example demonstrates that learning models from a dynamically-varying number of explorations benefits the eventual returns.
arXiv Detail & Related papers (2022-10-15T17:57:43Z)
- Dynamic Latent Separation for Deep Learning [67.62190501599176]
A core problem in machine learning is to learn expressive latent variables for model prediction on complex data.
Here, we develop an approach that improves expressiveness, provides partial interpretation, and is not restricted to specific applications.
arXiv Detail & Related papers (2022-10-07T17:56:53Z)
- Model-Based Deep Learning: On the Intersection of Deep Learning and
Optimization [101.32332941117271]
Decision making algorithms are used in a multitude of different applications.
Deep learning approaches that use highly parametric architectures tuned from data without relying on mathematical models are becoming increasingly popular.
Model-based optimization and data-centric deep learning are often considered to be distinct disciplines.
arXiv Detail & Related papers (2022-05-05T13:40:08Z)
- Deep Equilibrium Assisted Block Sparse Coding of Inter-dependent
Signals: Application to Hyperspectral Imaging [71.57324258813675]
A dataset of inter-dependent signals is defined as a matrix whose columns demonstrate strong dependencies.
A neural network is employed to act as a structure prior and reveal the underlying signal interdependencies.
Deep unrolling and Deep equilibrium based algorithms are developed, forming highly interpretable and concise deep-learning-based architectures.
arXiv Detail & Related papers (2022-03-29T21:00:39Z)
- Learning Dynamics from Noisy Measurements using Deep Learning with a
Runge-Kutta Constraint [9.36739413306697]
We discuss a methodology to learn differential equation(s) using noisy and sparsely sampled measurements.
In our methodology, the main innovation is the integration of deep neural networks with a classical numerical integration method.
arXiv Detail & Related papers (2021-09-23T15:43:45Z)
- Lifelong Infinite Mixture Model Based on Knowledge-Driven Dirichlet
Process [15.350366047108103]
Recent research efforts in lifelong learning propose to grow a mixture of models to adapt to an increasing number of tasks.
We perform a theoretical analysis of lifelong learning models by deriving risk bounds based on the discrepancy distance between probabilistic representations of the data.
Inspired by the theoretical analysis, we introduce a new lifelong learning approach, namely the Lifelong Infinite Mixture (LIMix) model.
arXiv Detail & Related papers (2021-08-25T21:06:20Z)
- Self-learning sparse PCA for multimode process monitoring [2.8102838347038617]
This paper proposes a novel sparse principal component analysis algorithm with self-learning ability for successive modes.
Different from traditional multimode monitoring methods, the monitoring model is updated based on the current model and new data when a new mode arrives.
arXiv Detail & Related papers (2021-08-07T13:50:16Z)
- PAC Bounds for Imitation and Model-based Batch Learning of Contextual
Markov Decision Processes [31.83144400718369]
We consider the problem of batch multi-task reinforcement learning with observed context descriptors, motivated by its application to personalized medical treatment.
We study two general classes of learning algorithms: direct policy learning (DPL), an imitation-learning based approach which learns from expert trajectories, and model-based learning.
arXiv Detail & Related papers (2020-06-11T11:57:08Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed content (including all information) and is not responsible for any consequences of its use.