Recurrent networks, hidden states and beliefs in partially observable
environments
- URL: http://arxiv.org/abs/2208.03520v1
- Date: Sat, 6 Aug 2022 13:56:16 GMT
- Title: Recurrent networks, hidden states and beliefs in partially observable
environments
- Authors: Gaspard Lambrechts, Adrien Bolland, Damien Ernst
- Abstract summary: Reinforcement learning aims to learn optimal policies from interaction with environments whose dynamics are unknown.
We show that in its hidden states, a recurrent neural network approximating the Q-function of a partially observable environment reproduces a sufficient statistic from the history that is correlated with the part of the belief relevant for taking optimal actions.
- Score: 3.4066110654930473
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Reinforcement learning aims to learn optimal policies from interaction with
environments whose dynamics are unknown. Many methods rely on the approximation
of a value function to derive near-optimal policies. In partially observable
environments, these functions depend on the complete sequence of observations
and past actions, called the history. In this work, we show empirically that
recurrent neural networks trained to approximate such value functions
internally filter the posterior probability distribution of the current state
given the history, called the belief. More precisely, we show that, as a
recurrent neural network learns the Q-function, its hidden states become
increasingly correlated with the beliefs of state variables that are relevant to
optimal control. This correlation is measured through their mutual information.
In addition, we show that the expected return of an agent increases with the
ability of its recurrent architecture to reach a high mutual information
between its hidden states and the beliefs. Finally, we show that the mutual
information between the hidden states and the beliefs of variables that are
irrelevant for optimal control decreases through the learning process. In
summary, this work shows that in its hidden states, a recurrent neural network
approximating the Q-function of a partially observable environment reproduces a
sufficient statistic from the history that is correlated with the part of the
belief relevant for taking optimal actions.
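To make the measurement concrete, below is a minimal, self-contained sketch of the protocol the abstract describes: run a recurrent network over observation histories of a toy two-state POMDP, compute the exact Bayes-filter belief alongside it, and estimate the mutual information between each hidden unit and the belief. This is not the authors' code; the environment parameters, network size, and histogram mutual-information estimator are all illustrative assumptions.

```python
# Illustrative sketch (not the authors' code): pair the hidden states of a
# recurrent network with the exact belief of a toy two-state POMDP and
# estimate their mutual information with a simple histogram estimator.
import numpy as np
import torch
import torch.nn as nn

rng = np.random.default_rng(0)

# Hypothetical 2-state POMDP: transition matrix P(s'|s), observation model P(o|s).
T = np.array([[0.9, 0.1],
              [0.2, 0.8]])
O = np.array([[0.8, 0.2],
              [0.3, 0.7]])

def rollout(n_steps):
    """Sample observations and the exact belief b_t = P(s_t = 0 | o_1..o_t)."""
    s = rng.integers(2)
    b = np.array([0.5, 0.5])          # prior over the current state
    obs, beliefs = [], []
    for _ in range(n_steps):
        o = rng.choice(2, p=O[s])
        b = b * O[:, o]               # Bayes correction with the observation
        b /= b.sum()
        obs.append(o)
        beliefs.append(b[0])          # one scalar suffices for two states
        s = rng.choice(2, p=T[s])     # environment transitions
        b = b @ T                     # prediction step: prior for the next step
    return np.array(obs), np.array(beliefs)

obs, beliefs = rollout(5000)

# Recurrent network over the observation history (untrained here; the paper
# trains it on a Q-learning objective and tracks how the MI below evolves).
gru = nn.GRU(input_size=1, hidden_size=8, batch_first=True)
x = torch.tensor(obs, dtype=torch.float32).view(1, -1, 1)
with torch.no_grad():
    h, _ = gru(x)                     # hidden states, shape (1, steps, 8)
h = h.squeeze(0).numpy()

def mutual_information(x, y, bins=16):
    """Plug-in histogram estimate of I(X; Y) in nats."""
    joint, _, _ = np.histogram2d(x, y, bins=bins)
    p = joint / joint.sum()
    px, py = p.sum(axis=1, keepdims=True), p.sum(axis=0, keepdims=True)
    nz = p > 0
    return float((p[nz] * np.log(p[nz] / (px @ py)[nz])).sum())

mi = max(mutual_information(h[:, i], beliefs) for i in range(h.shape[1]))
print(f"max I(hidden unit; belief) = {mi:.3f} nats")
```

In the paper's experiments the recurrent network is trained to approximate the history-dependent Q-function, and the reported finding is that this mutual information grows during training for belief variables relevant to control and shrinks for irrelevant ones; the untrained GRU above only fixes a baseline for the measurement.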
Related papers
- Learning hidden cascades via classification [64.51931908932421]
We propose a partial observability-aware machine learning framework to learn the characteristics of the spreading model.
We evaluate our method on two types of synthetic networks and extend the study to a real-world insider trading network.
arXiv Detail & Related papers (2025-05-16T13:23:52Z)
- Allostatic Control of Persistent States in Spiking Neural Networks for perception and computation [79.16635054977068]
We introduce a novel model for updating perceptual beliefs about the environment by extending the concept of Allostasis to the control of internal representations.
In this paper, we focus on an application in numerical cognition, where a bump of activity in an attractor network is used as a spatial numerical representation.
arXiv Detail & Related papers (2025-03-20T12:28:08Z)
- Global Convergence and Rich Feature Learning in $L$-Layer Infinite-Width Neural Networks under $μ$P Parametrization [66.03821840425539]
In this paper, we investigate the training dynamics of $L$-layer neural networks trained with stochastic gradient descent (SGD), using the tensor program (TP) framework.
We show that SGD enables these networks to learn linearly independent features that substantially deviate from their initial values.
This rich feature space captures relevant data information and ensures that any convergent point of the training process is a global minimum.
arXiv Detail & Related papers (2025-03-12T17:33:13Z)
- Learning Interpretable Policies in Hindsight-Observable POMDPs through Partially Supervised Reinforcement Learning [57.67629402360924]
We introduce the Partially Supervised Reinforcement Learning (PSRL) framework.
At the heart of PSRL is the fusion of both supervised and unsupervised learning.
We show that PSRL offers a potent balance, enhancing model interpretability while preserving, and often significantly outperforming, the performance benchmarks set by traditional methods.
arXiv Detail & Related papers (2024-02-14T16:23:23Z)
- Prediction and Control in Continual Reinforcement Learning [39.30411018922005]
Temporal difference (TD) learning is often used to update the estimate of the value function which is used by RL agents to extract useful policies.
We propose to decompose the value function into two components which update at different timescales.
arXiv Detail & Related papers (2023-12-18T19:23:42Z)
- Leveraging Low-Rank and Sparse Recurrent Connectivity for Robust Closed-Loop Control [63.310780486820796]
We show how a parameterization of recurrent connectivity influences robustness in closed-loop settings.
We find that closed-form continuous-time neural networks (CfCs) with fewer parameters can outperform their full-rank, fully-connected counterparts.
arXiv Detail & Related papers (2023-10-05T21:44:18Z)
- Offline Reinforcement Learning with Implicit Q-Learning [85.62618088890787]
Current offline reinforcement learning methods need to query the value of unseen actions during training to improve the policy.
We propose an offline RL method that never needs to evaluate actions outside of the dataset.
This method enables the learned policy to improve substantially over the best behavior in the data through generalization; a minimal sketch of this idea appears after this list.
arXiv Detail & Related papers (2021-10-12T17:05:05Z)
- On the role of feedback in visual processing: a predictive coding perspective [0.6193838300896449]
We consider deep convolutional networks (CNNs) as models of feed-forward visual processing and implement Predictive Coding (PC) dynamics.
We find that the network increasingly relies on top-down predictions as the noise level increases.
In addition, the accuracy of the network implementing PC dynamics significantly increases over time-steps, compared to its equivalent forward network.
arXiv Detail & Related papers (2021-06-08T10:07:23Z)
- Data-driven discovery of interacting particle systems using Gaussian processes [3.0938904602244346]
We study the data-driven discovery of distance-based interaction laws in second-order interacting particle systems.
We propose a learning approach that models the latent interaction kernel functions as Gaussian processes.
Numerical results on systems that exhibit different collective behaviors demonstrate efficient learning of our approach from scarce noisy trajectory data.
arXiv Detail & Related papers (2021-06-04T22:00:53Z)
- Toward Understanding the Feature Learning Process of Self-supervised Contrastive Learning [43.504548777955854]
We study how contrastive learning learns the feature representations for neural networks by analyzing its feature learning process.
We prove that contrastive learning using ReLU networks provably learns the desired sparse features if proper augmentations are adopted.
arXiv Detail & Related papers (2021-05-31T16:42:09Z)
- OR-Net: Pointwise Relational Inference for Data Completion under Partial Observation [51.083573770706636]
This work uses relational inference to fill in the incomplete data.
We propose Omni-Relational Network (OR-Net) to model the pointwise relativity in two aspects.
arXiv Detail & Related papers (2021-05-02T06:05:54Z)
- The Connection Between Approximation, Depth Separation and Learnability in Neural Networks [70.55686685872008]
We study the connection between learnability and approximation capacity.
We show that learnability with deep networks of a target function depends on the ability of simpler classes to approximate the target.
arXiv Detail & Related papers (2021-01-31T11:32:30Z)
- Can Temporal-Difference and Q-Learning Learn Representation? A Mean-Field Theory [110.99247009159726]
Temporal-difference and Q-learning play a key role in deep reinforcement learning, where they are empowered by expressive nonlinear function approximators such as neural networks.
In particular, temporal-difference learning converges when the function approximator is linear in a feature representation, which is fixed throughout learning, and possibly diverges otherwise.
arXiv Detail & Related papers (2020-06-08T17:25:22Z)
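As flagged in the Implicit Q-Learning entry above, here is a minimal sketch of the mechanism that lets the learned policy avoid evaluating actions outside the dataset. In the cited paper this is done with expectile regression: a state-value function V is regressed toward an upper expectile of Q over dataset actions, so the TD target r + gamma * V(s') never queries Q at unseen actions. The shapes, the expectile tau, and the stand-in tensors below are illustrative assumptions, not the authors' implementation.

```python
# Minimal sketch of IQL-style expectile regression (illustrative only).
import torch

def expectile_loss(q_values, v_values, tau=0.7):
    """Asymmetric L2 loss pushing V(s) toward an upper expectile of Q(s, a)."""
    diff = q_values - v_values                 # computed on dataset actions only
    weight = torch.abs(tau - (diff < 0).float())
    return (weight * diff.pow(2)).mean()

# Random tensors stand in for network outputs on a batch of dataset (s, a) pairs.
q = torch.randn(32)                            # Q(s, a) from a frozen target network
v = torch.randn(32, requires_grad=True)        # V(s) predictions
loss = expectile_loss(q.detach(), v)
loss.backward()                                # gradients flow only into V
```

Because V is fit only on actions present in the data, the downstream TD target r + gamma * V(s') approximates a maximum over in-dataset actions as tau approaches 1, without ever evaluating out-of-distribution ones.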
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the listed content (including all information) and is not responsible for any consequences.