Recurrent networks, hidden states and beliefs in partially observable
environments
- URL: http://arxiv.org/abs/2208.03520v1
- Date: Sat, 6 Aug 2022 13:56:16 GMT
- Title: Recurrent networks, hidden states and beliefs in partially observable
environments
- Authors: Gaspard Lambrechts, Adrien Bolland, Damien Ernst
- Abstract summary: Reinforcement learning aims to learn optimal policies from interaction with environments whose dynamics are unknown.
We show that in its hidden states, a recurrent neural network approximating the Q-function of a partially observable environment reproduces a sufficient statistic from the history that is correlated with the part of the belief relevant for taking optimal actions.
- Score: 3.4066110654930473
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Reinforcement learning aims to learn optimal policies from interaction with
environments whose dynamics are unknown. Many methods rely on the approximation
of a value function to derive near-optimal policies. In partially observable
environments, these functions depend on the complete sequence of observations
and past actions, called the history. In this work, we show empirically that
recurrent neural networks trained to approximate such value functions
internally filter the posterior probability distribution of the current state
given the history, called the belief. More precisely, we show that, as a
recurrent neural network learns the Q-function, its hidden states become
increasingly correlated with the beliefs of state variables that are relevant to
optimal control. This correlation is measured through their mutual information.
In addition, we show that the expected return of an agent increases with the
ability of its recurrent architecture to reach a high mutual information
between its hidden states and the beliefs. Finally, we show that the mutual
information between the hidden states and the beliefs of variables that are
irrelevant for optimal control decreases through the learning process. In
summary, this work shows that in its hidden states, a recurrent neural network
approximating the Q-function of a partially observable environment reproduces a
sufficient statistic from the history that is correlated with the part of the
belief relevant for taking optimal actions.
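To make the measurement concrete, below is a minimal, self-contained sketch of the protocol the abstract describes: run a recurrent network over observation histories of a toy two-state POMDP, compute the exact Bayes-filter belief alongside it, and estimate the mutual information between each hidden unit and the belief. This is not the authors' code; the environment parameters, network size, and histogram mutual-information estimator are all illustrative assumptions.

```python
# Illustrative sketch (not the authors' code): pair the hidden states of a
# recurrent network with the exact belief of a toy two-state POMDP and
# estimate their mutual information with a simple histogram estimator.
import numpy as np
import torch
import torch.nn as nn

rng = np.random.default_rng(0)

# Hypothetical 2-state POMDP: transition matrix P(s'|s), observation model P(o|s).
T = np.array([[0.9, 0.1],
              [0.2, 0.8]])
O = np.array([[0.8, 0.2],
              [0.3, 0.7]])

def rollout(n_steps):
    """Sample observations and the exact belief b_t = P(s_t = 0 | o_1..o_t)."""
    s = rng.integers(2)
    b = np.array([0.5, 0.5])          # prior over the current state
    obs, beliefs = [], []
    for _ in range(n_steps):
        o = rng.choice(2, p=O[s])
        b = b * O[:, o]               # Bayes correction with the observation
        b /= b.sum()
        obs.append(o)
        beliefs.append(b[0])          # one scalar suffices for two states
        s = rng.choice(2, p=T[s])     # environment transitions
        b = b @ T                     # prediction step: prior for the next step
    return np.array(obs), np.array(beliefs)

obs, beliefs = rollout(5000)

# Recurrent network over the observation history (untrained here; the paper
# trains it on a Q-learning objective and tracks how the MI below evolves).
gru = nn.GRU(input_size=1, hidden_size=8, batch_first=True)
x = torch.tensor(obs, dtype=torch.float32).view(1, -1, 1)
with torch.no_grad():
    h, _ = gru(x)                     # hidden states, shape (1, steps, 8)
h = h.squeeze(0).numpy()

def mutual_information(x, y, bins=16):
    """Plug-in histogram estimate of I(X; Y) in nats."""
    joint, _, _ = np.histogram2d(x, y, bins=bins)
    p = joint / joint.sum()
    px, py = p.sum(axis=1, keepdims=True), p.sum(axis=0, keepdims=True)
    nz = p > 0
    return float((p[nz] * np.log(p[nz] / (px @ py)[nz])).sum())

mi = max(mutual_information(h[:, i], beliefs) for i in range(h.shape[1]))
print(f"max I(hidden unit; belief) = {mi:.3f} nats")
```

In the paper's experiments the recurrent network is trained to approximate the history-dependent Q-function, and the reported finding is that this mutual information grows during training for belief variables relevant to control and shrinks for irrelevant ones; the untrained GRU above only fixes a baseline for the measurement.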
Related papers
- Learning hidden cascades via classification [64.51931908932421]
We propose a partial observability-aware machine learning framework to learn the characteristics of the spreading model.
We evaluate our method on two types of synthetic networks and extend the study to a real-world insider trading network.
arXiv Detail & Related papers (2025-05-16T13:23:52Z)
- Allostatic Control of Persistent States in Spiking Neural Networks for perception and computation [79.16635054977068]
We introduce a novel model for updating perceptual beliefs about the environment by extending the concept of Allostasis to the control of internal representations.
In this paper, we focus on an application in numerical cognition, where a bump of activity in an attractor network is used as a spatial numerical representation.
arXiv Detail & Related papers (2025-03-20T12:28:08Z)
- Global Convergence and Rich Feature Learning in $L$-Layer Infinite-Width Neural Networks under $μ$P Parametrization [66.03821840425539]
In this paper, we investigate the training dynamics of $L$-layer neural networks trained with stochastic gradient descent (SGD), using the tensor program (TP) framework.
We show that SGD enables these networks to learn linearly independent features that substantially deviate from their initial values.
This rich feature space captures relevant data information and ensures that any convergent point of the training process is a global minimum.
arXiv Detail & Related papers (2025-03-12T17:33:13Z)
- Learning Interpretable Policies in Hindsight-Observable POMDPs through Partially Supervised Reinforcement Learning [57.67629402360924]
We introduce the Partially Supervised Reinforcement Learning (PSRL) framework.
At the heart of PSRL is the fusion of both supervised and unsupervised learning.
We show that PSRL offers a potent balance, enhancing model interpretability while preserving, and often significantly outperforming, the performance benchmarks set by traditional methods.
arXiv Detail & Related papers (2024-02-14T16:23:23Z)
- Prediction and Control in Continual Reinforcement Learning [39.30411018922005]
Temporal difference (TD) learning is often used to update the estimate of the value function which is used by RL agents to extract useful policies.
We propose to decompose the value function into two components which update at different timescales.
arXiv Detail & Related papers (2023-12-18T19:23:42Z)
- Leveraging Low-Rank and Sparse Recurrent Connectivity for Robust Closed-Loop Control [63.310780486820796]
We show how a parameterization of recurrent connectivity influences robustness in closed-loop settings.
We find that closed-form continuous-time neural networks (CfCs) with fewer parameters can outperform their full-rank, fully-connected counterparts.
arXiv Detail & Related papers (2023-10-05T21:44:18Z)
- Offline Reinforcement Learning with Implicit Q-Learning [85.62618088890787]
Current offline reinforcement learning methods need to query the value of unseen actions during training to improve the policy.
We propose an offline RL method that never needs to evaluate actions outside of the dataset.
This method enables the learned policy to improve substantially over the best behavior in the data through generalization; a minimal sketch of this idea appears after this list.
arXiv Detail & Related papers (2021-10-12T17:05:05Z)
- On the role of feedback in visual processing: a predictive coding perspective [0.6193838300896449]
We consider deep convolutional networks (CNNs) as models of feed-forward visual processing and implement Predictive Coding (PC) dynamics.
We find that the network increasingly relies on top-down predictions as the noise level increases.
In addition, the accuracy of the network implementing PC dynamics significantly increases over time-steps, compared to its equivalent forward network.
arXiv Detail & Related papers (2021-06-08T10:07:23Z)
- Data-driven discovery of interacting particle systems using Gaussian processes [3.0938904602244346]
We study the data-driven discovery of distance-based interaction laws in second-order interacting particle systems.
We propose a learning approach that models the latent interaction kernel functions as Gaussian processes.
Numerical results on systems that exhibit different collective behaviors demonstrate efficient learning of our approach from scarce noisy trajectory data.
arXiv Detail & Related papers (2021-06-04T22:00:53Z)
- Toward Understanding the Feature Learning Process of Self-supervised Contrastive Learning [43.504548777955854]
We study how contrastive learning learns the feature representations for neural networks by analyzing its feature learning process.
We prove that contrastive learning using ReLU networks provably learns the desired sparse features if proper augmentations are adopted.
arXiv Detail & Related papers (2021-05-31T16:42:09Z)
- OR-Net: Pointwise Relational Inference for Data Completion under Partial Observation [51.083573770706636]
This work uses relational inference to fill in the incomplete data.
We propose Omni-Relational Network (OR-Net) to model the pointwise relativity in two aspects.
arXiv Detail & Related papers (2021-05-02T06:05:54Z)
- The Connection Between Approximation, Depth Separation and Learnability in Neural Networks [70.55686685872008]
We study the connection between learnability and approximation capacity.
We show that learnability with deep networks of a target function depends on the ability of simpler classes to approximate the target.
arXiv Detail & Related papers (2021-01-31T11:32:30Z)
- Can Temporal-Difference and Q-Learning Learn Representation? A Mean-Field Theory [110.99247009159726]
Temporal-difference and Q-learning play a key role in deep reinforcement learning, where they are empowered by expressive nonlinear function approximators such as neural networks.
In particular, temporal-difference learning converges when the function approximator is linear in a feature representation, which is fixed throughout learning, and possibly diverges otherwise.
arXiv Detail & Related papers (2020-06-08T17:25:22Z)
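As flagged in the Implicit Q-Learning entry above, here is a minimal sketch of the mechanism that lets the learned policy avoid evaluating actions outside the dataset. In the cited paper this is done with expectile regression: a state-value function V is regressed toward an upper expectile of Q over dataset actions, so the TD target r + gamma * V(s') never queries Q at unseen actions. The shapes, the expectile tau, and the stand-in tensors below are illustrative assumptions, not the authors' implementation.

```python
# Minimal sketch of IQL-style expectile regression (illustrative only).
import torch

def expectile_loss(q_values, v_values, tau=0.7):
    """Asymmetric L2 loss pushing V(s) toward an upper expectile of Q(s, a)."""
    diff = q_values - v_values                 # computed on dataset actions only
    weight = torch.abs(tau - (diff < 0).float())
    return (weight * diff.pow(2)).mean()

# Random tensors stand in for network outputs on a batch of dataset (s, a) pairs.
q = torch.randn(32)                            # Q(s, a) from a frozen target network
v = torch.randn(32, requires_grad=True)        # V(s) predictions
loss = expectile_loss(q.detach(), v)
loss.backward()                                # gradients flow only into V
```

Because V is fit only on actions present in the data, the downstream TD target r + gamma * V(s') approximates a maximum over in-dataset actions as tau approaches 1, without ever evaluating out-of-distribution ones.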
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the listed content (including all information) and is not responsible for any consequences.