Accounting for the Sequential Nature of States to Learn Features for Reinforcement Learning
- URL: http://arxiv.org/abs/2205.06000v1
- Date: Thu, 12 May 2022 10:20:43 GMT
- Title: Accounting for the Sequential Nature of States to Learn Features for Reinforcement Learning
- Authors: Nathan Michlo, Devon Jarvis, Richard Klein, Steven James
- Abstract summary: We investigate the properties of data that cause popular representation learning approaches to fail.
In particular, we find that in environments where states do not significantly overlap, variational autoencoders (VAEs) fail to learn useful features.
We demonstrate this failure in a simple gridworld domain, and then provide a solution in the form of metric learning.
- Score: 2.0646127669654826
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: In this work, we investigate the properties of data that cause popular
representation learning approaches to fail. In particular, we find that in
environments where states do not significantly overlap, variational
autoencoders (VAEs) fail to learn useful features. We demonstrate this failure
in a simple gridworld domain, and then provide a solution in the form of metric
learning. However, metric learning requires supervision in the form of a
distance function, which is absent in reinforcement learning. To overcome this,
we leverage the sequential nature of states in a replay buffer to approximate a
distance metric and provide a weak supervision signal, under the assumption
that temporally close states are also semantically similar. We modify a VAE
with triplet loss and demonstrate that this approach is able to learn useful
features for downstream tasks, without additional supervision, in environments
where standard VAEs fail.
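As a concrete illustration of the approach the abstract describes, below is a minimal sketch (not the authors' released code) of a VAE whose latent means are additionally shaped by a triplet loss, with positives drawn from a small temporal window around the anchor in the replay buffer and negatives from outside it. The architecture sizes, the margin, and the window size `k_near` are illustrative assumptions.

```python
# Sketch only: a VAE with an added triplet loss, weakly supervised by
# temporal proximity in a replay buffer. Sizes and hyperparameters are
# illustrative, not the authors' settings.
import torch
import torch.nn as nn
import torch.nn.functional as F

class TripletVAE(nn.Module):
    def __init__(self, obs_dim, z_dim=8):
        super().__init__()
        self.enc = nn.Sequential(nn.Linear(obs_dim, 128), nn.ReLU(),
                                 nn.Linear(128, 2 * z_dim))  # outputs mu and logvar
        self.dec = nn.Sequential(nn.Linear(z_dim, 128), nn.ReLU(),
                                 nn.Linear(128, obs_dim))

    def encode(self, x):
        mu, logvar = self.enc(x).chunk(2, dim=-1)
        return mu, logvar

    def forward(self, x):
        mu, logvar = self.encode(x)
        z = mu + torch.randn_like(mu) * (0.5 * logvar).exp()  # reparameterization
        return self.dec(z), mu, logvar

def sample_triplets(buffer, batch=64, k_near=3):
    # buffer: (T, obs_dim) tensor of states in visitation order.
    T = buffer.shape[0]
    t = torch.randint(0, T, (batch,))                                   # anchors
    p = (t + torch.randint(1, k_near + 1, (batch,))).clamp(max=T - 1)   # temporally near
    n = torch.randint(0, T, (batch,))                                   # candidate negatives
    far = (n - t).abs() > k_near
    n = torch.where(far, n, (t + k_near + 1).clamp(max=T - 1))  # push too-close negatives away
    return buffer[t], buffer[p], buffer[n]

def loss_fn(model, anchor, positive, negative, beta=1.0, margin=1.0):
    recon, mu_a, logvar = model(anchor)
    rec = F.mse_loss(recon, anchor)                        # standard ELBO terms
    kld = -0.5 * torch.mean(1 + logvar - mu_a.pow(2) - logvar.exp())
    mu_p, _ = model.encode(positive)                       # weak supervision signal:
    mu_n, _ = model.encode(negative)                       # near in time => near in latent space
    trip = F.triplet_margin_loss(mu_a, mu_p, mu_n, margin=margin)
    return rec + beta * kld + trip
```

The only supervision used is the visitation order of states in the buffer, which encodes the paper's assumption that temporally close states are semantically similar.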
Related papers
- Gradient Boosting Mapping for Dimensionality Reduction and Feature Extraction [2.778647101651566]
A fundamental problem in supervised learning is to find a good set of features or distance measures.
We propose a supervised dimensionality reduction method, where the outputs of weak learners define the embedding.
We show that the embedding coordinates provide better features for the supervised learning task.
arXiv Detail & Related papers (2024-05-14T10:23:57Z)
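Below is a hedged sketch of the mechanism this summary describes, under the assumption that the embedding coordinates are simply the per-tree predictions of a fitted gradient-boosted ensemble; the paper's exact construction may differ.

```python
# Sketch: a supervised embedding whose coordinates are the outputs of the
# weak learners (regression trees) of a gradient-boosted model. The
# hyperparameters and toy data are illustrative.
import numpy as np
from sklearn.ensemble import GradientBoostingRegressor

def weak_learner_embedding(X, y, n_estimators=16):
    gbm = GradientBoostingRegressor(n_estimators=n_estimators, max_depth=2)
    gbm.fit(X, y)
    # One embedding coordinate per weak learner.
    return np.column_stack([est[0].predict(X) for est in gbm.estimators_])

X = np.random.randn(200, 10)
y = X[:, 0] - 2.0 * X[:, 1] + 0.1 * np.random.randn(200)
Z = weak_learner_embedding(X, y)  # shape (200, 16): features tuned to the task
```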
- Vlearn: Off-Policy Learning with Efficient State-Value Function Estimation [22.129001951441015]
Existing off-policy reinforcement learning algorithms often rely on an explicit state-action-value function representation.
This reliance results in data inefficiency as maintaining a state-action-value function in high-dimensional action spaces is challenging.
We present an efficient approach that utilizes only a state-value function as the critic for off-policy deep reinforcement learning.
arXiv Detail & Related papers (2024-03-07T12:45:51Z)
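A minimal sketch of the generic idea behind a state-value-only critic such as Vlearn: fit V(s) from off-policy data using an importance-weighted TD target. The published objective is more sophisticated; the network shape and the truncated importance weight below are assumptions.

```python
# Sketch: off-policy TD(0) learning of a state-value critic V(s), with a
# truncated importance weight correcting for the behavior/target policy gap.
import torch
import torch.nn as nn

v = nn.Sequential(nn.Linear(4, 64), nn.Tanh(), nn.Linear(64, 1))  # toy V-network
opt = torch.optim.Adam(v.parameters(), lr=3e-4)

def v_update(s, r, s_next, done, logp_pi, logp_mu, gamma=0.99):
    rho = (logp_pi - logp_mu).exp().clamp(max=1.0)  # truncated importance weight
    with torch.no_grad():
        target = r + gamma * (1.0 - done) * v(s_next).squeeze(-1)
    td = target - v(s).squeeze(-1)
    loss = (rho * td.pow(2)).mean()
    opt.zero_grad(); loss.backward(); opt.step()
    return loss.item()
```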
- Unsupervised Continual Anomaly Detection with Contrastively-learned Prompt [80.43623986759691]
We introduce a novel Unsupervised Continual Anomaly Detection framework called UCAD.
The framework equips unsupervised anomaly detection (UAD) with continual learning capability through contrastively-learned prompts.
We conduct comprehensive experiments and set the benchmark on unsupervised continual anomaly detection and segmentation.
arXiv Detail & Related papers (2024-01-02T03:37:11Z)
- On the Importance of Feature Decorrelation for Unsupervised Representation Learning in Reinforcement Learning [23.876039876806182]
Unsupervised representation learning (URL) has improved the sample efficiency of reinforcement learning (RL).
We propose a novel URL framework that causally predicts future states while increasing the dimension of the latent manifold.
Our framework effectively learns predictive representations without collapse, which significantly improves the sample efficiency of state-of-the-art URL methods on the Atari 100k benchmark.
arXiv Detail & Related papers (2023-06-09T02:47:21Z)
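The decorrelation idea can be illustrated with a generic covariance penalty on latent features, one common way to prevent representational collapse; this is not necessarily the cited paper's exact regularizer.

```python
# Sketch: penalize off-diagonal entries of the batch covariance of latent
# features so that dimensions carry non-redundant information.
import torch

def decorrelation_loss(z):
    # z: (batch, dim) latent features.
    z = z - z.mean(dim=0, keepdim=True)
    cov = (z.T @ z) / (z.shape[0] - 1)
    off_diag = cov - torch.diag(torch.diagonal(cov))
    return off_diag.pow(2).sum() / z.shape[1]
```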
- Enhancing Multiple Reliability Measures via Nuisance-extended Information Bottleneck [77.37409441129995]
In practical scenarios where training data is limited, many of the predictive signals in the data can instead stem from biases in data acquisition.
We consider an adversarial threat model under a mutual information constraint to cover a wider class of perturbations in training.
We propose an autoencoder-based training to implement the objective, as well as practical encoder designs to facilitate the proposed hybrid discriminative-generative training.
arXiv Detail & Related papers (2023-03-24T16:03:21Z)
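Very loosely, a hybrid discriminative-generative autoencoder objective in this spirit might split the code into a predictive part and a nuisance part, classify from the former, and reconstruct from both. Everything below (the split, the heads, the losses) is an illustrative assumption, not the paper's design.

```python
# Sketch: autoencoder with a discriminative head on the "signal" half of the
# code and a generative (reconstruction) head on the full code.
import torch
import torch.nn as nn
import torch.nn.functional as F

class NuisanceAE(nn.Module):
    def __init__(self, x_dim, z_dim=16, n_cls=10):
        super().__init__()
        self.enc = nn.Sequential(nn.Linear(x_dim, 128), nn.ReLU(),
                                 nn.Linear(128, 2 * z_dim))
        self.cls = nn.Linear(z_dim, n_cls)  # predicts labels from the signal part
        self.dec = nn.Sequential(nn.Linear(2 * z_dim, 128), nn.ReLU(),
                                 nn.Linear(128, x_dim))  # reconstructs from signal + nuisance

    def loss(self, x, y, lam=1.0):
        z = self.enc(x)
        z_sig, _z_nui = z.chunk(2, dim=-1)
        ce = F.cross_entropy(self.cls(z_sig), y)   # discriminative term
        rec = F.mse_loss(self.dec(z), x)           # generative term keeps nuisances explicit
        return ce + lam * rec
```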
- RelaxLoss: Defending Membership Inference Attacks without Losing Utility [68.48117818874155]
We propose a novel training framework based on a relaxed loss with a more achievable learning target.
RelaxLoss is applicable to any classification model with added benefits of easy implementation and negligible overhead.
Our approach consistently outperforms state-of-the-art defense mechanisms in terms of resilience against MIAs.
arXiv Detail & Related papers (2022-07-12T19:34:47Z)
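The "more achievable target" idea can be sketched as a loss gated around a target level alpha: descend while above it, ascend once below it, so the training loss never collapses to zero and members stop being distinguishable by confidence. This simplification omits further steps in the published method (e.g. posterior flattening).

```python
# Sketch: a relaxed cross-entropy that targets a loss level alpha instead of zero.
import torch
import torch.nn.functional as F

def relaxed_ce(logits, labels, alpha=0.5):
    loss = F.cross_entropy(logits, labels)
    # Gradient descent while above the target, gradient ascent once below it.
    return loss if loss.item() >= alpha else -loss
```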
- Domain-Adjusted Regression or: ERM May Already Learn Features Sufficient for Out-of-Distribution Generalization [52.7137956951533]
We argue that devising simpler methods for learning predictors on existing features is a promising direction for future research.
We introduce Domain-Adjusted Regression (DARE), a convex objective for learning a linear predictor that is provably robust under a new model of distribution shift.
Under a natural model, we prove that the DARE solution is the minimax-optimal predictor for a constrained set of test distributions.
arXiv Detail & Related papers (2022-02-14T16:42:16Z)
- Efficient Embedding of Semantic Similarity in Control Policies via Entangled Bisimulation [3.5092955099876266]
Learning generalizable policies from visual input in the presence of visual distractions is a challenging problem in reinforcement learning.
We propose entangled bisimulation, a bisimulation metric that allows the specification of the distance function between states.
We show how entangled bisimulation can meaningfully improve over previous methods on the Distracting Control Suite (DCS).
arXiv Detail & Related papers (2022-01-28T18:06:06Z)
- Learning Invariant Representations for Reinforcement Learning without Reconstruction [98.33235415273562]
We study how representation learning can accelerate reinforcement learning from rich observations, such as images, without relying either on domain knowledge or pixel-reconstruction.
Bisimulation metrics quantify behavioral similarity between states in continuous MDPs.
We demonstrate the effectiveness of our method at disregarding task-irrelevant information using modified visual MuJoCo tasks.
arXiv Detail & Related papers (2020-06-18T17:59:35Z)
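Both bisimulation entries above rest on the same mechanism, which can be sketched as follows: latent distances are regressed onto reward differences plus the discounted distance between successor latents. Target networks, the transition-model term, and the exact choice of distance are simplified away here.

```python
# Sketch: a bisimulation-style representation loss over pairs of transitions.
import torch
import torch.nn.functional as F

def bisim_loss(encoder, s1, s2, r1, r2, s1_next, s2_next, gamma=0.99):
    z1, z2 = encoder(s1), encoder(s2)
    with torch.no_grad():
        zn1, zn2 = encoder(s1_next), encoder(s2_next)
        # Behavioral target: reward gap plus discounted successor gap.
        target = (r1 - r2).abs() + gamma * (zn1 - zn2).norm(dim=-1)
    dist = (z1 - z2).norm(dim=-1)
    return F.mse_loss(dist, target)
```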
- Learning What Makes a Difference from Counterfactual Examples and Gradient Supervision [57.14468881854616]
We propose an auxiliary training objective that improves the generalization capabilities of neural networks.
We use pairs of minimally-different examples with different labels, a.k.a. counterfactual or contrasting examples, which provide a signal indicative of the underlying causal structure of the task.
Models trained with this technique demonstrate improved performance on out-of-distribution test sets.
arXiv Detail & Related papers (2020-04-20T02:47:49Z)
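One simplified reading of this auxiliary objective: align the input-gradient of the task loss with the direction that turns an example into its counterfactual. The cosine formulation below is an assumption for illustration, not the paper's exact loss.

```python
# Sketch: gradient supervision from counterfactual pairs (x, x_cf).
import torch
import torch.nn.functional as F

def gradient_supervision_loss(model, x, y, x_cf):
    x = x.clone().requires_grad_(True)
    task_loss = F.cross_entropy(model(x), y)
    grad, = torch.autograd.grad(task_loss, x, create_graph=True)
    direction = x_cf - x.detach()  # the minimal change that flips the label
    cos = F.cosine_similarity(grad.flatten(1), direction.flatten(1), dim=1)
    return task_loss + (1 - cos).mean()  # task loss + alignment penalty
```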
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the information presented and is not responsible for any consequences of its use.