Accounting for the Sequential Nature of States to Learn Features for Reinforcement Learning
- URL: http://arxiv.org/abs/2205.06000v1
- Date: Thu, 12 May 2022 10:20:43 GMT
- Title: Accounting for the Sequential Nature of States to Learn Features for Reinforcement Learning
- Authors: Nathan Michlo, Devon Jarvis, Richard Klein, Steven James
- Abstract summary: We investigate the properties of data that cause popular representation learning approaches to fail.
In particular, we find that in environments where states do not significantly overlap, variational autoencoders (VAEs) fail to learn useful features.
We demonstrate this failure in a simple gridworld domain, and then provide a solution in the form of metric learning.
- Score: 2.0646127669654826
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: In this work, we investigate the properties of data that cause popular
representation learning approaches to fail. In particular, we find that in
environments where states do not significantly overlap, variational
autoencoders (VAEs) fail to learn useful features. We demonstrate this failure
in a simple gridworld domain, and then provide a solution in the form of metric
learning. However, metric learning requires supervision in the form of a
distance function, which is absent in reinforcement learning. To overcome this,
we leverage the sequential nature of states in a replay buffer to approximate a
distance metric and provide a weak supervision signal, under the assumption
that temporally close states are also semantically similar. We modify a VAE
with triplet loss and demonstrate that this approach is able to learn useful
features for downstream tasks, without additional supervision, in environments
where standard VAEs fail.
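As a concrete illustration of the approach the abstract describes, below is a minimal sketch (not the authors' released code) of a VAE whose latent means are additionally shaped by a triplet loss, with positives drawn from a small temporal window around the anchor in the replay buffer and negatives from outside it. The architecture sizes, the margin, and the window size `k_near` are illustrative assumptions.

```python
# Sketch only: a VAE with an added triplet loss, weakly supervised by
# temporal proximity in a replay buffer. Sizes and hyperparameters are
# illustrative, not the authors' settings.
import torch
import torch.nn as nn
import torch.nn.functional as F

class TripletVAE(nn.Module):
    def __init__(self, obs_dim, z_dim=8):
        super().__init__()
        self.enc = nn.Sequential(nn.Linear(obs_dim, 128), nn.ReLU(),
                                 nn.Linear(128, 2 * z_dim))  # outputs mu and logvar
        self.dec = nn.Sequential(nn.Linear(z_dim, 128), nn.ReLU(),
                                 nn.Linear(128, obs_dim))

    def encode(self, x):
        mu, logvar = self.enc(x).chunk(2, dim=-1)
        return mu, logvar

    def forward(self, x):
        mu, logvar = self.encode(x)
        z = mu + torch.randn_like(mu) * (0.5 * logvar).exp()  # reparameterization
        return self.dec(z), mu, logvar

def sample_triplets(buffer, batch=64, k_near=3):
    # buffer: (T, obs_dim) tensor of states in visitation order.
    T = buffer.shape[0]
    t = torch.randint(0, T, (batch,))                                   # anchors
    p = (t + torch.randint(1, k_near + 1, (batch,))).clamp(max=T - 1)   # temporally near
    n = torch.randint(0, T, (batch,))                                   # candidate negatives
    far = (n - t).abs() > k_near
    n = torch.where(far, n, (t + k_near + 1).clamp(max=T - 1))  # push too-close negatives away
    return buffer[t], buffer[p], buffer[n]

def loss_fn(model, anchor, positive, negative, beta=1.0, margin=1.0):
    recon, mu_a, logvar = model(anchor)
    rec = F.mse_loss(recon, anchor)                        # standard ELBO terms
    kld = -0.5 * torch.mean(1 + logvar - mu_a.pow(2) - logvar.exp())
    mu_p, _ = model.encode(positive)                       # weak supervision signal:
    mu_n, _ = model.encode(negative)                       # near in time => near in latent space
    trip = F.triplet_margin_loss(mu_a, mu_p, mu_n, margin=margin)
    return rec + beta * kld + trip
```

The only supervision used is the visitation order of states in the buffer, which encodes the paper's assumption that temporally close states are semantically similar.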
Related papers
- Gradient Boosting Mapping for Dimensionality Reduction and Feature Extraction [2.778647101651566]
A fundamental problem in supervised learning is to find a good set of features or distance measures.
We propose a supervised dimensionality reduction method, where the outputs of weak learners define the embedding.
We show that the embedding coordinates provide better features for the supervised learning task.
arXiv Detail & Related papers (2024-05-14T10:23:57Z)
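Below is a hedged sketch of the mechanism this summary describes, under the assumption that the embedding coordinates are simply the per-tree predictions of a fitted gradient-boosted ensemble; the paper's exact construction may differ.

```python
# Sketch: a supervised embedding whose coordinates are the outputs of the
# weak learners (regression trees) of a gradient-boosted model. The
# hyperparameters and toy data are illustrative.
import numpy as np
from sklearn.ensemble import GradientBoostingRegressor

def weak_learner_embedding(X, y, n_estimators=16):
    gbm = GradientBoostingRegressor(n_estimators=n_estimators, max_depth=2)
    gbm.fit(X, y)
    # One embedding coordinate per weak learner.
    return np.column_stack([est[0].predict(X) for est in gbm.estimators_])

X = np.random.randn(200, 10)
y = X[:, 0] - 2.0 * X[:, 1] + 0.1 * np.random.randn(200)
Z = weak_learner_embedding(X, y)  # shape (200, 16): features tuned to the task
```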
- Vlearn: Off-Policy Learning with Efficient State-Value Function Estimation [22.129001951441015]
Existing off-policy reinforcement learning algorithms often rely on an explicit state-action-value function representation.
This reliance results in data inefficiency as maintaining a state-action-value function in high-dimensional action spaces is challenging.
We present an efficient approach that utilizes only a state-value function as the critic for off-policy deep reinforcement learning.
arXiv Detail & Related papers (2024-03-07T12:45:51Z)
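A minimal sketch of the generic idea behind a state-value-only critic such as Vlearn: fit V(s) from off-policy data using an importance-weighted TD target. The published objective is more sophisticated; the network shape and the truncated importance weight below are assumptions.

```python
# Sketch: off-policy TD(0) learning of a state-value critic V(s), with a
# truncated importance weight correcting for the behavior/target policy gap.
import torch
import torch.nn as nn

v = nn.Sequential(nn.Linear(4, 64), nn.Tanh(), nn.Linear(64, 1))  # toy V-network
opt = torch.optim.Adam(v.parameters(), lr=3e-4)

def v_update(s, r, s_next, done, logp_pi, logp_mu, gamma=0.99):
    rho = (logp_pi - logp_mu).exp().clamp(max=1.0)  # truncated importance weight
    with torch.no_grad():
        target = r + gamma * (1.0 - done) * v(s_next).squeeze(-1)
    td = target - v(s).squeeze(-1)
    loss = (rho * td.pow(2)).mean()
    opt.zero_grad(); loss.backward(); opt.step()
    return loss.item()
```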
- Unsupervised Continual Anomaly Detection with Contrastively-learned Prompt [80.43623986759691]
We introduce a novel Unsupervised Continual Anomaly Detection framework called UCAD.
The framework equips unsupervised anomaly detection (UAD) with continual learning capability through contrastively-learned prompts.
We conduct comprehensive experiments and set the benchmark on unsupervised continual anomaly detection and segmentation.
arXiv Detail & Related papers (2024-01-02T03:37:11Z)
- On the Importance of Feature Decorrelation for Unsupervised Representation Learning in Reinforcement Learning [23.876039876806182]
Unsupervised representation learning (URL) has improved the sample efficiency of reinforcement learning (RL).
We propose a novel URL framework that causally predicts future states while increasing the dimension of the latent manifold.
Our framework effectively learns predictive representations without collapse, which significantly improves the sample efficiency of state-of-the-art URL methods on the Atari 100k benchmark.
arXiv Detail & Related papers (2023-06-09T02:47:21Z)
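The decorrelation idea can be illustrated with a generic covariance penalty on latent features, one common way to prevent representational collapse; this is not necessarily the cited paper's exact regularizer.

```python
# Sketch: penalize off-diagonal entries of the batch covariance of latent
# features so that dimensions carry non-redundant information.
import torch

def decorrelation_loss(z):
    # z: (batch, dim) latent features.
    z = z - z.mean(dim=0, keepdim=True)
    cov = (z.T @ z) / (z.shape[0] - 1)
    off_diag = cov - torch.diag(torch.diagonal(cov))
    return off_diag.pow(2).sum() / z.shape[1]
```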
- Enhancing Multiple Reliability Measures via Nuisance-extended Information Bottleneck [77.37409441129995]
In practical scenarios where training data is limited, many of the predictive signals in the data can instead stem from biases in data acquisition.
We consider an adversarial threat model under a mutual information constraint to cover a wider class of perturbations in training.
We propose an autoencoder-based training to implement the objective, as well as practical encoder designs to facilitate the proposed hybrid discriminative-generative training.
arXiv Detail & Related papers (2023-03-24T16:03:21Z)
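Very loosely, a hybrid discriminative-generative autoencoder objective in this spirit might split the code into a predictive part and a nuisance part, classify from the former, and reconstruct from both. Everything below (the split, the heads, the losses) is an illustrative assumption, not the paper's design.

```python
# Sketch: autoencoder with a discriminative head on the "signal" half of the
# code and a generative (reconstruction) head on the full code.
import torch
import torch.nn as nn
import torch.nn.functional as F

class NuisanceAE(nn.Module):
    def __init__(self, x_dim, z_dim=16, n_cls=10):
        super().__init__()
        self.enc = nn.Sequential(nn.Linear(x_dim, 128), nn.ReLU(),
                                 nn.Linear(128, 2 * z_dim))
        self.cls = nn.Linear(z_dim, n_cls)  # predicts labels from the signal part
        self.dec = nn.Sequential(nn.Linear(2 * z_dim, 128), nn.ReLU(),
                                 nn.Linear(128, x_dim))  # reconstructs from signal + nuisance

    def loss(self, x, y, lam=1.0):
        z = self.enc(x)
        z_sig, _z_nui = z.chunk(2, dim=-1)
        ce = F.cross_entropy(self.cls(z_sig), y)   # discriminative term
        rec = F.mse_loss(self.dec(z), x)           # generative term keeps nuisances explicit
        return ce + lam * rec
```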
- RelaxLoss: Defending Membership Inference Attacks without Losing Utility [68.48117818874155]
We propose a novel training framework based on a relaxed loss with a more achievable learning target.
RelaxLoss is applicable to any classification model with added benefits of easy implementation and negligible overhead.
Our approach consistently outperforms state-of-the-art defense mechanisms in terms of resilience against MIAs.
arXiv Detail & Related papers (2022-07-12T19:34:47Z)
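The "more achievable target" idea can be sketched as a loss gated around a target level alpha: descend while above it, ascend once below it, so the training loss never collapses to zero and members stop being distinguishable by confidence. This simplification omits further steps in the published method (e.g. posterior flattening).

```python
# Sketch: a relaxed cross-entropy that targets a loss level alpha instead of zero.
import torch
import torch.nn.functional as F

def relaxed_ce(logits, labels, alpha=0.5):
    loss = F.cross_entropy(logits, labels)
    # Gradient descent while above the target, gradient ascent once below it.
    return loss if loss.item() >= alpha else -loss
```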
- Domain-Adjusted Regression or: ERM May Already Learn Features Sufficient for Out-of-Distribution Generalization [52.7137956951533]
We argue that devising simpler methods for learning predictors on existing features is a promising direction for future research.
We introduce Domain-Adjusted Regression (DARE), a convex objective for learning a linear predictor that is provably robust under a new model of distribution shift.
Under a natural model, we prove that the DARE solution is the minimax-optimal predictor for a constrained set of test distributions.
arXiv Detail & Related papers (2022-02-14T16:42:16Z)
- Efficient Embedding of Semantic Similarity in Control Policies via Entangled Bisimulation [3.5092955099876266]
Learning generalizable policies from visual input in the presence of visual distractions is a challenging problem in reinforcement learning.
We propose entangled bisimulation, a bisimulation metric that allows the specification of the distance function between states.
We show how entangled bisimulation can meaningfully improve over previous methods on the Distracting Control Suite (DCS).
arXiv Detail & Related papers (2022-01-28T18:06:06Z)
- Learning Invariant Representations for Reinforcement Learning without Reconstruction [98.33235415273562]
We study how representation learning can accelerate reinforcement learning from rich observations, such as images, without relying either on domain knowledge or pixel-reconstruction.
Bisimulation metrics quantify behavioral similarity between states in continuous MDPs.
We demonstrate the effectiveness of our method at disregarding task-irrelevant information using modified visual MuJoCo tasks.
arXiv Detail & Related papers (2020-06-18T17:59:35Z)
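Both bisimulation entries above rest on the same mechanism, which can be sketched as follows: latent distances are regressed onto reward differences plus the discounted distance between successor latents. Target networks, the transition-model term, and the exact choice of distance are simplified away here.

```python
# Sketch: a bisimulation-style representation loss over pairs of transitions.
import torch
import torch.nn.functional as F

def bisim_loss(encoder, s1, s2, r1, r2, s1_next, s2_next, gamma=0.99):
    z1, z2 = encoder(s1), encoder(s2)
    with torch.no_grad():
        zn1, zn2 = encoder(s1_next), encoder(s2_next)
        # Behavioral target: reward gap plus discounted successor gap.
        target = (r1 - r2).abs() + gamma * (zn1 - zn2).norm(dim=-1)
    dist = (z1 - z2).norm(dim=-1)
    return F.mse_loss(dist, target)
```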
- Learning What Makes a Difference from Counterfactual Examples and Gradient Supervision [57.14468881854616]
We propose an auxiliary training objective that improves the generalization capabilities of neural networks.
We use pairs of minimally-different examples with different labels, a.k.a. counterfactual or contrasting examples, which provide a signal indicative of the underlying causal structure of the task.
Models trained with this technique demonstrate improved performance on out-of-distribution test sets.
arXiv Detail & Related papers (2020-04-20T02:47:49Z)
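One simplified reading of this auxiliary objective: align the input-gradient of the task loss with the direction that turns an example into its counterfactual. The cosine formulation below is an assumption for illustration, not the paper's exact loss.

```python
# Sketch: gradient supervision from counterfactual pairs (x, x_cf).
import torch
import torch.nn.functional as F

def gradient_supervision_loss(model, x, y, x_cf):
    x = x.clone().requires_grad_(True)
    task_loss = F.cross_entropy(model(x), y)
    grad, = torch.autograd.grad(task_loss, x, create_graph=True)
    direction = x_cf - x.detach()  # the minimal change that flips the label
    cos = F.cosine_similarity(grad.flatten(1), direction.flatten(1), dim=1)
    return task_loss + (1 - cos).mean()  # task loss + alignment penalty
```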
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the information presented and is not responsible for any consequences of its use.