Agent-State Construction with Auxiliary Inputs
- URL: http://arxiv.org/abs/2211.07805v3
- Date: Fri, 5 May 2023 22:03:24 GMT
- Title: Agent-State Construction with Auxiliary Inputs
- Authors: Ruo Yu Tao, Adam White, Marlos C. Machado
- Abstract summary: We present a series of examples illustrating the different ways of using auxiliary inputs for reinforcement learning.
We show that these auxiliary inputs can be used to discriminate between observations that would otherwise be aliased.
This approach is complementary to state-of-the-art methods such as recurrent neural networks and truncated back-propagation through time.
- Score: 16.79847469127811
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: In many, if not all, realistic sequential decision-making tasks, the
decision-making agent is not able to model the full complexity of the world.
The environment is often much larger and more complex than the agent, a setting
also known as partial observability. In such settings, the agent must leverage
more than just the current sensory inputs; it must construct an agent state
that summarizes previous interactions with the world. Currently, a popular
approach for tackling this problem is to learn the agent-state function with a
recurrent network that takes the agent's sensory stream as input. Many impressive
reinforcement learning applications have instead relied on environment-specific
functions that augment the agent's inputs to aid history summarization. These
augmentations are done in multiple ways, from simple approaches like
concatenating observations to more complex ones such as uncertainty estimates.
Although ubiquitous in the field, these additional inputs, which we term
auxiliary inputs, are rarely emphasized, and it is not clear what their role or
impact is. In this work, we explore this idea further and relate these
auxiliary inputs to classic approaches to state construction. We present
a series of examples illustrating the different ways of using auxiliary inputs
for reinforcement learning. We show that these auxiliary inputs can be used to
discriminate between observations that would otherwise be aliased, leading to
more expressive features that smoothly interpolate between different states.
Finally, we show that this approach is complementary to state-of-the-art
methods such as recurrent neural networks and truncated back-propagation
through time, and acts as a heuristic that facilitates longer temporal credit
assignment, leading to better performance.
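To make the idea concrete, the following is a minimal sketch (not the paper's exact formulation) of an agent state built from one simple auxiliary input: an exponentially decaying trace of past observations concatenated with the current observation. The class and parameter names (`TraceAuxiliaryInput`, `lam`) are illustrative choices, not from the paper.

```python
import numpy as np


class TraceAuxiliaryInput:
    """Maintains an exponentially decaying trace z_t = lam * z_{t-1} + (1 - lam) * o_t."""

    def __init__(self, obs_dim: int, lam: float = 0.9):
        self.lam = lam                  # decay rate (illustrative value)
        self.trace = np.zeros(obs_dim)  # z_0 = 0

    def reset(self) -> None:
        self.trace[:] = 0.0

    def update(self, obs: np.ndarray) -> np.ndarray:
        self.trace = self.lam * self.trace + (1.0 - self.lam) * obs
        return self.trace


def agent_state(obs: np.ndarray, aux: TraceAuxiliaryInput) -> np.ndarray:
    """Agent state = current observation concatenated with the auxiliary input."""
    return np.concatenate([obs, aux.update(obs)])


# Usage: the first and third observations are identical (aliased), but their
# agent states differ because the trace carries information about the history.
aux = TraceAuxiliaryInput(obs_dim=2, lam=0.9)
for obs in [np.array([1.0, 0.0]), np.array([0.0, 1.0]), np.array([1.0, 0.0])]:
    print(agent_state(obs, aux))
```

In this sketch, two identical observations yield different agent states whenever their preceding histories differ, which is the aliasing-discrimination property described in the abstract.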
Related papers
- Sim-to-Real Causal Transfer: A Metric Learning Approach to
Causally-Aware Interaction Representations [62.48505112245388]
We take an in-depth look at the causal awareness of modern representations of agent interactions.
We show that recent representations are already partially resilient to perturbations of non-causal agents.
We propose a metric learning approach that regularizes latent representations with causal annotations.
arXiv Detail & Related papers (2023-12-07T18:57:03Z) - What's in a Prior? Learned Proximal Networks for Inverse Problems [9.934876060237345]
Proximal operators are ubiquitous in inverse problems, commonly appearing as part of strategies to regularize problems that are otherwise ill-posed.
Modern deep learning models have been brought to bear for these tasks too, as in the framework of plug-and-play or deep unrolling.
arXiv Detail & Related papers (2023-10-22T16:31:01Z) - Rotating Features for Object Discovery [74.1465486264609]
We present Rotating Features, a generalization of complex-valued features to higher dimensions, and a new evaluation procedure for extracting objects from distributed representations.
Together, these advancements enable us to scale distributed object-centric representations from simple toy to real-world data.
arXiv Detail & Related papers (2023-06-01T12:16:26Z) - Revisiting Modality Imbalance In Multimodal Pedestrian Detection [6.7841188753203046]
We introduce a novel training setup with regularizer in the multimodal architecture to resolve the problem of this disparity between the modalities.
Specifically, our regularizer term helps to make the feature fusion method more robust by considering both the feature extractors equivalently important during the training.
arXiv Detail & Related papers (2023-02-24T11:56:57Z) - Improving Out-of-Distribution Generalization of Neural Rerankers with
Contextualized Late Interaction [52.63663547523033]
Late interaction, the simplest form of multi-vector, is also helpful to neural rerankers that only use the [CLS] vector to compute the similarity score.
We show that the finding is consistent across different model sizes and first-stage retrievers of diverse natures.
arXiv Detail & Related papers (2023-02-13T18:42:17Z) - On Generalizing Beyond Domains in Cross-Domain Continual Learning [91.56748415975683]
Deep neural networks often suffer from catastrophic forgetting of previously learned knowledge after learning a new task.
Our proposed approach learns new tasks under domain shift with accuracy boosts up to 10% on challenging datasets such as DomainNet and OfficeHome.
arXiv Detail & Related papers (2022-03-08T09:57:48Z) - PEBBLE: Feedback-Efficient Interactive Reinforcement Learning via
Relabeling Experience and Unsupervised Pre-training [94.87393610927812]
We present an off-policy, interactive reinforcement learning algorithm that capitalizes on the strengths of both feedback and off-policy learning.
We demonstrate that our approach is capable of learning tasks of higher complexity than previously considered by human-in-the-loop methods.
arXiv Detail & Related papers (2021-06-09T14:10:50Z) - ASCII: ASsisted Classification with Ignorance Interchange [17.413989127493622]
We propose a method named ASCII for an agent to improve its classification performance through assistance from other agents.
The main idea is to iteratively interchange an ignorance value between 0 and 1 for each collated sample among agents.
The method is naturally suitable for privacy-aware, transmission-economical, and decentralized learning scenarios.
arXiv Detail & Related papers (2020-10-21T03:57:36Z) - Self-Attention Attribution: Interpreting Information Interactions Inside
Transformer [89.21584915290319]
We propose a self-attention attribution method to interpret the information interactions inside Transformer.
We show that the attribution results can be used as adversarial patterns to implement non-targeted attacks towards BERT.
arXiv Detail & Related papers (2020-04-23T14:58:22Z)