Agent-State Construction with Auxiliary Inputs
- URL: http://arxiv.org/abs/2211.07805v3
- Date: Fri, 5 May 2023 22:03:24 GMT
- Title: Agent-State Construction with Auxiliary Inputs
- Authors: Ruo Yu Tao, Adam White, Marlos C. Machado
- Abstract summary: We present a series of examples illustrating the different ways of using auxiliary inputs for reinforcement learning.
We show that these auxiliary inputs can be used to discriminate between observations that would otherwise be aliased.
This approach is complementary to state-of-the-art methods such as recurrent neural networks and truncated back-propagation.
- Score: 16.79847469127811
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: In many, if not every, realistic sequential decision-making task, the
decision-making agent is not able to model the full complexity of the world.
The environment is often much larger and more complex than the agent, a setting
also known as partial observability. In such settings, the agent must leverage
more than just the current sensory inputs; it must construct an agent state
that summarizes previous interactions with the world. Currently, a popular
approach for tackling this problem is to learn the agent-state function via a
recurrent network from the agent's sensory stream as input. Many impressive
reinforcement learning applications have instead relied on environment-specific
functions to augment the agent's inputs for history summarization. These
augmentations are done in multiple ways, from simple approaches like
concatenating observations to more complex ones such as uncertainty estimates.
Although ubiquitous in the field, these additional inputs, which we term
auxiliary inputs, are rarely emphasized, and it is not clear what their role or
impact is. In this work we explore this idea further, and relate these
auxiliary inputs to prior classic approaches to state construction. We present
a series of examples illustrating the different ways of using auxiliary inputs
for reinforcement learning. We show that these auxiliary inputs can be used to
discriminate between observations that would otherwise be aliased, leading to
more expressive features that smoothly interpolate between different states.
Finally, we show that this approach is complementary to state-of-the-art
methods such as recurrent neural networks and truncated back-propagation
through time, and acts as a heuristic that facilitates longer temporal credit
assignment, leading to better performance.
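The abstract's core idea — concatenating an auxiliary input, such as a decaying trace of past observations, onto the current observation so that otherwise-aliased observations become distinguishable — can be sketched in a few lines. This is a minimal illustration, not the paper's implementation; the class name, the decay parameter `lam`, and the trace-plus-concatenation design are assumptions chosen for clarity.

```python
import numpy as np

class TraceAuxiliaryInput:
    """Hypothetical sketch of an auxiliary input: an exponential
    decaying trace of past observations, concatenated with the
    current observation to form the agent state."""

    def __init__(self, obs_dim: int, lam: float = 0.9):
        self.lam = lam                  # decay rate (illustrative choice)
        self.trace = np.zeros(obs_dim)  # summary of past observations

    def update(self, obs: np.ndarray) -> np.ndarray:
        # Decay the old trace and mix in the new observation.
        self.trace = self.lam * self.trace + (1 - self.lam) * obs
        # The agent state is the current observation plus the trace:
        # two observations that look identical now can differ in their
        # recent history, so the trace breaks the aliasing.
        return np.concatenate([obs, self.trace])
```

Because the trace is computed outside the learned network, it acts as the kind of environment-specific history summary the abstract describes — complementary to, rather than a replacement for, a recurrent network over the sensory stream.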
Related papers
- Memento No More: Coaching AI Agents to Master Multiple Tasks via Hints Internalization [56.674356045200696]
We propose a novel method to train AI agents to incorporate knowledge and skills for multiple tasks without the need for cumbersome note systems or prior high-quality demonstration data.
Our approach employs an iterative process where the agent collects new experiences, receives corrective feedback from humans in the form of hints, and integrates this feedback into its weights.
We demonstrate the efficacy of our approach by implementing it in a Llama-3-based agent which, after only a few rounds of feedback, outperforms advanced models GPT-4o and DeepSeek-V3 on a task set.
arXiv Detail & Related papers (2025-02-03T17:45:46Z) - Perspectives for Direct Interpretability in Multi-Agent Deep Reinforcement Learning [0.41783829807634765]
Multi-Agent Deep Reinforcement Learning (MADRL) has proven effective at solving complex problems in robotics and games.
This paper advocates for direct interpretability, generating post hoc explanations directly from trained models.
We explore modern methods, including relevance backpropagation, knowledge editing, model steering, activation patching, sparse autoencoders, and circuit discovery.
arXiv Detail & Related papers (2025-02-02T09:15:27Z) - Sim-to-Real Causal Transfer: A Metric Learning Approach to Causally-Aware Interaction Representations [62.48505112245388]
We take an in-depth look at the causal awareness of modern representations of agent interactions.
We show that recent representations are already partially resilient to perturbations of non-causal agents.
We propose a metric learning approach that regularizes latent representations with causal annotations.
arXiv Detail & Related papers (2023-12-07T18:57:03Z) - What's in a Prior? Learned Proximal Networks for Inverse Problems [9.934876060237345]
Proximal operators are ubiquitous in inverse problems, commonly appearing as part of strategies to regularize problems that are otherwise ill-posed.
Modern deep learning models have been brought to bear for these tasks too, as in the framework of plug-and-play or deep unrolling.
arXiv Detail & Related papers (2023-10-22T16:31:01Z) - Rotating Features for Object Discovery [74.1465486264609]
We present Rotating Features, a generalization of complex-valued features to higher dimensions, and a new evaluation procedure for extracting objects from distributed representations.
Together, these advancements enable us to scale distributed object-centric representations from simple toy to real-world data.
arXiv Detail & Related papers (2023-06-01T12:16:26Z) - Improving Out-of-Distribution Generalization of Neural Rerankers with Contextualized Late Interaction [52.63663547523033]
Late interaction, the simplest form of multi-vector retrieval, is also helpful to neural rerankers that only use the [CLS] vector to compute the similarity score.
We show that the finding is consistent across different model sizes and first-stage retrievers of diverse natures.
arXiv Detail & Related papers (2023-02-13T18:42:17Z) - On Generalizing Beyond Domains in Cross-Domain Continual Learning [91.56748415975683]
Deep neural networks often suffer from catastrophic forgetting of previously learned knowledge after learning a new task.
Our proposed approach learns new tasks under domain shift with accuracy boosts up to 10% on challenging datasets such as DomainNet and OfficeHome.
arXiv Detail & Related papers (2022-03-08T09:57:48Z) - ASCII: ASsisted Classification with Ignorance Interchange [17.413989127493622]
We propose a method named ASCII for an agent to improve its classification performance through assistance from other agents.
The main idea is to iteratively interchange an ignorance value between 0 and 1 for each collated sample among agents.
The method is naturally suitable for privacy-aware, transmission-economical, and decentralized learning scenarios.
arXiv Detail & Related papers (2020-10-21T03:57:36Z) - Self-Attention Attribution: Interpreting Information Interactions Inside Transformer [89.21584915290319]
We propose a self-attention attribution method to interpret the information interactions inside Transformer.
We show that the attribution results can be used as adversarial patterns to implement non-targeted attacks towards BERT.
arXiv Detail & Related papers (2020-04-23T14:58:22Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the information presented and is not responsible for any consequences of its use.