World Value Functions: Knowledge Representation for Multitask Reinforcement Learning
- URL: http://arxiv.org/abs/2205.08827v1
- Date: Wed, 18 May 2022 09:45:14 GMT
- Title: World Value Functions: Knowledge Representation for Multitask Reinforcement Learning
- Authors: Geraud Nangue Tasse, Steven James, Benjamin Rosman
- Abstract summary: We propose world value functions (WVFs), which are a type of general value function with mastery of the world.
We equip the agent with an internal goal space defined as all the world states where it experiences a terminal transition.
We show that for tasks in the same world, a pretrained agent that has learned any WVF can then infer the policy and value function for any new task directly from its rewards.
- Score: 14.731788603429774
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: An open problem in artificial intelligence is how to learn and represent
knowledge that is sufficient for a general agent that needs to solve multiple
tasks in a given world. In this work we propose world value functions (WVFs),
which are a type of general value function with mastery of the world - they
represent not only how to solve a given task, but also how to solve any other
goal-reaching task. To achieve this, we equip the agent with an internal goal
space defined as all the world states where it experiences a terminal
transition - a task outcome. The agent can then modify task rewards to define
its own reward function, which provably drives it to learn how to achieve all
achievable internal goals, and the value of doing so in the current task. We
demonstrate a number of benefits of WVFs. When the agent's internal goal space
is the entire state space, we show that the transition function can be
inferred from the learned WVF, which allows the agent to plan using learned
value functions. Additionally, we show that for tasks in the same world, a
pretrained agent that has learned any WVF can then infer the policy and value
function for any new task directly from its rewards. Finally, an important
property for long-lived agents is the ability to reuse existing knowledge to
solve new tasks. Using WVFs as the knowledge representation for learned tasks,
we show that an agent is able to solve their logical combination zero-shot,
resulting in a combinatorially increasing number of skills throughout its
lifetime.
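To make the reward modification described above concrete, here is a minimal tabular sketch, assuming a small deterministic goal-reaching environment with a Gym-style reset/step interface and hashable states. R_MIN, the hyperparameters, and the recovery of the task policy as a maximisation over goals are illustrative readings of the WVF papers, not the authors' code.

```python
import random
from collections import defaultdict

# Illustrative hyperparameters (assumptions, not from the paper).
R_MIN = -10.0  # penalty for terminating at a goal other than the one pursued
GAMMA = 1.0    # undiscounted goal-reaching setting
ALPHA = 0.1    # learning rate
EPS = 0.1      # exploration rate

def learn_wvf(env, actions, goals, episodes=5000):
    """Learn a tabular WVF Q(s, g, a) by Q-learning over state-goal pairs."""
    q = defaultdict(float)  # keyed by (state, goal, action)
    for _ in range(episodes):
        s = env.reset()
        g = random.choice(list(goals))  # the agent pursues one internal goal
        done = False
        while not done:
            if random.random() < EPS:
                a = random.choice(actions)
            else:
                a = max(actions, key=lambda act: q[(s, g, act)])
            s2, r, done = env.step(a)  # assumed Gym-like interface
            # Extended reward: a terminal transition that is not the pursued
            # goal earns R_MIN, which drives the agent to master every goal.
            r_bar = R_MIN if (done and s2 != g) else r
            if done:
                target = r_bar
            else:
                target = r_bar + GAMMA * max(q[(s2, g, act)] for act in actions)
            q[(s, g, a)] += ALPHA * (target - q[(s, g, a)])
            s = s2
    return q

def task_action(q, s, actions, goals):
    """Greedy policy for the original task, recovered by maximising over goals."""
    return max(actions, key=lambda act: max(q[(s, g, act)] for g in goals))
```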
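The zero-shot composition claim builds on the authors' earlier Boolean task algebra (listed under related papers below), in which task conjunction and disjunction correspond to pointwise minimum and maximum over goal-oriented value functions. A minimal sketch, reusing the dictionary representation from the snippet above:

```python
def wvf_and(q1, q2):
    """Conjunction of two learned WVFs: pointwise minimum."""
    return lambda s, g, a: min(q1[(s, g, a)], q2[(s, g, a)])

def wvf_or(q1, q2):
    """Disjunction of two learned WVFs: pointwise maximum."""
    return lambda s, g, a: max(q1[(s, g, a)], q2[(s, g, a)])

def composed_action(q_fn, s, actions, goals):
    """Act greedily on a composed WVF; no further learning is needed."""
    return max(actions, key=lambda act: max(q_fn(s, g, act) for g in goals))
```

Acting greedily on wvf_and(q1, q2) then targets outcomes that satisfy both tasks; the framework's assumptions (shared dynamics, tasks differing only in rewards at terminal states) are what make this composition sound.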
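For the claim that the transition function can be inferred when the internal goal space is the entire state space, one plausible mechanism, offered here as an assumption rather than the paper's exact construction, is to exploit the one-step Bellman identity: under deterministic dynamics, the true successor is the state whose value profile across all goals is consistent with the learned Q-values.

```python
def infer_next_state(q, s, a, actions, states, r_bar, gamma=1.0):
    """Guess the successor of (s, a) from a learned WVF alone.

    Assumes deterministic dynamics, goal space == state space, and access
    to the extended reward r_bar(s, g, a). For the true successor s',
    q[(s, g, a)] == r_bar(s, g, a) + gamma * V(s', g) for every goal g,
    so we return the candidate with the smallest total Bellman residual.
    """
    def v(x, g):  # state value of the WVF at (x, g)
        return max(q[(x, g, act)] for act in actions)

    def residual(x):
        return sum((q[(s, g, a)] - r_bar(s, g, a) - gamma * v(x, g)) ** 2
                   for g in states)

    return min(states, key=residual)
```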
Related papers
- Memento No More: Coaching AI Agents to Master Multiple Tasks via Hints Internalization [56.674356045200696]
We propose a novel method to train AI agents to incorporate knowledge and skills for multiple tasks without the need for cumbersome note systems or prior high-quality demonstration data.
Our approach employs an iterative process where the agent collects new experiences, receives corrective feedback from humans in the form of hints, and integrates this feedback into its weights.
We demonstrate the efficacy of our approach by implementing it in a Llama-3-based agent which, after only a few rounds of feedback, outperforms the advanced models GPT-4o and DeepSeek-V3 on a set of tasks.
arXiv Detail & Related papers (2025-02-03T17:45:46Z)
- Visual Grounding for Object-Level Generalization in Reinforcement Learning [35.39214541324909]
Generalization is a pivotal challenge for agents following natural language instructions.
We leverage a vision-language model (VLM) for visual grounding and transfer its vision-language knowledge into reinforcement learning.
We show that our intrinsic reward significantly improves performance on challenging skill learning.
arXiv Detail & Related papers (2024-08-04T06:34:24Z)
- WorkArena++: Towards Compositional Planning and Reasoning-based Common Knowledge Work Tasks [85.95607119635102]
Large language models (LLMs) can mimic human-like intelligence.
WorkArena++ is designed to evaluate the planning, problem-solving, logical/arithmetic reasoning, retrieval, and contextual understanding abilities of web agents.
arXiv Detail & Related papers (2024-07-07T07:15:49Z)
- Brain in a Vat: On Missing Pieces Towards Artificial General Intelligence in Large Language Models [83.63242931107638]
We propose four characteristics of generally intelligent agents.
We argue that active engagement with objects in the real world delivers more robust signals for forming conceptual representations.
We conclude by outlining promising future research directions in the field of artificial general intelligence.
arXiv Detail & Related papers (2023-07-07T13:58:16Z)
- Task Aware Dreamer for Task Generalization in Reinforcement Learning [31.364276322513447]
We show that a general world model can exploit structure shared across tasks and help train more generalizable agents.
We introduce a novel method named Task Aware Dreamer (TAD), which integrates reward-informed features to identify latent characteristics across tasks.
Experiments in both image-based and state-based tasks show that TAD can significantly improve the performance of handling different tasks simultaneously.
arXiv Detail & Related papers (2023-03-09T08:04:16Z)
- World Value Functions: Knowledge Representation for Learning and Planning [14.731788603429774]
We propose world value functions (WVFs), a type of goal-oriented general value function.
WVFs represent how to solve not just a given task, but any other goal-reaching task in an agent's environment.
We show that WVFs can be learned faster than regular value functions, while their ability to infer the environment's dynamics can be used to integrate learning and planning methods.
arXiv Detail & Related papers (2022-06-23T18:49:54Z)
- LDSA: Learning Dynamic Subtask Assignment in Cooperative Multi-Agent Reinforcement Learning [122.47938710284784]
We propose a novel framework for learning dynamic subtask assignment (LDSA) in cooperative MARL.
To reasonably assign agents to different subtasks, we propose an ability-based subtask selection strategy.
We show that LDSA learns reasonable and effective subtask assignment for better collaboration.
arXiv Detail & Related papers (2022-05-05T10:46:16Z)
- Domain-Robust Visual Imitation Learning with Mutual Information Constraints [0.0]
We introduce a new algorithm called Disentangling Generative Adversarial Imitation Learning (DisentanGAIL).
Our algorithm enables autonomous agents to learn directly from high dimensional observations of an expert performing a task.
arXiv Detail & Related papers (2021-03-08T21:18:58Z)
- Automatic Curriculum Learning through Value Disagreement [95.19299356298876]
Continually solving new, unsolved tasks is the key to learning diverse behaviors.
In the multi-task domain, where an agent needs to reach multiple goals, the choice of training goals can largely affect sample efficiency.
We propose setting up an automatic curriculum for goals that the agent needs to solve.
We evaluate our method across 13 multi-goal robotic tasks and 5 navigation tasks, and demonstrate performance gains over current state-of-the-art methods.
arXiv Detail & Related papers (2020-06-17T03:58:25Z)
- A Boolean Task Algebra for Reinforcement Learning [14.731788603429774]
We formalise the logical composition of tasks as a Boolean algebra.
We show that by learning goal-oriented value functions, an agent can solve new tasks with no further learning.
arXiv Detail & Related papers (2020-01-06T04:46:25Z)
- Universal Successor Features for Transfer Reinforcement Learning [77.27304854836645]
We propose Universal Successor Features (USFs) to capture the underlying dynamics of the environment.
We show that USFs are compatible with any RL algorithm that learns state values using a temporal difference method.
arXiv Detail & Related papers (2020-01-05T03:41:06Z)