GalilAI: Out-of-Task Distribution Detection using Causal Active
Experimentation for Safe Transfer RL
- URL: http://arxiv.org/abs/2110.15489v1
- Date: Fri, 29 Oct 2021 01:45:56 GMT
- Title: GalilAI: Out-of-Task Distribution Detection using Causal Active
Experimentation for Safe Transfer RL
- Authors: Sumedh A Sontakke, Stephen Iota, Zizhao Hu, Arash Mehrjou, Laurent
Itti, Bernhard Schölkopf
- Abstract summary: Out-of-distribution (OOD) detection is a well-studied topic in supervised learning.
We propose a novel task: that of Out-of-Task Distribution (OOTD) detection.
We name our method GalilAI, in honor of Galileo Galilei, as it discovers, among other causal processes, that gravitational acceleration is independent of the mass of a body.
- Score: 11.058960131490903
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Out-of-distribution (OOD) detection is a well-studied topic in supervised
learning. Extending the successes in supervised learning methods to the
reinforcement learning (RL) setting, however, is difficult due to the data
generating process - RL agents actively query their environment for data, and
the data are a function of the policy followed by the agent. An agent could
thus neglect a shift in the environment if its policy did not lead it to
explore the aspect of the environment that shifted. Therefore, to achieve safe
and robust generalization in RL, there exists an unmet need for OOD detection
through active experimentation. Here, we attempt to bridge this lacuna by first
defining a causal framework for OOD scenarios or environments encountered by RL
agents in the wild. Then, we propose a novel task: that of Out-of-Task
Distribution (OOTD) detection. We introduce an RL agent that actively
experiments in a test environment and subsequently concludes whether it is OOTD
or not. We name our method GalilAI, in honor of Galileo Galilei, as it
discovers, among other causal processes, that gravitational acceleration is
independent of the mass of a body. Finally, we propose a simple probabilistic
neural network baseline for comparison, which extends extant Model-Based RL. We
find that GalilAI outperforms the baseline significantly. See visualizations of
our method at https://galil-ai.github.io/.
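To make the active-experimentation loop concrete, here is a minimal sketch of the idea the abstract describes: fit a transition model on in-distribution experience, probe a test environment with informative actions, and flag the task as out-of-task-distribution when the observed outcomes are unlikely under the model. The toy dynamics, the per-action Gaussian model, and the probe-selection rule are all illustrative stand-ins, not the paper's causal framework or its probabilistic neural network baseline.

```python
# A minimal sketch of OOTD detection via active experimentation on a toy
# 1-D task. Everything here (environment, model, threshold-free score) is
# a hypothetical stand-in for GalilAI's procedure.
import numpy as np

rng = np.random.default_rng(0)

def step(x, a, gravity=9.8):
    """Toy transition: the outcome depends on the action and a latent parameter."""
    return x + 0.1 * a - 0.01 * gravity + rng.normal(0.0, 0.05)

# 1. Fit a Gaussian transition model per action from in-distribution rollouts.
actions = [-1.0, 0.0, 1.0]
model = {}
for a in actions:
    outcomes = [step(0.0, a) for _ in range(500)]
    model[a] = (np.mean(outcomes), np.std(outcomes))

# 2. Actively probe the test environment with the action whose predicted
#    outcome is most uncertain, then score the observations under the model.
def ootd_score(test_gravity, n_probes=50):
    log_liks = []
    for _ in range(n_probes):
        a = max(actions, key=lambda a: model[a][1])  # most informative probe
        obs = step(0.0, a, gravity=test_gravity)
        mu, sd = model[a]
        log_liks.append(-0.5 * ((obs - mu) / sd) ** 2 - np.log(sd))
    return -np.mean(log_liks)  # high score => observations unlikely => OOTD

print("in-task score :", ootd_score(9.8))
print("shifted score :", ootd_score(3.7))  # e.g. Martian gravity
```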
Related papers
- Knowledge Graph Reasoning with Self-supervised Reinforcement Learning [30.359557545737747] (arXiv 2024-05-22)
We propose a self-supervised pre-training method to warm up the policy network before the RL training stage.
In our supervised learning stage, the agent selects actions based on the policy network and learns from generated labels.
We show that our SSRL model meets or exceeds current state-of-the-art results on all Hits@k and mean reciprocal rank (MRR) metrics.
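As a rough illustration of the warm-up idea in this entry, the sketch below pre-trains a tiny softmax policy on generated labels before any RL updates would begin. The network, the label-generation rule, and the learning rate are hypothetical, not taken from the paper.

```python
# Hypothetical sketch: supervised warm-up of a policy network before RL.
import numpy as np

rng = np.random.default_rng(1)
W = rng.normal(0, 0.1, size=(4, 3))  # 4-dim states -> 3 actions

def policy(s):
    z = s @ W
    e = np.exp(z - z.max())
    return e / e.sum()

# Supervised stage: labels come from some cheap generated source.
for _ in range(200):
    s = rng.normal(size=4)
    label = int(s[0] > 0)                      # stand-in for "generated labels"
    p = policy(s)
    grad = np.outer(s, p - np.eye(3)[label])   # cross-entropy gradient
    W -= 0.1 * grad

# The RL stage would now start from this warmed-up W instead of random weights.
print("warm-up done; policy(s) =", policy(np.ones(4)).round(3))
```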
- Leveraging Reward Consistency for Interpretable Feature Discovery in Reinforcement Learning [69.19840497497503] (arXiv 2023-09-04)
We argue that the commonly used action-matching principle yields explanations of the underlying deep neural networks (DNNs) rather than of the RL agent itself.
We propose instead to consider rewards, the essential objective of RL agents, as the objective of interpretation as well.
We verify and evaluate our method on the Atari 2600 games as well as Duckietown, a challenging self-driving car simulator environment.
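A loose reading of this idea, sketched under strong simplifications: score each input feature by how much masking it changes a learned reward estimate, so that explanations track the agent's objective rather than raw action agreement. The toy reward_model and the zero-masking scheme are illustrative stand-ins, not the paper's method.

```python
# Hypothetical reward-consistency feature scoring.
import numpy as np

def reward_model(x):
    return 2.0 * x[0] - 0.5 * x[2]   # toy stand-in for a learned reward/Q model

x = np.array([1.0, 3.0, -2.0, 0.5])
base = reward_model(x)
importance = []
for i in range(len(x)):
    masked = x.copy()
    masked[i] = 0.0                  # mask one feature at a time
    importance.append(abs(reward_model(masked) - base))
print("reward-consistent feature importance:", importance)
```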
- Testing of Deep Reinforcement Learning Agents with Surrogate Models [10.243488468625786] (arXiv 2023-05-22)
Deep Reinforcement Learning (DRL) has received a lot of attention from the research community in recent years.
In this paper, we propose a search-based approach to test such agents.
- Train Hard, Fight Easy: Robust Meta Reinforcement Learning [78.16589993684698] (arXiv 2023-01-26)
A major challenge of reinforcement learning (RL) in real-world applications is the variation between environments, tasks or clients.
Standard meta-RL (MRL) methods optimize the average return over tasks, but often suffer from poor results on tasks of high risk or difficulty.
In this work, we define a robust MRL objective with a controlled robustness level.
The resulting data inefficiency is addressed via the novel Robust Meta RL algorithm (RoML).
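One way to read the robust objective above, sketched here on the assumption that it behaves like a CVaR-style tail measure over task returns: optimize the mean of the worst alpha-fraction of tasks instead of the overall mean. The returns and alpha below are invented for illustration; RoML's actual mechanism (task over-sampling) is richer than this.

```python
# Hypothetical CVaR-style robust objective over task returns.
import numpy as np

def cvar(returns, alpha=0.2):
    k = max(1, int(alpha * len(returns)))
    worst = np.sort(returns)[:k]     # the alpha-tail of worst task returns
    return worst.mean()

task_returns = np.array([9.0, 8.5, 7.9, 2.1, 1.5])
print("average objective:", task_returns.mean())
print("robust objective :", cvar(task_returns, alpha=0.4))
```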
- CostNet: An End-to-End Framework for Goal-Directed Reinforcement Learning [9.432068833600884] (arXiv 2022-10-03)
Reinforcement Learning (RL) is a general framework concerned with an agent that seeks to maximize rewards in an environment.
Two approaches, model-based and model-free reinforcement learning, have shown concrete results in several disciplines.
This paper introduces a novel reinforcement learning algorithm for predicting the distance between two states in a Markov Decision Process.
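A minimal stand-in for the distance-prediction idea: estimate the number of steps separating pairs of states from trajectory data, here with a tabular running mean on a toy chain MDP rather than the paper's neural network.

```python
# Hypothetical tabular version of state-to-state distance estimation.
import numpy as np

rng = np.random.default_rng(2)
n_states = 10
dist = np.zeros((n_states, n_states))    # running mean of observed step gaps
count = np.zeros((n_states, n_states))

for _ in range(300):                      # random walks on a chain MDP
    s, traj = 5, [5]
    for _ in range(40):
        s = min(max(s + rng.choice([-1, 1]), 0), n_states - 1)
        traj.append(s)
    for i in range(len(traj)):
        for j in range(i + 1, len(traj)):
            a, b = traj[i], traj[j]
            count[a, b] += 1
            dist[a, b] += (j - i - dist[a, b]) / count[a, b]

print("estimated steps from state 4 to state 7:", round(dist[4, 7], 1))
```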
- Mastering the Unsupervised Reinforcement Learning Benchmark from Pixels [112.63440666617494] (arXiv 2022-09-24)
Reinforcement learning algorithms can succeed at many tasks but require large amounts of interaction between the agent and the environment.
We propose a new method that uses unsupervised model-based RL to pre-train the agent.
We show robust performance on the Real-World RL benchmark, hinting at resiliency to environment perturbations during adaptation.
- Retrieval-Augmented Reinforcement Learning [63.32076191982944] (arXiv 2022-02-17)
We train a network to map a dataset of past experiences to optimal behavior.
The retrieval process is trained to retrieve information from the dataset that may be useful in the current context.
We show that retrieval-augmented R2D2 learns significantly faster than the baseline R2D2 agent and achieves higher scores.
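A bare-bones sketch of the retrieval idea: look up past experiences nearest to the current state and let them inform the action. The L2 lookup and the copy-the-neighbors rule below are illustrative stand-ins for the learned retrieval process in the paper.

```python
# Hypothetical nearest-neighbor retrieval over a replay dataset.
import numpy as np

rng = np.random.default_rng(3)
past_states = rng.normal(size=(1000, 4))        # replay dataset of states
past_actions = (past_states[:, 0] > 0).astype(int)

def retrieve_act(state, k=5):
    d = np.linalg.norm(past_states - state, axis=1)
    neighbors = np.argsort(d)[:k]               # k nearest past experiences
    return int(past_actions[neighbors].mean() > 0.5)

print("action for s:", retrieve_act(np.array([0.9, 0.0, 0.1, -0.2])))
```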
- A Validation Tool for Designing Reinforcement Learning Environments [0.0] (arXiv 2021-12-10)
This study proposes a Markov-based feature analysis method to validate whether an MDP is well formulated.
We believe an MDP suitable for applying RL should contain a set of state features that are both sensitive to actions and predictive of rewards.
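A hedged sketch of that feature check: a state feature is useful if actions move it (sensitivity) and if it carries reward signal (predictiveness). The correlation tests below are simple stand-ins for the paper's Markov-based analysis.

```python
# Hypothetical sensitivity/predictiveness test for state features.
import numpy as np

rng = np.random.default_rng(4)
n = 2000
actions = rng.choice([0, 1], size=n)
good_feat = np.cumsum(actions * 2 - 1) + rng.normal(0, 0.1, n)   # action-driven
noise_feat = rng.normal(size=n)                                  # irrelevant
reward = good_feat + rng.normal(0, 0.5, n)

for name, f in [("good", good_feat), ("noise", noise_feat)]:
    sens = abs(np.corrcoef(np.diff(f), actions[1:])[0, 1])   # action sensitivity
    pred = abs(np.corrcoef(f, reward)[0, 1])                 # reward predictiveness
    print(f"{name}: sensitivity={sens:.2f} predictiveness={pred:.2f}")
```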
- Explore and Control with Adversarial Surprise [78.41972292110967] (arXiv 2021-07-12)
Reinforcement learning (RL) provides a framework for learning goal-directed policies given user-specified rewards.
We propose a new unsupervised RL technique based on an adversarial game which pits two policies against each other to compete over the amount of surprise an RL agent experiences.
We show that our method leads to the emergence of complex skills by exhibiting clear phase transitions.
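The two-player objective can be caricatured as below: an Explorer acts to raise the surprise (negative log-likelihood under a running state model) that a Controller then tries to drive back down. The alternating argmax/argmin stands in for two learned policies and is purely illustrative.

```python
# Hypothetical toy version of the adversarial-surprise game.
import numpy as np

rng = np.random.default_rng(5)
counts = np.ones(6)                       # running visit counts = state density model

def surprise(s):
    p = counts / counts.sum()
    return -np.log(p[s])                  # negative log-likelihood of the state

for episode in range(50):
    # Explorer phase: seek the state the model finds most surprising
    s = int(np.argmax([surprise(i) for i in range(6)]))
    counts[s] += 1
    # Controller phase: retreat to the most familiar (least surprising) state
    s = int(np.argmin([surprise(i) for i in range(6)]))
    counts[s] += 1

print("visit counts after the two-player game:", counts)
```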
- Robust Deep Reinforcement Learning through Adversarial Loss [74.20501663956604] (arXiv 2020-08-05)
Recent studies have shown that deep reinforcement learning agents are vulnerable to small adversarial perturbations on the agent's inputs.
We propose RADIAL-RL, a principled framework to train reinforcement learning agents with improved robustness against adversarial attacks.
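A loose sketch of training against an adversarial loss: evaluate a worst-case input perturbation and minimize the loss there. The FGSM-style attack and the toy regression loss below are stand-ins; RADIAL-RL itself uses certified output bounds rather than sampled attacks.

```python
# Hypothetical adversarial-loss training loop on a toy objective.
import numpy as np

rng = np.random.default_rng(6)
w = rng.normal(size=3)

def loss(w, x):                      # toy regression loss standing in for an RL loss
    return (x @ w - 1.0) ** 2

eps, lr = 0.1, 0.05
for _ in range(200):
    x = rng.normal(size=3)
    grad_x = 2 * (x @ w - 1.0) * w           # gradient of the loss w.r.t. the input
    x_adv = x + eps * np.sign(grad_x)        # FGSM-like perturbation in an eps-ball
    grad_w = 2 * (x_adv @ w - 1.0) * x_adv   # descend on the worst-case loss
    w -= lr * grad_w

print("robust loss on clean input:", loss(w, np.ones(3)))
```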