Epistemic Exploration for Generalizable Planning and Learning in Non-Stationary Settings
- URL: http://arxiv.org/abs/2402.08145v2
- Date: Fri, 7 Jun 2024 01:21:18 GMT
- Title: Epistemic Exploration for Generalizable Planning and Learning in Non-Stationary Settings
- Authors: Rushang Karia, Pulkit Verma, Alberto Speranzon, Siddharth Srivastava
- Abstract summary: This paper introduces a new approach for continual planning and model learning in non-stationary environments.
The proposed framework models gaps in the agent's current state of knowledge and uses them to conduct focused, investigative explorations.
Empirical evaluations on several non-stationary benchmark domains show that this approach significantly outperforms planning and RL baselines in terms of sample complexity.
- Score: 23.038187032666304
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: This paper introduces a new approach for continual planning and model learning in relational, non-stationary stochastic environments. Such capabilities are essential for the deployment of sequential decision-making systems in the uncertain and constantly evolving real world. Working in such practical settings with unknown (and non-stationary) transition systems and changing tasks, the proposed framework models gaps in the agent's current state of knowledge and uses them to conduct focused, investigative explorations. Data collected using these explorations is used for learning generalizable probabilistic models for solving the current task despite continual changes in the environment dynamics. Empirical evaluations on several non-stationary benchmark domains show that this approach significantly outperforms planning and RL baselines in terms of sample complexity. Theoretical results show that the system exhibits desirable convergence properties when stationarity holds.
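The abstract does not spell out how knowledge gaps are represented or turned into explorations. As a loose illustration only, and not the authors' method (which learns relational probabilistic models), the sketch below keeps an empirical outcome distribution per action. It treats under-sampled actions as knowledge gaps and discards an action's data when a trusted model is contradicted by a surprising outcome, as a crude stand-in for detecting non-stationarity. It then proposes the least-sampled gap as the next exploration target. The class `GapAwareModel` and the parameters `min_samples` and `surprise_threshold` are hypothetical.

```python
from collections import Counter


class GapAwareModel:
    """Toy per-action outcome model that exposes its own knowledge gaps.

    Hypothetical illustration only: not the relational probabilistic models
    described in the abstract.
    """

    def __init__(self, actions, min_samples=3, surprise_threshold=0.1):
        self.actions = list(actions)
        self.counts = {a: Counter() for a in self.actions}  # outcome counts per action
        self.min_samples = min_samples                # samples needed before trusting a model
        self.surprise_threshold = surprise_threshold  # below this probability an outcome is "surprising"

    def _total(self, action):
        return sum(self.counts[action].values())

    def predict(self, action):
        """Empirical outcome distribution for `action` (empty if never tried)."""
        total = self._total(action)
        if total == 0:
            return {}
        return {o: c / total for o, c in self.counts[action].items()}

    def observe(self, action, outcome):
        """Record an outcome; drop stale data if a trusted model is contradicted."""
        trusted = self._total(action) >= self.min_samples
        if trusted and self.predict(action).get(outcome, 0.0) < self.surprise_threshold:
            # Crude non-stationarity signal: forget the old counts so the
            # action becomes a knowledge gap again.
            self.counts[action].clear()
        self.counts[action][outcome] += 1

    def gaps(self):
        """Actions whose model is not yet trusted, in declaration order."""
        return [a for a in self.actions if self._total(a) < self.min_samples]

    def next_exploration_target(self):
        """Least-sampled gap to investigate, or None if the model can be used for planning."""
        gaps = self.gaps()
        if not gaps:
            return None
        return min(gaps, key=self._total)
```

Once every action has been observed `min_samples` times, `next_exploration_target()` returns `None` and the agent would plan with the learned model. A later surprising outcome for a trusted action clears its counts and puts it back in `gaps()`, so exploration refocuses on exactly the part of the model that appears to have changed.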
Related papers
- Adaptive World Models: Learning Behaviors by Latent Imagination Under Non-Stationarity [16.15952351162363]
We introduce a new formalism, Hidden Parameter-POMDP, designed for control with adaptive world models.
We demonstrate that this approach enables learning robust behaviors across a variety of non-stationary RL benchmarks.
Date: 2024-11-02T19:09:56Z
- A Practitioner's Guide to Continual Multimodal Pretraining [83.63894495064855]
Multimodal foundation models serve numerous applications at the intersection of vision and language.
To keep models updated, research into continual pretraining mainly explores scenarios with either infrequent, indiscriminate updates on large-scale new data, or frequent, sample-level updates.
We introduce FoMo-in-Flux, a continual multimodal pretraining benchmark with realistic compute constraints and practical deployment requirements.
Date: 2024-08-26T17:59:01Z
- Meta-Gradient Search Control: A Method for Improving the Efficiency of Dyna-style Planning [8.552540426753]
This paper introduces an online, meta-gradient algorithm that tunes a probability with which states are queried during Dyna-style planning.
Results indicate that our method improves efficiency of the planning process.
Date: 2024-06-27T22:24:46Z
- Learning World Models with Identifiable Factorization [39.767120163665574]
We propose IFactor to model four distinct categories of latent state variables.
Our analysis establishes block-wise identifiability of these latent variables.
We present a practical approach to learning the world model with identifiable blocks.
Date: 2023-06-11T02:25:15Z
- Quantifying and Explaining Machine Learning Uncertainty in Predictive Process Monitoring: An Operations Research Perspective [0.0]
This paper introduces a comprehensive, multi-stage machine learning methodology that integrates information systems and artificial intelligence.
The proposed framework adeptly addresses common limitations of existing solutions, such as the neglect of data-driven estimation.
Our approach employs Quantile Regression Forests for generating interval predictions, alongside both local and global variants of SHapley Additive Explanations.
Date: 2023-04-13T11:18:22Z
- A Domain-Agnostic Approach for Characterization of Lifelong Learning Systems [128.63953314853327]
"Lifelong Learning" systems are capable of 1) Continuous Learning, 2) Transfer and Adaptation, and 3) Scalability.
We show that this suite of metrics can inform the development of varied and complex Lifelong Learning systems.
Date: 2023-01-18T21:58:54Z
- Latent Variable Representation for Reinforcement Learning [131.03944557979725]
It remains unclear theoretically and empirically how latent variable models may facilitate learning, planning, and exploration to improve the sample efficiency of model-based reinforcement learning.
We provide a representation view of the latent variable models for state-action value functions, which allows both tractable variational learning algorithm and effective implementation of the optimism/pessimism principle.
In particular, we propose a computationally efficient planning algorithm with UCB exploration by incorporating kernel embeddings of latent variable models.
Date: 2022-12-17T00:26:31Z
- Temporal Predictive Coding For Model-Based Planning In Latent Space [80.99554006174093]
We present an information-theoretic approach that employs temporal predictive coding to encode elements in the environment that can be predicted across time.
We evaluate our model on a challenging modification of standard DMControl tasks where the background is replaced with natural videos that contain complex but irrelevant information to the planning task.
Date: 2021-06-14T04:31:15Z
- LEADS: Learning Dynamical Systems that Generalize Across Environments [12.024388048406587]
We propose LEADS, a novel framework that leverages the commonalities and discrepancies among known environments to improve model generalization.
We show that this new setting can exploit knowledge extracted from environment-dependent data and improve generalization for both known and novel environments.
Date: 2021-06-08T17:28:19Z
- Learning to Continuously Optimize Wireless Resource in a Dynamic Environment: A Bilevel Optimization Perspective [52.497514255040514]
This work develops a new approach that enables data-driven methods to continuously learn and optimize resource allocation strategies in a dynamic environment.
We propose to build the notion of continual learning into wireless system design, so that the learning model can incrementally adapt to the new episodes.
Our design is based on a novel bilevel optimization formulation which ensures certain "fairness" across different data samples.
Date: 2021-05-03T07:23:39Z
- From Simulation to Real World Maneuver Execution using Deep Reinforcement Learning [69.23334811890919]
Deep Reinforcement Learning has proved to be able to solve many control tasks in different fields, but the behavior of these systems is not always as expected when deployed in real-world scenarios.
This is mainly due to the lack of domain adaptation between simulated and real-world data together with the absence of distinction between train and test datasets.
We present a system based on multiple environments in which agents are trained simultaneously, evaluating the behavior of the model in different scenarios.
Date: 2020-05-13T14:22:20Z
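The Latent Variable Representation for Reinforcement Learning entry above mentions planning with UCB exploration. Its kernel-embedding construction is not reproduced here. As a generic reminder of the optimism-in-the-face-of-uncertainty principle it builds on, the sketch below implements the classic UCB1 rule for a multi-armed bandit, scoring each arm as its sample mean plus sqrt(2 ln t / n). The helper names `ucb1_select`, `update` and `simulate` are hypothetical.

```python
import math
import random


def ucb1_select(counts, means, t):
    """Pick the arm maximizing mean + sqrt(2 ln t / n), i.e. the UCB1 index."""
    # Pull every arm once first so each index is defined.
    for arm, n in enumerate(counts):
        if n == 0:
            return arm
    scores = [m + math.sqrt(2.0 * math.log(t) / n) for m, n in zip(means, counts)]
    return max(range(len(scores)), key=scores.__getitem__)


def update(counts, means, arm, reward):
    """Incremental sample-mean update for the pulled arm."""
    counts[arm] += 1
    means[arm] += (reward - means[arm]) / counts[arm]


def simulate(probs, steps, seed=0):
    """Run UCB1 on Bernoulli arms with success probabilities `probs`; return pull counts."""
    rng = random.Random(seed)
    counts = [0] * len(probs)
    means = [0.0] * len(probs)
    for t in range(1, steps + 1):
        arm = ucb1_select(counts, means, t)
        reward = 1.0 if rng.random() < probs[arm] else 0.0
        update(counts, means, arm, reward)
    return counts
```

The exploration bonus shrinks as an arm is sampled more often. An arm with a slightly lower mean but very few pulls can therefore still be chosen (for example, `counts=[10, 1]`, `means=[0.6, 0.5]`, `t=11` selects arm 1), while equally sampled arms are ranked by their means alone.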
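The Meta-Gradient Search Control entry above describes tuning the probability with which states are queried during Dyna-style planning. The meta-gradient update itself is not shown here. The sketch below is plain tabular Dyna-Q in which search control samples remembered state-action pairs in proportion to a fixed, hypothetical `state_weight(state)` function, which stands in for the quantity that method would learn. The function names and the toy chain environment `chain_step` are invented for illustration.

```python
import random
from collections import defaultdict


def dyna_q(env_step, start, actions, episodes, planning_steps, state_weight,
           alpha=0.5, gamma=0.95, epsilon=0.1, seed=0):
    """Tabular Dyna-Q with a weighted search-control distribution (illustrative)."""
    rng = random.Random(seed)
    q = defaultdict(float)  # Q[(state, action)], defaults to 0.0
    model = {}              # learned deterministic model: (s, a) -> (r, s_next, done)

    def backup(s, a, r, s_next, done):
        # One-step Q-learning backup shared by real and simulated experience.
        bootstrap = 0.0 if done else gamma * max(q[(s_next, b)] for b in actions)
        q[(s, a)] += alpha * (r + bootstrap - q[(s, a)])

    for _ in range(episodes):
        s, done = start, False
        while not done:
            # Epsilon-greedy action choice with random tie-breaking.
            if rng.random() < epsilon:
                a = rng.choice(actions)
            else:
                best = max(q[(s, b)] for b in actions)
                a = rng.choice([b for b in actions if q[(s, b)] == best])
            r, s_next, done = env_step(s, a)
            backup(s, a, r, s_next, done)      # direct RL update
            model[(s, a)] = (r, s_next, done)  # model learning
            # Search control: replay remembered pairs with probability
            # proportional to state_weight(state); weights must be positive.
            keys = list(model)
            weights = [state_weight(k[0]) for k in keys]
            for _ in range(planning_steps):
                ps, pa = rng.choices(keys, weights=weights)[0]
                backup(ps, pa, *model[(ps, pa)])
            s = s_next
    return q


def chain_step(s, a, goal=4):
    """Toy chain: 'right' moves toward the goal (reward 1, episode ends), 'left' moves back, floored at 0."""
    if a == "right":
        s_next = s + 1
        return (1.0, s_next, True) if s_next == goal else (0.0, s_next, False)
    return 0.0, max(s - 1, 0), False
```

For example, `dyna_q(chain_step, 0, ["left", "right"], episodes=30, planning_steps=10, state_weight=lambda s: 1.0 + s)` replays transitions near the goal more often. Changing `state_weight` alters only which remembered pairs are replayed, not the backup rule; that sampling distribution is the part of Dyna that search-control methods aim to improve.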
This list is automatically generated from the titles and abstracts of the papers on this site.