XLVIN: eXecuted Latent Value Iteration Nets
- URL: http://arxiv.org/abs/2010.13146v2
- Date: Sun, 6 Dec 2020 16:59:01 GMT
- Title: XLVIN: eXecuted Latent Value Iteration Nets
- Authors: Andreea Deac, Petar Veličković, Ognjen Milinković, Pierre-Luc Bacon, Jian Tang, Mladen Nikolić
- Abstract summary: Value Iteration Networks (VINs) have emerged as a popular method to incorporate planning algorithms within deep reinforcement learning.
We propose XLVINs, which combine recent developments across contrastive self-supervised learning, graph representation learning and neural algorithmic reasoning.
- Score: 17.535799331279417
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Value Iteration Networks (VINs) have emerged as a popular method to
incorporate planning algorithms within deep reinforcement learning, enabling
performance improvements on tasks requiring long-range reasoning and
understanding of environment dynamics. This came with several limitations,
however: the model is not incentivised in any way to perform meaningful
planning computations, the underlying state space is assumed to be discrete,
and the Markov decision process (MDP) is assumed fixed and known. We propose
eXecuted Latent Value Iteration Networks (XLVINs), which combine recent
developments across contrastive self-supervised learning, graph representation
learning and neural algorithmic reasoning to alleviate all of the above
limitations, successfully deploying VIN-style models on generic environments.
XLVINs match the performance of VIN-like models when the underlying MDP is
discrete, fixed and known, and provide significant improvements to model-free
baselines across three general MDP setups.
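For context, the planning procedure that VIN-style models embed is classical value iteration over a discrete, fixed and known MDP, which is exactly the setting the abstract says XLVINs relax. The sketch below is a minimal tabular implementation on a made-up two-state MDP, included for reference only; it is not the XLVIN architecture.
```python
# Minimal tabular value iteration: the classical planning routine that
# VIN-style models embed. Toy 2-state, 2-action MDP; illustration only.
import numpy as np

def value_iteration(P, R, gamma=0.9, tol=1e-6):
    """P: (A, S, S) transition probabilities; R: (A, S) expected rewards."""
    V = np.zeros(P.shape[1])
    while True:
        # Bellman backup: Q[a, s] = R[a, s] + gamma * sum_s' P[a, s, s'] * V[s']
        Q = R + gamma * (P @ V)
        V_new = Q.max(axis=0)
        if np.abs(V_new - V).max() < tol:
            return V_new, Q.argmax(axis=0)   # state values and greedy policy
        V = V_new

# Made-up MDP: each row P[a, s, :] sums to 1.
P = np.array([[[0.9, 0.1], [0.1, 0.9]],
              [[0.5, 0.5], [0.5, 0.5]]])
R = np.array([[0.0, 1.0],
              [0.5, 0.5]])
values, policy = value_iteration(P, R)
```
As the related entries below note, XLVIN keeps the shape of this computation but runs it over learned latent states with a graph neural network executor, rather than over an explicit transition tensor and reward table.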
Related papers
- HRVMamba: High-Resolution Visual State Space Model for Dense Prediction [60.80423207808076]
State Space Models (SSMs) with efficient hardware-aware designs have demonstrated significant potential in computer vision tasks.
These models have been constrained by three key challenges: insufficient inductive bias, long-range forgetting, and low-resolution output representation.
We introduce the Dynamic Visual State Space (DVSS) block, which employs deformable convolution to mitigate the long-range forgetting problem.
We also introduce High-Resolution Visual State Space Model (HRVMamba) based on the DVSS block, which preserves high-resolution representations throughout the entire process.
arXiv Detail & Related papers (2024-10-04T06:19:29Z)
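A minimal sketch of the deformable-convolution idea used by the DVSS block described above, assuming torchvision's DeformConv2d as the operator: a small convolution predicts per-location sampling offsets, which the deformable convolution uses to aggregate context beyond a fixed grid. Channel sizes, the state-space scan and all other HRVMamba specifics are omitted.
```python
# Deformable-convolution residual block: an offset-predicting conv feeds
# torchvision's DeformConv2d. Illustrative sketch, not the DVSS block itself.
import torch
import torch.nn as nn
from torchvision.ops import DeformConv2d

class DeformableBlock(nn.Module):
    def __init__(self, channels=64, kernel_size=3):
        super().__init__()
        pad = kernel_size // 2
        # 2 offsets (dy, dx) per kernel tap and output location
        self.offset = nn.Conv2d(channels, 2 * kernel_size * kernel_size,
                                kernel_size, padding=pad)
        self.deform = DeformConv2d(channels, channels, kernel_size, padding=pad)
        self.act = nn.GELU()

    def forward(self, x):
        return x + self.act(self.deform(x, self.offset(x)))  # residual connection

y = DeformableBlock(64)(torch.randn(1, 64, 32, 32))          # (1, 64, 32, 32)
```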
- Optimization of geological carbon storage operations with multimodal latent dynamic model and deep reinforcement learning [1.8549313085249324]
This study introduces the multimodal latent dynamic (MLD) model, a deep learning framework for fast flow prediction and well control optimization in GCS.
Unlike existing models, the MLD supports diverse input modalities, allowing comprehensive data interactions.
The approach outperforms traditional methods, achieving the highest NPV while reducing computational resources by over 60%.
arXiv Detail & Related papers (2024-06-07T01:30:21Z)
- LoRA-Ensemble: Efficient Uncertainty Modelling for Self-attention Networks [52.46420522934253]
We introduce LoRA-Ensemble, a parameter-efficient deep ensemble method for self-attention networks.
By employing a single pre-trained self-attention network with weights shared across all members, we train member-specific low-rank matrices for the attention projections.
Our method exhibits superior calibration compared to explicit ensembles and achieves similar or better accuracy across various prediction tasks and datasets.
arXiv Detail & Related papers (2024-05-23T11:10:32Z)
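The weight-sharing scheme summarised above can be sketched as one shared projection plus member-specific low-rank matrices; for brevity a single linear layer stands in for the attention projections, and the dimensions, initialisation and freezing of the shared weights are illustrative assumptions rather than the authors' implementation.
```python
# Sketch of a LoRA-ensembled projection: a shared base weight plus
# member-specific low-rank updates delta_W_i = B_i @ A_i.
# Dimensions, init and freezing of the shared weights are assumptions.
import torch
import torch.nn as nn

class LoRAEnsembleLinear(nn.Module):
    def __init__(self, in_features, out_features, n_members=4, rank=4):
        super().__init__()
        self.shared = nn.Linear(in_features, out_features)
        for p in self.shared.parameters():       # backbone shared by all members
            p.requires_grad_(False)
        self.A = nn.Parameter(torch.randn(n_members, rank, in_features) * 0.01)
        self.B = nn.Parameter(torch.zeros(n_members, out_features, rank))

    def forward(self, x, member):
        delta = self.B[member] @ self.A[member]  # (out_features, in_features)
        return self.shared(x) + x @ delta.t()

# Ensemble prediction: average the member outputs.
layer = LoRAEnsembleLinear(16, 4)
x = torch.randn(8, 16)
mean_logits = torch.stack([layer(x, m) for m in range(4)]).mean(dim=0)
```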
- Learning Logic Specifications for Policy Guidance in POMDPs: an Inductive Logic Programming Approach [57.788675205519986]
We learn high-quality traces from POMDP executions generated by any solver.
We exploit data- and time-efficient Inductive Logic Programming (ILP) to generate interpretable belief-based policy specifications.
We show that the learned specifications, expressed in Answer Set Programming (ASP), yield performance superior to neural networks and similar to optimal handcrafted task-specific heuristics, within lower computational time.
arXiv Detail & Related papers (2024-02-29T15:36:01Z)
- A Multi-Head Ensemble Multi-Task Learning Approach for Dynamical Computation Offloading [62.34538208323411]
We propose a multi-head ensemble multi-task learning (MEMTL) approach with a shared backbone and multiple prediction heads (PHs).
MEMTL outperforms benchmark methods in both inference accuracy and mean square error without requiring additional training data.
arXiv Detail & Related papers (2023-09-02T11:01:16Z)
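A schematic of the shared-backbone, multiple-prediction-head pattern named above: one feature extractor feeds an ensemble of heads whose outputs are averaged. Layer sizes and the aggregation rule are illustrative assumptions, not the MEMTL paper's exact design.
```python
# Shared backbone with an ensemble of prediction heads; outputs averaged.
# Sizes and the mean aggregation are illustrative assumptions.
import torch
import torch.nn as nn

class MultiHeadEnsemble(nn.Module):
    def __init__(self, in_dim=32, hidden=64, out_dim=4, n_heads=3):
        super().__init__()
        self.backbone = nn.Sequential(
            nn.Linear(in_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU(),
        )
        self.heads = nn.ModuleList([nn.Linear(hidden, out_dim) for _ in range(n_heads)])

    def forward(self, x):
        h = self.backbone(x)                                   # shared features
        preds = torch.stack([head(h) for head in self.heads])  # (n_heads, batch, out_dim)
        return preds.mean(dim=0), preds                        # ensemble mean + per-head

mean_pred, per_head = MultiHeadEnsemble()(torch.randn(5, 32))
```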
- Continuous Neural Algorithmic Planners [3.9715120586766584]
XLVIN is a graph neural network that simulates the value iteration algorithm in deep reinforcement learning agents.
It allows model-free planning without access to privileged information about the environment.
We show how neural algorithmic reasoning can make a measurable impact in higher-dimensional continuous control settings.
arXiv Detail & Related papers (2022-11-29T00:19:35Z)
- JAX-DIPS: Neural bootstrapping of finite discretization methods and application to elliptic problems with discontinuities [0.0]
This strategy can be used to efficiently train neural network surrogate models of partial differential equations.
The presented neural bootstrapping method (hereby dubbed NBM) is based on evaluation of the finite discretization residuals of the PDE system.
We show NBM is competitive in terms of memory and training speed with other PINN-type frameworks.
arXiv Detail & Related papers (2022-10-25T20:13:26Z)
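The bootstrapping idea summarised above, training a surrogate by minimising finite-discretization residuals of the PDE, can be illustrated on a toy 1D Poisson problem u'' = f with homogeneous Dirichlet boundaries. The problem, grid, network and optimiser below are assumptions for illustration and have no connection to the JAX-DIPS code base.
```python
# Toy neural-bootstrapping-style training loop: the loss is the central
# finite-difference residual of u'' = f on a grid, plus a boundary penalty.
# Problem, grid and network are illustrative assumptions only.
import torch
import torch.nn as nn

net = nn.Sequential(nn.Linear(1, 64), nn.Tanh(), nn.Linear(64, 1))
opt = torch.optim.Adam(net.parameters(), lr=1e-3)

h = 0.01
x = torch.arange(h, 1.0, h).unsqueeze(1)   # interior nodes of (0, 1)
f = -torch.ones_like(x)                    # source term: u'' = -1

for step in range(2000):
    u_c, u_l, u_r = net(x), net(x - h), net(x + h)
    residual = (u_l - 2.0 * u_c + u_r) / h**2 - f      # discrete u'' - f
    bc = net(torch.tensor([[0.0], [1.0]]))             # enforce u(0) = u(1) = 0
    loss = residual.pow(2).mean() + bc.pow(2).mean()
    opt.zero_grad()
    loss.backward()
    opt.step()
```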
- Neural Algorithmic Reasoners are Implicit Planners [17.6650448492151]
We study the class of implicit planners inspired by value iteration.
Our method performs all planning computations in a high-dimensional latent space.
We empirically verify that XLVINs can closely align with value iteration.
arXiv Detail & Related papers (2021-10-11T17:29:20Z)
- Learning to Continuously Optimize Wireless Resource in a Dynamic Environment: A Bilevel Optimization Perspective [52.497514255040514]
This work develops a new approach that enables data-driven methods to continuously learn and optimize resource allocation strategies in a dynamic environment.
We propose to build the notion of continual learning into wireless system design, so that the learning model can incrementally adapt to the new episodes.
Our design is based on a novel bilevel optimization formulation which ensures certain "fairness" across different data samples.
arXiv Detail & Related papers (2021-05-03T07:23:39Z)
- Modular Deep Reinforcement Learning for Continuous Motion Planning with Temporal Logic [59.94347858883343]
This paper investigates the motion planning of autonomous dynamical systems modeled by Markov decision processes (MDPs).
The novelty is to design an embedded product MDP (EP-MDP) between the limit-deterministic generalized Büchi automaton (LDGBA) and the MDP.
The proposed LDGBA-based reward shaping and discounting schemes for the model-free reinforcement learning (RL) only depend on the EP-MDP states.
arXiv Detail & Related papers (2021-02-24T01:11:25Z)
- Graph neural induction of value iteration [22.582832003418826]
We propose a graph neural network (GNN) that executes the value iteration (VI) algorithm, across arbitrary environment models, with direct supervision on the intermediate steps of VI.
The results indicate that GNNs are able to model value iteration accurately, recovering favourable metrics and policies across a variety of out-of-distribution tests.
arXiv Detail & Related papers (2020-09-26T14:09:16Z)
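The entry above, like XLVIN's own executor, treats value iteration as a computation on the MDP's transition graph, which is why it maps naturally onto message passing in a GNN. The sketch below writes each VI sweep as messages along transition edges followed by a max-aggregation at every state; the explicit edge list with known probabilities and rewards is a toy assumption, and no learning is involved.
```python
# Value iteration phrased as message passing over an MDP's transition graph:
# each sweep sends a message p * (r + gamma * V[s']) along every edge and
# max-aggregates per state. Toy, fully known MDP; illustration only.
import numpy as np

# Edges (s, a, s_next, prob, reward) of a made-up 3-state, 2-action MDP.
edges = [
    (0, 0, 1, 1.0, 0.0),
    (0, 1, 2, 1.0, 1.0),
    (1, 0, 2, 0.5, 0.0), (1, 0, 0, 0.5, 0.0),
    (1, 1, 1, 1.0, 0.2),
    (2, 0, 2, 1.0, 0.0),
    (2, 1, 0, 1.0, 0.5),
]
num_states, num_actions, gamma = 3, 2, 0.9
V = np.zeros(num_states)

for _ in range(50):                          # one sweep = one round of messages
    Q = np.zeros((num_states, num_actions))
    for s, a, s_next, p, r in edges:
        Q[s, a] += p * (r + gamma * V[s_next])
    V = Q.max(axis=1)                        # aggregate: best action per state

policy = Q.argmax(axis=1)                    # greedy policy from the last sweep
```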