Neural Algorithmic Reasoners are Implicit Planners
- URL: http://arxiv.org/abs/2110.05442v1
- Date: Mon, 11 Oct 2021 17:29:20 GMT
- Title: Neural Algorithmic Reasoners are Implicit Planners
- Authors: Andreea Deac, Petar Veličković, Ognjen Milinković, Pierre-Luc Bacon, Jian Tang, Mladen Nikolić
- Abstract summary: We study the class of implicit planners inspired by value iteration.
Our method performs all planning computations in a high-dimensional latent space.
We empirically verify that XLVINs can closely align with value iteration.
- Score: 17.6650448492151
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Implicit planning has emerged as an elegant technique for combining learned
models of the world with end-to-end model-free reinforcement learning. We study
the class of implicit planners inspired by value iteration, an algorithm that
is guaranteed to yield perfect policies in fully-specified tabular
environments. We find that prior approaches either assume that the environment
is provided in such a tabular form -- which is highly restrictive -- or infer
"local neighbourhoods" of states to run value iteration over -- for which we
discover an algorithmic bottleneck effect. This effect is caused by explicitly
running the planning algorithm based on scalar predictions in every state,
which can be harmful to data efficiency if such scalars are improperly
predicted. We propose eXecuted Latent Value Iteration Networks (XLVINs), which
alleviate the above limitations. Our method performs all planning computations
in a high-dimensional latent space, breaking the algorithmic bottleneck. It
maintains alignment with value iteration by carefully leveraging neural
graph-algorithmic reasoning and contrastive self-supervised learning. Across
eight low-data settings -- including classical control, navigation and Atari --
XLVINs provide significant improvements to data efficiency against value
iteration-based implicit planners, as well as relevant model-free baselines.
Lastly, we empirically verify that XLVINs can closely align with value
iteration.
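For context, value iteration over a fully-specified tabular environment is the classical algorithm the abstract refers to. A minimal NumPy sketch (array shapes and names are illustrative, not from the paper):

```python
import numpy as np

def value_iteration(P, R, gamma=0.9, tol=1e-6):
    """Tabular value iteration over a fully-specified MDP.

    P: transition probabilities, shape (S, A, S), P[s, a, s2] = Pr(s2 | s, a)
    R: expected rewards, shape (S, A)
    Returns the optimal state values V and a greedy policy.
    """
    S, A, _ = P.shape
    V = np.zeros(S)
    while True:
        Q = R + gamma * (P @ V)   # Bellman backup: Q(s,a) = R(s,a) + gamma * E[V(s')]
        V_new = Q.max(axis=1)     # greedy improvement over actions
        if np.max(np.abs(V_new - V)) < tol:
            return V_new, Q.argmax(axis=1)
        V = V_new
```

XLVIN performs the analogue of the `Q` update above on high-dimensional latent embeddings with a graph neural network executor, rather than on scalar value predictions, which is what breaks the algorithmic bottleneck described in the abstract.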
Related papers
- Computationally Efficient RL under Linear Bellman Completeness for Deterministic Dynamics [39.07258580928359]
We study computationally and statistically efficient Reinforcement Learning algorithms for the linear Bellman Complete setting.
This setting uses linear function approximation to capture value functions and unifies existing models like linear Markov Decision Processes (MDP) and Linear Quadratic Regulators (LQR).
Our work provides a computationally efficient algorithm for the linear Bellman complete setting that works for MDPs with large action spaces, random initial states, and random rewards, but relies on the underlying dynamics being deterministic.
arXiv Detail & Related papers (2024-06-17T17:52:38Z)
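To make the setting above concrete: linear Bellman completeness is commonly stated as the requirement that the Bellman backup of any linear Q-function stays in the linear class. A standard formulation (notation illustrative, not taken from the paper):

```latex
% Linear Bellman completeness with features \phi(s,a) \in \mathbb{R}^d:
% every linear Q-function's Bellman backup is again linear in \phi.
\forall \theta \in \mathbb{R}^d \;\; \exists \theta' \in \mathbb{R}^d : \quad
\langle \phi(s,a), \theta' \rangle
  = r(s,a) + \mathbb{E}_{s' \sim P(\cdot \mid s,a)}
    \Big[ \max_{a'} \langle \phi(s',a'), \theta \rangle \Big]
  \quad \text{for all } (s,a).
```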
- Equation Discovery with Bayesian Spike-and-Slab Priors and Efficient Kernels [57.46832672991433]
We propose a novel equation discovery method based on Kernel learning and BAyesian Spike-and-Slab priors (KBASS).
We use kernel regression to estimate the target function, which is flexible, expressive, and more robust to data sparsity and noise.
We develop an expectation-propagation expectation-maximization algorithm for efficient posterior inference and function estimation.
arXiv Detail & Related papers (2023-10-09T03:55:09Z)
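As a reminder of the key prior used above, a generic spike-and-slab prior mixes a point mass at zero (the "spike") with a broad distribution (the "slab") on each candidate equation weight; this generic form is illustrative, not KBASS's exact parameterisation:

```latex
% Spike-and-slab prior on weight w_j with inclusion indicator s_j:
p(w_j \mid s_j) = (1 - s_j)\,\delta_0(w_j) + s_j\,\mathcal{N}(w_j;\, 0, \sigma^2),
\qquad s_j \sim \mathrm{Bernoulli}(\pi).
```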
- Continuous Neural Algorithmic Planners [3.9715120586766584]
XLVIN is a graph neural network that simulates the value iteration algorithm in deep reinforcement learning agents.
It allows model-free planning without access to privileged information about the environment.
We show how neural algorithmic reasoning can make a measurable impact in higher-dimensional continuous control settings.
arXiv Detail & Related papers (2022-11-29T00:19:35Z)
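The latent-space planning both papers share can be pictured as one message-passing round that mimics a value-iteration backup over a graph of imagined transitions. A hedged PyTorch sketch (module structure and shapes are assumptions, not either paper's code):

```python
import torch
import torch.nn as nn

class LatentVIStep(nn.Module):
    """One GNN message-passing step emulating a value-iteration backup in latent space."""

    def __init__(self, dim):
        super().__init__()
        self.message = nn.Linear(2 * dim, dim)  # combine a state with one neighbour
        self.update = nn.Linear(2 * dim, dim)   # combine a state with its aggregate

    def forward(self, h, edges):
        # h: (N, dim) latent state embeddings; edges: (E, 2) index pairs (src, dst).
        src, dst = edges[:, 0], edges[:, 1]
        m = torch.relu(self.message(torch.cat([h[src], h[dst]], dim=-1)))
        # Max aggregation over incoming messages plays the role of the
        # max over actions in the value-iteration backup.
        agg = torch.zeros_like(h).scatter_reduce(
            0, dst.unsqueeze(-1).expand_as(m), m, reduce="amax", include_self=False)
        return torch.relu(self.update(torch.cat([h, agg], dim=-1)))
```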
- Efficient and Differentiable Conformal Prediction with General Function Classes [96.74055810115456]
We propose a generalization of conformal prediction to multiple learnable parameters.
We show that it achieves approximately valid population coverage and near-optimal efficiency within the given function class.
Experiments show that our algorithm is able to learn valid prediction sets and improve the efficiency significantly.
arXiv Detail & Related papers (2022-02-22T18:37:23Z)
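For orientation, the standard split conformal recipe that the paper above generalises: hold out a calibration set, compute residuals, and take a finite-sample-corrected quantile. A minimal sketch with illustrative names:

```python
import numpy as np

def conformal_quantile(residuals, alpha=0.1):
    """Quantile of calibration residuals |y - f(x)| at the corrected level."""
    n = len(residuals)
    level = min(1.0, np.ceil((n + 1) * (1 - alpha)) / n)  # ceil((n+1)(1-a))/n
    return np.quantile(residuals, level, method="higher")

def prediction_interval(f_x, q):
    """[f(x) - q, f(x) + q] covers y with prob >= 1 - alpha under exchangeability."""
    return f_x - q, f_x + q
```

The paper's contribution is to make parameters of such procedures learnable while retaining approximately valid coverage guarantees.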
- Smoothed Online Learning is as Easy as Statistical Learning [77.00766067963195]
We provide the first oracle-efficient, no-regret algorithms in the smoothed online learning setting.
We show that if a function class is learnable in the classical setting, then there is an oracle-efficient, no-regret algorithm for contextual bandits.
arXiv Detail & Related papers (2022-02-09T19:22:34Z)
- Adapting to Misspecification in Contextual Bandits [82.55565343668246]
We introduce a new family of oracle-efficient algorithms for $\varepsilon$-misspecified contextual bandits.
We obtain the first algorithm that achieves the optimal $O(d\sqrt{T} + \varepsilon\sqrt{d}\,T)$ regret bound for unknown misspecification level.
arXiv Detail & Related papers (2021-07-12T21:30:41Z)
- Convolutional Sparse Coding Fast Approximation with Application to Seismic Reflectivity Estimation [9.005280130480308]
We propose a sped-up version of the classic iterative thresholding algorithm that produces a good approximation of the convolutional sparse code within 2-5 iterations.
The performance of the proposed solution is demonstrated via the seismic inversion problem in both synthetic and real data scenarios.
arXiv Detail & Related papers (2021-06-29T12:19:07Z)
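As background for the entry above, the classic iterative soft-thresholding (ISTA) scheme for 1-D convolutional sparse coding, which typically needs far more than 2-5 iterations; step size and names are illustrative:

```python
import numpy as np

def soft_threshold(x, t):
    return np.sign(x) * np.maximum(np.abs(x) - t, 0.0)

def ista_csc(y, d, lam=0.1, n_iter=100):
    """Approximately solve min_z 0.5*||y - d*z||^2 + lam*||z||_1, * = convolution."""
    step = 1.0 / (np.sum(np.abs(d)) ** 2 + 1e-12)  # 1/L; ||d||_1^2 bounds the Lipschitz constant
    z = np.zeros_like(y, dtype=float)
    d_flip = d[::-1]  # correlation = convolution with the flipped filter
    for _ in range(n_iter):
        residual = np.convolve(z, d, mode="same") - y
        grad = np.convolve(residual, d_flip, mode="same")  # adjoint of the convolution
        z = soft_threshold(z - step * grad, step * lam)
    return z
```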
- FedPD: A Federated Learning Framework with Optimal Rates and Adaptivity to Non-IID Data [59.50904660420082]
Federated Learning (FL) has become a popular paradigm for learning from distributed data.
To effectively utilize data at different devices without moving it to the cloud, algorithms such as Federated Averaging (FedAvg) have adopted a "computation then aggregation" (CTA) model.
arXiv Detail & Related papers (2020-05-22T23:07:42Z)
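To illustrate the CTA pattern mentioned above: each client runs several local gradient steps, then the server averages the resulting models, weighted by local data size. A toy NumPy sketch on a least-squares model (all names illustrative):

```python
import numpy as np

def fedavg_round(global_w, client_data, local_steps=5, lr=0.1):
    """One 'computation then aggregation' round of FedAvg."""
    updates, sizes = [], []
    for X, y in client_data:                   # each client's local dataset (X, y)
        w = global_w.copy()
        for _ in range(local_steps):           # local computation; data never leaves the device
            grad = X.T @ (X @ w - y) / len(y)  # least-squares gradient (toy model)
            w -= lr * grad
        updates.append(w)
        sizes.append(len(y))
    weights = np.asarray(sizes, dtype=float) / sum(sizes)
    return sum(wi * ui for wi, ui in zip(weights, updates))  # aggregation on the server
```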
- Progressive Identification of True Labels for Partial-Label Learning [112.94467491335611]
Partial-label learning (PLL) is a typical weakly supervised learning problem, where each training instance is equipped with a set of candidate labels among which only one is the true label.
Most existing methods are elaborately designed as constrained optimizations that must be solved in specific manners, making their computational complexity a bottleneck for scaling up to big data.
This paper proposes a novel framework of classifier learning that is flexible in the choice of model and optimization algorithm.
arXiv Detail & Related papers (2020-02-19T08:35:15Z)
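The progressive-identification idea in the entry above can be sketched as a weighted cross-entropy whose weights come from the model's own confidence restricted to each instance's candidate set, so training gradually concentrates on the likely true label. A hedged PyTorch sketch, not the paper's exact objective:

```python
import torch
import torch.nn.functional as F

def progressive_pll_loss(logits, candidate_mask):
    """Weighted cross-entropy over candidate labels.

    logits: (B, C) classifier outputs.
    candidate_mask: (B, C) binary, 1 where the label is in the candidate set.
    """
    probs = F.softmax(logits, dim=-1) * candidate_mask             # confidence on candidates only
    weights = (probs / probs.sum(dim=-1, keepdim=True)).detach()   # no gradient through weights
    log_probs = F.log_softmax(logits, dim=-1)
    return -(weights * log_probs).sum(dim=-1).mean()
```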