Neuro-algorithmic Policies enable Fast Combinatorial Generalization
- URL: http://arxiv.org/abs/2102.07456v1
- Date: Mon, 15 Feb 2021 11:07:59 GMT
- Title: Neuro-algorithmic Policies enable Fast Combinatorial Generalization
- Authors: Marin Vlastelica, Michal Rolínek and Georg Martius
- Abstract summary: Recent results suggest that generalization for standard architectures improves only after obtaining exhaustive amounts of data.
We show that for a certain subclass of the MDP framework, this can be alleviated by neuro-algorithmic architectures.
We introduce a neuro-algorithmic policy architecture consisting of a neural network and an embedded time-dependent shortest path solver.
- Score: 16.74322664734553
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Although model-based and model-free approaches to learning the control of
systems have achieved impressive results on standard benchmarks, generalization
to task variations is still lacking. Recent results suggest that generalization
for standard architectures improves only after obtaining exhaustive amounts of
data. We give evidence that generalization capabilities are in many cases
bottlenecked by the inability to generalize on the combinatorial aspects of the
problem. Furthermore, we show that for a certain subclass of the MDP framework,
this can be alleviated by neuro-algorithmic architectures.
Many control problems require long-term planning that is hard to solve
generically with neural networks alone. We introduce a neuro-algorithmic policy
architecture consisting of a neural network and an embedded time-dependent
shortest path solver. These policies can be trained end-to-end by blackbox
differentiation. We show that this type of architecture generalizes well to
unseen variations in the environment after seeing only a few examples.
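As a hedged illustration of the architecture, the sketch below pairs a toy time-dependent shortest-path solver (seam-carving-style dynamic programming over a T x N cost grid) with the finite-difference "blackbox differentiation" rule of Vlastelica et al. (ICLR 2020). The function names, the grid transition structure, and the interpolation parameter `lam` are illustrative assumptions, not the paper's implementation.

```python
import numpy as np

def shortest_path(costs):
    """Toy time-dependent shortest-path solver: cheapest top-to-bottom
    path through a T x N cost grid, moving to column j-1, j, or j+1 at
    each time step. Returns a binary T x N indicator of the chosen path."""
    T, N = costs.shape
    dp = costs.copy()
    back = np.zeros((T, N), dtype=int)
    for t in range(1, T):
        for j in range(N):
            lo, hi = max(j - 1, 0), min(j + 2, N)
            k = lo + int(np.argmin(dp[t - 1, lo:hi]))
            back[t, j] = k
            dp[t, j] = costs[t, j] + dp[t - 1, k]
    path = np.zeros_like(costs)
    j = int(np.argmin(dp[-1]))
    for t in range(T - 1, -1, -1):
        path[t, j] = 1.0
        j = back[t, j]
    return path

def blackbox_grad(costs, grad_output, lam=10.0):
    """Blackbox-differentiation backward pass: the solver is piecewise
    constant in its costs, so an informative gradient is obtained by
    re-solving on costs perturbed in the direction of the incoming
    gradient and taking a finite difference."""
    y = shortest_path(costs)
    y_perturbed = shortest_path(costs + lam * grad_output)
    return -(y - y_perturbed) / lam
```

In the full architecture, a neural network would predict `costs` from raw observations, and `blackbox_grad` would propagate the imitation loss through the embedded solver back to the network's parameters.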
Related papers
- A Neural Rewriting System to Solve Algorithmic Problems [47.129504708849446]
We propose a modular architecture designed to learn a general procedure for solving nested mathematical formulas.
Inspired by rewriting systems, a classic framework in symbolic artificial intelligence, we include in the architecture three specialized and interacting modules.
We benchmark our system against the Neural Data Router, a recent model specialized for systematic generalization, and a state-of-the-art large language model (GPT-4) probed with advanced prompting strategies.
arXiv Detail & Related papers (2024-02-27T10:57:07Z) - Generalization and Estimation Error Bounds for Model-based Neural Networks [78.88759757988761]
We show that the generalization abilities of model-based networks for sparse recovery outperform those of regular ReLU networks.
We derive practical design rules that allow constructing model-based networks with guaranteed high generalization.
arXiv Detail & Related papers (2023-04-19T16:39:44Z) - Multilevel-in-Layer Training for Deep Neural Network Regression [1.6185544531149159]
We present a multilevel regularization strategy that constructs and trains a hierarchy of neural networks.
We experimentally show with PDE regression problems that our multilevel training approach is an effective regularizer.
arXiv Detail & Related papers (2022-11-11T23:53:46Z) - Neural Networks and the Chomsky Hierarchy [27.470857324448136]
We study whether insights from the Chomsky hierarchy can predict the limits of neural network generalization in practice.
We show negative results where even extensive amounts of data and training time never led to any non-trivial generalization.
Our results show that, for our subset of tasks, RNNs and Transformers fail to generalize on non-regular tasks, and only networks augmented with structured memory can successfully generalize on context-free and context-sensitive tasks.
arXiv Detail & Related papers (2022-07-05T15:06:11Z) - Polynomial-Spline Neural Networks with Exact Integrals [0.0]
We develop a novel neural network architecture that combines a mixture-of-experts model with free knot B1-spline basis functions.
Our architecture exhibits both $h$- and $p$-refinement for regression problems at the convergence rates expected from approximation theory.
We demonstrate the success of our network on a range of regression and variational problems that illustrate the consistency and exact integrability of our network architecture.
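As a rough illustration of the spline side of this idea (fixed knots rather than the paper's free knots, and no mixture-of-experts gating; both are simplifying assumptions), linear B1 "hat" basis functions can be evaluated and fit by least squares:

```python
import numpy as np

def b1_basis(x, knots):
    """Evaluate linear (B1) hat basis functions at points x.
    Column k is the hat function centered at knots[k]; the columns
    form a partition of unity on [knots[0], knots[-1]]."""
    B = np.zeros((len(x), len(knots)))
    for k in range(len(knots)):
        indicator = np.zeros(len(knots))
        indicator[k] = 1.0
        # Piecewise-linear interpolation of a knot indicator is exactly
        # the hat function centered at that knot.
        B[:, k] = np.interp(x, knots, indicator)
    return B

def fit_b1_spline(x, y, knots):
    """Least-squares regression in the B1 basis; refining the knot
    vector gives h-refinement in the approximation-theory sense."""
    coef, *_ = np.linalg.lstsq(b1_basis(x, knots), y, rcond=None)
    return lambda xq: b1_basis(xq, knots) @ coef
```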
arXiv Detail & Related papers (2021-10-26T22:12:37Z) - Generalization of Neural Combinatorial Solvers Through the Lens of Adversarial Robustness [68.97830259849086]
Most datasets only capture a simpler subproblem and likely suffer from spurious features.
We study adversarial robustness - a local generalization property - to reveal hard, model-specific instances and spurious features.
Unlike in other applications, where perturbation models are designed around subjective notions of imperceptibility, our perturbation models are efficient and sound.
Surprisingly, with such perturbations, a sufficiently expressive neural solver does not suffer from the limitations of the accuracy-robustness trade-off common in supervised learning.
arXiv Detail & Related papers (2021-10-21T07:28:11Z) - Wide Network Learning with Differential Privacy [7.453881927237143]
The current generation of neural networks suffers a significant loss of accuracy under most practically relevant privacy-preserving training regimes.
We develop a general approach to training these models that takes advantage of the sparsity of the gradients of private Empirical Risk Minimization (ERM).
For the same number of parameters, we propose a novel algorithm for privately training neural networks.
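The paper's sparsity-aware algorithm is not reproduced here; as a generic point of reference, the sketch below shows the standard DP-SGD step (per-example clipping plus calibrated Gaussian noise, Abadi et al. 2016) that such private training builds on. All parameter values are illustrative.

```python
import numpy as np

def dp_sgd_step(params, per_example_grads, clip_norm=1.0,
                noise_multiplier=1.1, lr=0.1, rng=None):
    """One standard DP-SGD update: clip each per-example gradient to
    clip_norm, average, add Gaussian noise scaled to the clipping
    bound, and take a gradient step."""
    rng = rng or np.random.default_rng(0)
    clipped = [g * min(1.0, clip_norm / max(np.linalg.norm(g), 1e-12))
               for g in per_example_grads]
    mean_grad = np.mean(clipped, axis=0)
    noise = rng.normal(0.0, noise_multiplier * clip_norm / len(clipped),
                       size=mean_grad.shape)
    return params - lr * (mean_grad + noise)
```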
arXiv Detail & Related papers (2021-03-01T20:31:50Z) - Automated Search for Resource-Efficient Branched Multi-Task Networks [81.48051635183916]
We propose a principled approach, rooted in differentiable neural architecture search, to automatically define branching structures in a multi-task neural network.
We show that our approach consistently finds high-performing branching structures within limited resource budgets.
arXiv Detail & Related papers (2020-08-24T09:49:19Z) - Neural Complexity Measures [96.06344259626127]
We propose Neural Complexity (NC), a meta-learning framework for predicting generalization.
Our model learns a scalar complexity measure through interactions with many heterogeneous tasks in a data-driven way.
arXiv Detail & Related papers (2020-08-07T02:12:10Z) - Multipole Graph Neural Operator for Parametric Partial Differential Equations [57.90284928158383]
One of the main challenges in using deep learning-based methods for simulating physical systems is formulating physics-based data.
We propose a novel multi-level graph neural network framework that captures interaction at all ranges with only linear complexity.
Experiments confirm our multi-graph network learns discretization-invariant solution operators to PDEs and can be evaluated in linear time.
arXiv Detail & Related papers (2020-06-16T21:56:22Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the information and is not responsible for any consequences of its use.