Efficient Learning of Discrete-Continuous Computation Graphs
- URL: http://arxiv.org/abs/2307.14193v1
- Date: Wed, 26 Jul 2023 13:47:52 GMT
- Title: Efficient Learning of Discrete-Continuous Computation Graphs
- Authors: David Friede and Mathias Niepert
- Abstract summary: A popular approach to building discrete-continuous computation graphs is to integrate discrete probability distributions into neural networks using stochastic softmax tricks.
We show that it is challenging to optimize the parameters of these models, mainly due to small gradients and local minima.
We show that we can train complex discrete-continuous models which cannot be trained with standard stochastic softmax tricks.
- Score: 15.26733033527393
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Numerous models for supervised and reinforcement learning benefit from
combinations of discrete and continuous model components. End-to-end learnable
discrete-continuous models are compositional, tend to generalize better, and
are more interpretable. A popular approach to building discrete-continuous
computation graphs is that of integrating discrete probability distributions
into neural networks using stochastic softmax tricks. Prior work has mainly
focused on computation graphs with a single discrete component on each of the
graph's execution paths. We analyze the behavior of more complex stochastic
computation graphs with multiple sequential discrete components. We show that
it is challenging to optimize the parameters of these models, mainly due to
small gradients and local minima. We then propose two new strategies to
overcome these challenges. First, we show that increasing the scale parameter
of the Gumbel noise perturbations during training improves the learning
behavior. Second, we propose dropout residual connections specifically tailored
to stochastic, discrete-continuous computation graphs. With an extensive set of
experiments, we show that we can train complex discrete-continuous models which
one cannot train with standard stochastic softmax tricks. We also show that
complex discrete-stochastic models generalize better than their continuous
counterparts on several benchmark datasets.
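To make the two strategies concrete, here is a minimal PyTorch sketch of a straight-through Gumbel-softmax layer with an adjustable noise scale and a dropout residual connection around the discrete component. The module names, the bypass wiring, and the linear embeddings are illustrative assumptions, not the authors' released implementation.

```python
import torch
import torch.nn.functional as F

def gumbel_softmax_st(logits, tau=1.0, scale=1.0):
    # Perturb logits with Gumbel noise; `scale` is the parameter the paper
    # proposes to increase during training to escape small-gradient regions.
    u = torch.rand_like(logits).clamp(1e-10, 1 - 1e-10)
    gumbels = -torch.log(-torch.log(u))
    y_soft = F.softmax((logits + scale * gumbels) / tau, dim=-1)
    # Straight-through estimator: discrete one-hot on the forward pass,
    # soft gradients on the backward pass.
    index = y_soft.argmax(dim=-1, keepdim=True)
    y_hard = torch.zeros_like(y_soft).scatter_(-1, index, 1.0)
    return y_hard + y_soft - y_soft.detach()

class DropoutResidualDiscreteBlock(torch.nn.Module):
    """One discrete component wrapped in a dropout residual connection:
    during training, with probability p the discrete sampling step is
    bypassed and the continuous input is passed through unchanged."""
    def __init__(self, dim, num_categories, p=0.5):
        super().__init__()
        self.to_logits = torch.nn.Linear(dim, num_categories)
        self.embed = torch.nn.Linear(num_categories, dim)
        self.p = p

    def forward(self, x, scale=1.0):
        if self.training and torch.rand(()).item() < self.p:
            return x  # residual bypass of the discrete component
        z = gumbel_softmax_st(self.to_logits(x), tau=1.0, scale=scale)
        return self.embed(z)
```

Stacking several such blocks yields a computation graph with multiple sequential discrete components; a simple schedule that anneals `scale` upward over epochs reflects the first strategy.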
Related papers
- Condensing Graphs via One-Step Gradient Matching [50.07587238142548]
We propose a one-step gradient matching scheme, which performs gradient matching for only one single step without training the network weights.
Our theoretical analysis shows this strategy can generate synthetic graphs that lead to lower classification loss on real graphs.
In particular, we are able to reduce the dataset size by 90% while approximating up to 98% of the original performance.
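The one-step idea can be sketched as matching the gradients that real and synthetic data induce on a randomly initialized, untrained network; the cosine-distance objective and the single-tensor signature below are illustrative assumptions, not the paper's exact formulation.

```python
import torch
import torch.nn.functional as F

def one_step_gradient_matching_loss(model, loss_fn, x_real, y_real, x_syn, y_syn):
    params = [p for p in model.parameters() if p.requires_grad]
    # Gradients on real data (constants) and on synthetic data (kept
    # differentiable so the synthetic graph/features can be optimized).
    g_real = torch.autograd.grad(loss_fn(model(x_real), y_real), params)
    g_syn = torch.autograd.grad(loss_fn(model(x_syn), y_syn), params,
                                create_graph=True)
    return sum(1.0 - F.cosine_similarity(a.flatten(), b.flatten(), dim=0)
               for a, b in zip(g_real, g_syn))
```

Minimizing this loss with respect to the synthetic inputs, averaged over random initializations, condenses the dataset without ever training the network weights.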
arXiv Detail & Related papers (2022-06-15T18:20:01Z)
- Learning Sparse and Continuous Graph Structures for Multivariate Time Series Forecasting [5.359968374560132]
Learning Sparse and Continuous Graphs for Forecasting (LSCGF) is a novel deep learning model that couples graph learning with forecasting.
In this paper, we propose a new method named Smooth Sparse Unit (SSU) to learn a sparse and continuous graph adjacency matrix.
Our model achieves state-of-the-art performance with few trainable parameters.
arXiv Detail & Related papers (2022-01-24T13:35:37Z)
- Scaling Structured Inference with Randomization [64.18063627155128]
We propose a family of randomized dynamic programming (RDP) algorithms for scaling structured models to tens of thousands of latent states.
Our method is widely applicable to classical DP-based inference.
It is also compatible with automatic differentiation, so it can be integrated with neural networks seamlessly.
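The flavor of randomized DP can be illustrated on the HMM forward recursion: subsample the latent states at each step and reweight so the truncated sum roughly tracks the full one. This is only a caricature under that assumption; the paper's estimators are more careful than uniform subsampling.

```python
import numpy as np

def randomized_forward(log_init, log_trans, log_emit, obs, k):
    """Forward algorithm where each DP step sums over only k randomly
    sampled latent states instead of all N, reweighted by N/k so the
    truncated sum stays roughly unbiased."""
    N = log_init.shape[0]
    alpha = log_init + log_emit[:, obs[0]]
    for t in range(1, len(obs)):
        keep = np.random.choice(N, size=k, replace=False)
        scores = alpha[keep, None] + log_trans[keep, :] + np.log(N / k)
        alpha = np.logaddexp.reduce(scores, axis=0) + log_emit[:, obs[t]]
    return np.logaddexp.reduce(alpha)  # approximate log-likelihood
```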
arXiv Detail & Related papers (2021-12-07T11:26:41Z)
- Gaussian Graphical Model Selection for Huge Data via Minipatch Learning [1.2891210250935146]
We propose the Minipatch Graph (MPGraph) estimator to solve the problem of graphical model selection.
MPGraph is a generalization of thresholded graph estimators fit to tiny, random subsets of both the observations and the nodes.
We prove that our algorithm achieves finite-sample graph selection consistency.
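A rough sketch of the minipatch idea: fit a cheap thresholded estimator (here thresholded partial correlations, an assumption; the paper covers a family of estimators) on many tiny random subsets of observations and variables, then keep edges that are selected stably across patches.

```python
import numpy as np

def minipatch_graph(X, n_patches=100, m_obs=50, m_nodes=20, thresh=0.1):
    n, p = X.shape
    votes = np.zeros((p, p))
    counts = np.zeros((p, p))
    for _ in range(n_patches):
        rows = np.random.choice(n, m_obs, replace=False)
        cols = np.random.choice(p, m_nodes, replace=False)
        sub = X[np.ix_(rows, cols)]
        # Thresholded partial correlations on the minipatch.
        prec = np.linalg.pinv(np.cov(sub, rowvar=False) + 1e-6 * np.eye(m_nodes))
        d = np.sqrt(np.diag(prec))
        sel = (np.abs(-prec / np.outer(d, d)) > thresh).astype(float)
        np.fill_diagonal(sel, 0)
        votes[np.ix_(cols, cols)] += sel
        counts[np.ix_(cols, cols)] += 1
    with np.errstate(divide="ignore", invalid="ignore"):
        freq = np.where(counts > 0, votes / counts, 0.0)
    return freq > 0.5  # edges selected in most patches where they appeared
```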
arXiv Detail & Related papers (2021-10-22T21:06:48Z)
- Regularization of Mixture Models for Robust Principal Graph Learning [0.0]
A regularized version of Mixture Models is proposed to learn a principal graph from a distribution of $D$-dimensional data points.
Parameters of the model are iteratively estimated through an Expectation-Maximization procedure.
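For reference, the EM skeleton the summary alludes to looks as follows for an isotropic Gaussian mixture; the principal-graph regularizer itself is omitted here and would enter the M-step when re-estimating the means.

```python
import numpy as np

def em_gaussian_mixture(X, K, n_iter=50):
    n, d = X.shape
    means = X[np.random.choice(n, K, replace=False)]
    var = np.full(K, X.var())
    weights = np.full(K, 1.0 / K)
    for _ in range(n_iter):
        # E-step: responsibilities under isotropic Gaussians.
        sq = ((X[:, None, :] - means[None]) ** 2).sum(-1)          # (n, K)
        log_r = np.log(weights) - 0.5 * sq / var - 0.5 * d * np.log(var)
        log_r -= log_r.max(axis=1, keepdims=True)
        r = np.exp(log_r)
        r /= r.sum(axis=1, keepdims=True)
        # M-step: closed-form updates (a graph penalty would modify `means`).
        nk = r.sum(axis=0) + 1e-12
        weights, means = nk / n, (r.T @ X) / nk[:, None]
        var = (r * sq).sum(axis=0) / (nk * d)
    return weights, means, var
```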
arXiv Detail & Related papers (2021-06-16T18:00:02Z)
- Stochastic Iterative Graph Matching [11.128153575173213]
We propose a new model, Stochastic Iterative Graph Matching, to address the graph matching problem.
Our model defines a distribution over matchings for a graph pair, so the model can explore a wide range of possible matchings.
We conduct extensive experiments across synthetic graph datasets as well as biochemistry and computer vision applications.
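One simple way to realize a distribution over matchings, sketched here as an assumption rather than the paper's construction, is perturb-and-MAP: add Gumbel noise to the learned affinity scores and solve for the best hard matching, so repeated draws explore different plausible matchings.

```python
import numpy as np
from scipy.optimize import linear_sum_assignment

def sample_matching(affinity, scale=1.0):
    # Gumbel-perturbed affinities; the assignment solver maximizes them.
    u = np.clip(np.random.rand(*affinity.shape), 1e-12, 1 - 1e-12)
    noise = -np.log(-np.log(u))
    rows, cols = linear_sum_assignment(-(affinity + scale * noise))
    return rows, cols  # node rows[i] in graph 1 matched to cols[i] in graph 2
```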
arXiv Detail & Related papers (2021-06-04T02:05:35Z)
- Continuous-in-Depth Neural Networks [107.47887213490134]
We first show that ResNets fail to be meaningful dynamical integrators in this richer sense.
We then demonstrate that neural network models can learn to represent continuous dynamical systems.
We introduce ContinuousNet as a continuous-in-depth generalization of ResNet architectures.
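The continuous-in-depth view can be sketched by treating the residual update as one step of an ODE solver in a depth variable t, so the same weights can be re-evaluated at finer step sizes; the midpoint integrator and layer sizes below are illustrative assumptions.

```python
import torch

class ContinuousInDepthBlock(torch.nn.Module):
    """Residual update x <- x + h * f(x, t) interpreted as an ODE step."""
    def __init__(self, dim, hidden=64):
        super().__init__()
        self.f = torch.nn.Sequential(
            torch.nn.Linear(dim + 1, hidden), torch.nn.Tanh(),
            torch.nn.Linear(hidden, dim))

    def vector_field(self, x, t):
        t_col = torch.full_like(x[:, :1], t)   # append depth-time as a feature
        return self.f(torch.cat([x, t_col], dim=-1))

    def forward(self, x, n_steps=4):
        h, t = 1.0 / n_steps, 0.0
        for _ in range(n_steps):               # explicit midpoint (RK2) steps
            k1 = self.vector_field(x, t)
            k2 = self.vector_field(x + 0.5 * h * k1, t + 0.5 * h)
            x, t = x + h * k2, t + h
        return x
```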
arXiv Detail & Related papers (2020-08-05T22:54:09Z)
- Scaling Graph Clustering with Distributed Sketches [1.1011268090482575]
We present a method inspired by spectral clustering where we instead use matrix sketches derived from random dimension-reducing projections.
We show that our method produces embeddings that yield performant clustering results given a fully-dynamic block model stream.
We also discuss the effects of block model parameters upon the required dimensionality of the subsequent embeddings, and show how random projections could significantly improve the performance of graph clustering in distributed memory.
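In the spirit of the summary, a sketch-based embedding can be as simple as multiplying the normalized adjacency by a random Gaussian matrix, with a few extra multiplications to concentrate energy on the top eigenspace; the normalization and power steps here are assumptions, not the paper's exact sketches.

```python
import numpy as np

def sketch_embedding(adj, dim=32, power_steps=2):
    deg = np.maximum(adj.sum(axis=1), 1.0)
    norm_adj = adj / np.sqrt(np.outer(deg, deg))    # symmetric normalization
    emb = norm_adj @ (np.random.randn(adj.shape[0], dim) / np.sqrt(dim))
    for _ in range(power_steps):                    # power-iteration flavor
        emb = norm_adj @ emb
    return emb  # feed to k-means or similar for clustering
```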
arXiv Detail & Related papers (2020-07-24T17:38:04Z)
- Multipole Graph Neural Operator for Parametric Partial Differential Equations [57.90284928158383]
One of the main challenges in using deep learning-based methods for simulating physical systems is formulating physics-based data in the desired structure.
We propose a novel multi-level graph neural network framework that captures interaction at all ranges with only linear complexity.
Experiments confirm our multi-graph network learns discretization-invariant solution operators to PDEs and can be evaluated in linear time.
arXiv Detail & Related papers (2020-06-16T21:56:22Z)
- Convolutional Tensor-Train LSTM for Spatio-temporal Learning [116.24172387469994]
We propose a higher-order LSTM model that can efficiently learn long-term correlations in the video sequence.
This is accomplished through a novel tensor train module that performs prediction by combining convolutional features across time.
Our results achieve state-of-the-art performance in a wide range of applications and datasets.
arXiv Detail & Related papers (2020-02-21T05:00:01Z)
- Graph Ordering: Towards the Optimal by Learning [69.72656588714155]
Graph representation learning has achieved remarkable success in many graph-based applications, such as node classification, link prediction, and community detection.
However, for some kinds of graph applications, such as graph compression and edge partition, it is very hard to reduce them to graph representation learning tasks.
In this paper, we propose to attack the graph ordering problem behind such applications by a novel learning approach.
arXiv Detail & Related papers (2020-01-18T09:14:16Z)