Generalising Recursive Neural Models by Tensor Decomposition
- URL: http://arxiv.org/abs/2006.10021v1
- Date: Wed, 17 Jun 2020 17:28:19 GMT
- Title: Generalising Recursive Neural Models by Tensor Decomposition
- Authors: Daniele Castellana and Davide Bacciu
- Abstract summary: We introduce a general approach to model aggregation of structural context leveraging a tensor-based formulation.
We show how the exponential growth in the size of the parameter space can be controlled through an approximation based on the Tucker decomposition.
By this means, we can effectively regulate the trade-off between the expressivity of the encoding (controlled by the hidden size), the computational complexity, and model generalisation.
- Score: 12.069862650316262
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Most machine learning models for structured data encode the structural
knowledge of a node by leveraging simple aggregation functions (in neural
models, typically a weighted sum) of the information in the node's
neighbourhood. Nevertheless, the choice of simple context aggregation
functions, such as the sum, can be widely sub-optimal. In this work we
introduce a general approach to model aggregation of structural context
leveraging a tensor-based formulation. We show how the exponential growth in
the size of the parameter space can be controlled through an approximation
based on the Tucker tensor decomposition. This approximation limits the size of
the parameter space, decoupling it from its strict dependence on the size of the
hidden encoding space. By this means, we can effectively regulate the trade-off
between the expressivity of the encoding (controlled by the hidden size) and the
computational complexity and model generalisation (influenced by the
parameterisation). Finally, we introduce a new Tensorial Tree-LSTM derived as an
instance of our framework and we use it to experimentally assess our working
hypotheses on tree classification scenarios.
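As a rough illustration of the idea in the abstract, the sketch below contrasts a full tensor aggregation over the encodings of two children with a Tucker-factorised version; the sizes, variable names, and numpy implementation are illustrative assumptions, not the paper's code.

```python
import numpy as np

h, r = 64, 8                      # hidden size and Tucker rank (illustrative)
rng = np.random.default_rng(0)

# Full tensor aggregation for a node with two children:
# out[k] = sum_{i,j} T[i, j, k] * left[i] * right[j]  ->  h**3 parameters.
T_full = rng.standard_normal((h, h, h))

# Tucker approximation: a small core G plus one factor matrix per mode,
# bringing the parameter count down to r**3 + 3*h*r.
G  = rng.standard_normal((r, r, r))
U1 = rng.standard_normal((h, r))  # projects the left child
U2 = rng.standard_normal((h, r))  # projects the right child
U3 = rng.standard_normal((h, r))  # maps the core output back to size h

def aggregate_tucker(left, right):
    """Tucker-factorised bilinear aggregation of two child encodings."""
    a = left @ U1                                # (r,)
    b = right @ U2                               # (r,)
    core_out = np.einsum('ijk,i,j->k', G, a, b)  # (r,)
    return U3 @ core_out                         # (h,)

left, right = rng.standard_normal(h), rng.standard_normal(h)
print(aggregate_tucker(left, right).shape)       # (64,)
print(T_full.size, G.size + 3 * h * r)           # 262144 vs 2048
```

With these toy sizes the full aggregation needs 262,144 parameters while the Tucker form needs 2,048, which is the parameter-space control the abstract refers to; the rank r can be tuned independently of the hidden size h.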
Related papers
- Scaling Laws with Hidden Structure [2.474908349649168]
Recent advances suggest that text and image data contain such hidden structures, which help mitigate the curse of dimensionality.
In this paper, we present a controlled experimental framework to test whether neural networks can indeed exploit such hidden factorial structures.
We find that they do leverage these latent patterns to learn discrete distributions more efficiently, and derive scaling laws linking model sizes, hidden factorizations, and accuracy.
arXiv Detail & Related papers (2024-11-02T22:32:53Z)
- Approximate learning of parsimonious Bayesian context trees [0.0]
The proposed framework is tested on synthetic and real-world data examples.
It outperforms existing sequence models when fitted to real protein sequences and honeypot computer terminal sessions.
arXiv Detail & Related papers (2024-07-27T11:50:40Z)
- Bayesian Semi-structured Subspace Inference [0.0]
Semi-structured regression models enable the joint modeling of interpretable structured and complex unstructured feature effects.
We present a Bayesian approximation for semi-structured regression models using subspace inference.
Our approach exhibits competitive predictive performance across simulated and real-world datasets.
arXiv Detail & Related papers (2024-01-23T18:15:58Z)
- Disentanglement via Latent Quantization [60.37109712033694]
In this work, we construct an inductive bias towards encoding to and decoding from an organized latent space.
We demonstrate the broad applicability of this approach by adding it to both basic data-reconstructing (vanilla autoencoder) and latent-reconstructing (InfoGAN) generative models.
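A minimal sketch of one way to impose such an organised latent space, assuming each latent dimension is snapped to its own small scalar codebook; names and sizes are illustrative, and the straight-through trick needed for training is omitted.

```python
import numpy as np

def quantize_latents(z, codebooks):
    """Snap each latent dimension to its nearest value in a small
    per-dimension codebook (straight-through gradients omitted here)."""
    zq = np.empty_like(z)
    for d, values in enumerate(codebooks):
        idx = np.abs(values[None, :] - z[:, d:d + 1]).argmin(axis=1)
        zq[:, d] = values[idx]
    return zq

rng = np.random.default_rng(0)
latent_dim, codebook_size = 6, 10
codebooks = [np.linspace(-1.0, 1.0, codebook_size) for _ in range(latent_dim)]
z = rng.standard_normal((4, latent_dim))
print(quantize_latents(z, codebooks))
```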
arXiv Detail & Related papers (2023-05-28T06:30:29Z)
- Hierarchical clustering with dot products recovers hidden tree structure [53.68551192799585]
In this paper we offer a new perspective on the well established agglomerative clustering algorithm, focusing on recovery of hierarchical structure.
We recommend a simple variant of the standard algorithm, in which clusters are merged by maximum average dot product and not, for example, by minimum distance or within-cluster variance.
We demonstrate that the tree output by this algorithm provides a bona fide estimate of generative hierarchical structure in data, under a generic probabilistic graphical model.
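A naive sketch of the merging rule described above, assuming a plain brute-force agglomeration rather than the authors' implementation: at every step, the pair of clusters with the largest average pairwise dot product is merged.

```python
import numpy as np

def dot_product_agglomeration(X):
    """Agglomerative clustering that repeatedly merges the pair of clusters
    with the largest average pairwise dot product between their points."""
    clusters = [[i] for i in range(len(X))]
    merges = []
    while len(clusters) > 1:
        best, best_pair = -np.inf, None
        for a in range(len(clusters)):
            for b in range(a + 1, len(clusters)):
                score = np.mean(X[clusters[a]] @ X[clusters[b]].T)
                if score > best:
                    best, best_pair = score, (a, b)
        a, b = best_pair
        merges.append((clusters[a], clusters[b], best))
        clusters[a] = clusters[a] + clusters[b]
        del clusters[b]
    return merges

rng = np.random.default_rng(0)
X = rng.standard_normal((8, 5))
for left, right, score in dot_product_agglomeration(X):
    print(left, right, round(score, 3))
```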
arXiv Detail & Related papers (2023-05-24T11:05:12Z)
- Scaling Pre-trained Language Models to Deeper via Parameter-efficient Architecture [68.13678918660872]
We design a more capable parameter-sharing architecture based on matrix product operator (MPO) decomposition.
MPO decomposition can reorganize and factorize the information of a parameter matrix into two parts.
Our architecture shares the central tensor across all layers for reducing the model size.
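A heavily simplified sketch of the parameter-sharing idea, collapsing the MPO chain into a single shared core that sits between two layer-specific factors; all names, sizes, and the two-sided factorisation itself are assumptions for illustration, not the paper's architecture.

```python
import numpy as np

rng = np.random.default_rng(0)
d_in, d_out, rank, n_layers = 256, 256, 32, 6   # illustrative sizes

# Simplified stand-in for an MPO chain: each layer's weight matrix is
# factorised around a central tensor C shared by every layer, while the
# small auxiliary factors A_l, B_l stay layer-specific.
C = rng.standard_normal((rank, rank)) / np.sqrt(rank)          # shared core
layers = [
    (rng.standard_normal((d_in, rank)) / np.sqrt(d_in),        # A_l
     rng.standard_normal((rank, d_out)) / np.sqrt(rank))       # B_l
    for _ in range(n_layers)
]

def layer_forward(x, A, B):
    return np.tanh(x @ A @ C @ B)     # the full W_l is never materialised

x = rng.standard_normal((4, d_in))
for A, B in layers:
    x = layer_forward(x, A, B)

full = n_layers * d_in * d_out
shared = C.size + sum(A.size + B.size for A, B in layers)
print(x.shape, full, shared)          # (4, 256) 393216 99328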
arXiv Detail & Related papers (2023-03-27T02:34:09Z)
- DIFFormer: Scalable (Graph) Transformers Induced by Energy Constrained Diffusion [66.21290235237808]
We introduce an energy constrained diffusion model which encodes a batch of instances from a dataset into evolutionary states.
We provide rigorous theory that implies closed-form optimal estimates for the pairwise diffusion strength among arbitrary instance pairs.
Experiments highlight the wide applicability of our model as a general-purpose encoder backbone with superior performance in various tasks.
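A hedged sketch of diffusion among the instances of a batch, using an ordinary softmax similarity kernel as a stand-in for the paper's closed-form diffusion strengths; the update rule and constants are illustrative assumptions.

```python
import numpy as np

def diffuse_batch(Z, n_steps=4, step_size=0.3):
    """Each instance's state moves toward a similarity-weighted average of
    every other state (a dense, attention-like diffusion step)."""
    for _ in range(n_steps):
        sim = Z @ Z.T / np.sqrt(Z.shape[1])           # pairwise strengths
        weights = np.exp(sim - sim.max(axis=1, keepdims=True))
        weights /= weights.sum(axis=1, keepdims=True)
        Z = (1 - step_size) * Z + step_size * (weights @ Z)
    return Z

rng = np.random.default_rng(0)
Z = rng.standard_normal((16, 8))    # 16 instances, 8-dimensional states
print(diffuse_batch(Z).shape)       # (16, 8)
```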
arXiv Detail & Related papers (2023-01-23T15:18:54Z)
- Learning Single-Index Models with Shallow Neural Networks [43.6480804626033]
We introduce a natural class of shallow neural networks and study its ability to learn single-index models via gradient flow.
We show that the corresponding optimization landscape is benign, which in turn leads to generalization guarantees that match the near-optimal sample complexity of dedicated semi-parametric methods.
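A toy sketch of the setting, assuming a tanh link function and plain full-batch gradient descent as a discrete stand-in for gradient flow; everything here is illustrative rather than taken from the paper.

```python
import numpy as np

rng = np.random.default_rng(0)
d, n, width, lr, steps = 10, 2000, 32, 0.05, 2000

# Single-index model: the target depends on x only through one direction w_star.
w_star = rng.standard_normal(d)
w_star /= np.linalg.norm(w_star)
X = rng.standard_normal((n, d))
y = np.tanh(X @ w_star)                       # illustrative link function

# Shallow (one-hidden-layer) network trained on half mean-squared error.
W = rng.standard_normal((d, width)) / np.sqrt(d)
a = rng.standard_normal(width) / np.sqrt(width)

for _ in range(steps):
    H = np.tanh(X @ W)                        # (n, width)
    err = H @ a - y                           # (n,)
    grad_a = H.T @ err / n
    grad_W = X.T @ ((err[:, None] * a) * (1 - H ** 2)) / n
    a -= lr * grad_a
    W -= lr * grad_W

print(float(np.mean((np.tanh(X @ W) @ a - y) ** 2)))   # final training MSE
```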
arXiv Detail & Related papers (2022-10-27T17:52:58Z)
- Toward an Over-parameterized Direct-Fit Model of Visual Perception [5.4823225815317125]
In this paper, we highlight the difference in parallel and sequential binding mechanisms between simple and complex cells.
A new proposal for abstracting them into space partitioning and composition is developed.
We show how it leads to a dynamic programming (DP)-like approximate nearest-neighbor search based on $\ell_\infty$-optimization.
arXiv Detail & Related papers (2022-10-07T23:54:30Z)
- Low-Rank Constraints for Fast Inference in Structured Models [110.38427965904266]
This work demonstrates a simple approach to reduce the computational and memory complexity of a large class of structured models.
Experiments with neural parameterized structured models for language modeling, polyphonic music modeling, unsupervised grammar induction, and video modeling show that our approach matches the accuracy of standard models at large state spaces.
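A minimal sketch of the low-rank idea on an HMM-style forward recursion (one common structured model), assuming the transition matrix factorises as U @ V; this illustrates the complexity argument only and is not the authors' code.

```python
import numpy as np

rng = np.random.default_rng(0)
n_states, rank, n_steps = 1024, 16, 50

# Low-rank factorisation of a large transition matrix: T ~= U @ V.
U = rng.random((n_states, rank))
V = rng.random((rank, n_states))
# Normalise rows so that U @ V is a valid stochastic matrix.
row_sums = (U @ V).sum(axis=1, keepdims=True)
U = U / row_sums

emission = rng.random((n_steps, n_states))     # toy emission likelihoods
alpha = np.full(n_states, 1.0 / n_states)

for t in range(n_steps):
    # Full model:     alpha @ T          -> O(n_states**2) per step.
    # Low-rank model: (alpha @ U) @ V    -> O(n_states * rank) per step.
    alpha = (alpha @ U) @ V * emission[t]
    alpha /= alpha.sum()

print(alpha.shape)   # (1024,)
```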
arXiv Detail & Related papers (2022-01-08T00:47:50Z)
- Supervised Learning for Non-Sequential Data: A Canonical Polyadic Decomposition Approach [85.12934750565971]
Efficient modelling of feature interactions underpins supervised learning for non-sequential tasks.
To alleviate this issue, it has been proposed to implicitly represent the model parameters as a tensor.
For enhanced expressiveness, we generalize the framework to allow feature mapping to arbitrarily high-dimensional feature vectors.
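A small sketch of scoring with an implicitly represented, CP-factorised weight tensor, assuming a toy polynomial feature map per input variable; names and sizes are illustrative.

```python
import numpy as np

rng = np.random.default_rng(0)
n_features, map_dim, cp_rank = 5, 4, 8   # illustrative sizes

# Rank-R CP factors: one (map_dim x cp_rank) matrix per input feature.
factors = [rng.standard_normal((map_dim, cp_rank)) * 0.1
           for _ in range(n_features)]

def feature_map(x_d):
    """Toy per-feature mapping to a small feature vector (powers of x_d)."""
    return np.array([x_d ** k for k in range(map_dim)])

def cp_score(x):
    """Score = <W, phi(x_1) o ... o phi(x_D)> with W held implicitly in CP
    form, so the full map_dim**n_features weight tensor is never built."""
    prod = np.ones(cp_rank)
    for d in range(n_features):
        prod *= feature_map(x[d]) @ factors[d]   # (cp_rank,)
    return prod.sum()

x = rng.standard_normal(n_features)
print(cp_score(x))
print(map_dim ** n_features, n_features * map_dim * cp_rank)  # 1024 vs 160
```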
arXiv Detail & Related papers (2020-01-27T22:38:40Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of its content (including all information) and is not responsible for any consequences of its use.