Generalising Recursive Neural Models by Tensor Decomposition
- URL: http://arxiv.org/abs/2006.10021v1
- Date: Wed, 17 Jun 2020 17:28:19 GMT
- Title: Generalising Recursive Neural Models by Tensor Decomposition
- Authors: Daniele Castellana and Davide Bacciu
- Abstract summary: We introduce a general approach to model aggregation of structural context leveraging a tensor-based formulation.
We show how the exponential growth in the size of the parameter space can be controlled through an approximation based on the Tucker decomposition.
By this means, we can effectively regulate the trade-off between the expressivity of the encoding (controlled by the hidden size), the computational complexity, and model generalisation.
- Score: 12.069862650316262
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Most machine learning models for structured data encode the structural
knowledge of a node by leveraging simple aggregation functions (in neural
models, typically a weighted sum) of the information in the node's
neighbourhood. Nevertheless, the choice of simple context aggregation
functions, such as the sum, can be widely sub-optimal. In this work we
introduce a general approach to model aggregation of structural context
leveraging a tensor-based formulation. We show how the exponential growth in
the size of the parameter space can be controlled through an approximation
based on the Tucker tensor decomposition. This approximation limits the size of
the parameter space, decoupling it from its strict dependence on the size of the
hidden encoding space. By this means, we can effectively regulate the trade-off
between the expressivity of the encoding (controlled by the hidden size) and the
computational complexity and model generalisation (influenced by the
parameterisation). Finally, we introduce a new Tensorial Tree-LSTM derived as an
instance of our framework and we use it to experimentally assess our working
hypotheses on tree classification scenarios.
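As a rough illustration of the idea in the abstract, the sketch below contrasts a full tensor aggregation over the encodings of two children with a Tucker-factorised version; the sizes, variable names, and numpy implementation are illustrative assumptions, not the paper's code.

```python
import numpy as np

h, r = 64, 8                      # hidden size and Tucker rank (illustrative)
rng = np.random.default_rng(0)

# Full tensor aggregation for a node with two children:
# out[k] = sum_{i,j} T[i, j, k] * left[i] * right[j]  ->  h**3 parameters.
T_full = rng.standard_normal((h, h, h))

# Tucker approximation: a small core G plus one factor matrix per mode,
# bringing the parameter count down to r**3 + 3*h*r.
G  = rng.standard_normal((r, r, r))
U1 = rng.standard_normal((h, r))  # projects the left child
U2 = rng.standard_normal((h, r))  # projects the right child
U3 = rng.standard_normal((h, r))  # maps the core output back to size h

def aggregate_tucker(left, right):
    """Tucker-factorised bilinear aggregation of two child encodings."""
    a = left @ U1                                # (r,)
    b = right @ U2                               # (r,)
    core_out = np.einsum('ijk,i,j->k', G, a, b)  # (r,)
    return U3 @ core_out                         # (h,)

left, right = rng.standard_normal(h), rng.standard_normal(h)
print(aggregate_tucker(left, right).shape)       # (64,)
print(T_full.size, G.size + 3 * h * r)           # 262144 vs 2048
```

With these toy sizes the full aggregation needs 262,144 parameters while the Tucker form needs 2,048, which is the parameter-space control the abstract refers to; the rank r can be tuned independently of the hidden size h.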
Related papers
- Scaling Laws with Hidden Structure [2.474908349649168]
Recent advances suggest that text and image data contain such hidden structures, which help mitigate the curse of dimensionality.
In this paper, we present a controlled experimental framework to test whether neural networks can indeed exploit such hidden factorial structures.
We find that they do leverage these latent patterns to learn discrete distributions more efficiently, and derive scaling laws linking model sizes, hidden factorizations, and accuracy.
arXiv Detail & Related papers (2024-11-02T22:32:53Z)
- Approximate learning of parsimonious Bayesian context trees [0.0]
The proposed framework is tested on synthetic and real-world data examples.
It outperforms existing sequence models when fitted to real protein sequences and honeypot computer terminal sessions.
arXiv Detail & Related papers (2024-07-27T11:50:40Z)
- Bayesian Semi-structured Subspace Inference [0.0]
Semi-structured regression models enable the joint modeling of interpretable structured and complex unstructured feature effects.
We present a Bayesian approximation for semi-structured regression models using subspace inference.
Our approach exhibits competitive predictive performance across simulated and real-world datasets.
arXiv Detail & Related papers (2024-01-23T18:15:58Z)
- Disentanglement via Latent Quantization [60.37109712033694]
In this work, we construct an inductive bias towards encoding to and decoding from an organized latent space.
We demonstrate the broad applicability of this approach by adding it to both basic data-reconstructing (vanilla autoencoder) and latent-reconstructing (InfoGAN) generative models.
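A minimal sketch of one way to impose such an organised latent space, assuming each latent dimension is snapped to its own small scalar codebook; names and sizes are illustrative, and the straight-through trick needed for training is omitted.

```python
import numpy as np

def quantize_latents(z, codebooks):
    """Snap each latent dimension to its nearest value in a small
    per-dimension codebook (straight-through gradients omitted here)."""
    zq = np.empty_like(z)
    for d, values in enumerate(codebooks):
        idx = np.abs(values[None, :] - z[:, d:d + 1]).argmin(axis=1)
        zq[:, d] = values[idx]
    return zq

rng = np.random.default_rng(0)
latent_dim, codebook_size = 6, 10
codebooks = [np.linspace(-1.0, 1.0, codebook_size) for _ in range(latent_dim)]
z = rng.standard_normal((4, latent_dim))
print(quantize_latents(z, codebooks))
```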
arXiv Detail & Related papers (2023-05-28T06:30:29Z)
- Hierarchical clustering with dot products recovers hidden tree structure [53.68551192799585]
In this paper we offer a new perspective on the well established agglomerative clustering algorithm, focusing on recovery of hierarchical structure.
We recommend a simple variant of the standard algorithm, in which clusters are merged by maximum average dot product and not, for example, by minimum distance or within-cluster variance.
We demonstrate that the tree output by this algorithm provides a bona fide estimate of generative hierarchical structure in data, under a generic probabilistic graphical model.
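A naive sketch of the merging rule described above, assuming a plain brute-force agglomeration rather than the authors' implementation: at every step, the pair of clusters with the largest average pairwise dot product is merged.

```python
import numpy as np

def dot_product_agglomeration(X):
    """Agglomerative clustering that repeatedly merges the pair of clusters
    with the largest average pairwise dot product between their points."""
    clusters = [[i] for i in range(len(X))]
    merges = []
    while len(clusters) > 1:
        best, best_pair = -np.inf, None
        for a in range(len(clusters)):
            for b in range(a + 1, len(clusters)):
                score = np.mean(X[clusters[a]] @ X[clusters[b]].T)
                if score > best:
                    best, best_pair = score, (a, b)
        a, b = best_pair
        merges.append((clusters[a], clusters[b], best))
        clusters[a] = clusters[a] + clusters[b]
        del clusters[b]
    return merges

rng = np.random.default_rng(0)
X = rng.standard_normal((8, 5))
for left, right, score in dot_product_agglomeration(X):
    print(left, right, round(score, 3))
```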
arXiv Detail & Related papers (2023-05-24T11:05:12Z)
- Scaling Pre-trained Language Models to Deeper via Parameter-efficient Architecture [68.13678918660872]
We design a more capable parameter-sharing architecture based on matrix product operator (MPO) decomposition.
MPO decomposition can reorganize and factorize the information of a parameter matrix into two parts.
Our architecture shares the central tensor across all layers for reducing the model size.
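A heavily simplified sketch of the parameter-sharing idea, collapsing the MPO chain into a single shared core that sits between two layer-specific factors; all names, sizes, and the two-sided factorisation itself are assumptions for illustration, not the paper's architecture.

```python
import numpy as np

rng = np.random.default_rng(0)
d_in, d_out, rank, n_layers = 256, 256, 32, 6   # illustrative sizes

# Simplified stand-in for an MPO chain: each layer's weight matrix is
# factorised around a central tensor C shared by every layer, while the
# small auxiliary factors A_l, B_l stay layer-specific.
C = rng.standard_normal((rank, rank)) / np.sqrt(rank)          # shared core
layers = [
    (rng.standard_normal((d_in, rank)) / np.sqrt(d_in),        # A_l
     rng.standard_normal((rank, d_out)) / np.sqrt(rank))       # B_l
    for _ in range(n_layers)
]

def layer_forward(x, A, B):
    return np.tanh(x @ A @ C @ B)     # the full W_l is never materialised

x = rng.standard_normal((4, d_in))
for A, B in layers:
    x = layer_forward(x, A, B)

full = n_layers * d_in * d_out
shared = C.size + sum(A.size + B.size for A, B in layers)
print(x.shape, full, shared)          # (4, 256) 393216 99328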
arXiv Detail & Related papers (2023-03-27T02:34:09Z)
- DIFFormer: Scalable (Graph) Transformers Induced by Energy Constrained Diffusion [66.21290235237808]
We introduce an energy constrained diffusion model which encodes a batch of instances from a dataset into evolutionary states.
We provide rigorous theory that implies closed-form optimal estimates for the pairwise diffusion strength among arbitrary instance pairs.
Experiments highlight the wide applicability of our model as a general-purpose encoder backbone with superior performance in various tasks.
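A hedged sketch of diffusion among the instances of a batch, using an ordinary softmax similarity kernel as a stand-in for the paper's closed-form diffusion strengths; the update rule and constants are illustrative assumptions.

```python
import numpy as np

def diffuse_batch(Z, n_steps=4, step_size=0.3):
    """Each instance's state moves toward a similarity-weighted average of
    every other state (a dense, attention-like diffusion step)."""
    for _ in range(n_steps):
        sim = Z @ Z.T / np.sqrt(Z.shape[1])           # pairwise strengths
        weights = np.exp(sim - sim.max(axis=1, keepdims=True))
        weights /= weights.sum(axis=1, keepdims=True)
        Z = (1 - step_size) * Z + step_size * (weights @ Z)
    return Z

rng = np.random.default_rng(0)
Z = rng.standard_normal((16, 8))    # 16 instances, 8-dimensional states
print(diffuse_batch(Z).shape)       # (16, 8)
```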
arXiv Detail & Related papers (2023-01-23T15:18:54Z)
- Learning Single-Index Models with Shallow Neural Networks [43.6480804626033]
We introduce a natural class of shallow neural networks and study its ability to learn single-index models via gradient flow.
We show that the corresponding optimization landscape is benign, which in turn leads to generalization guarantees that match the near-optimal sample complexity of dedicated semi-parametric methods.
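A toy sketch of the setting, assuming a tanh link function and plain full-batch gradient descent as a discrete stand-in for gradient flow; everything here is illustrative rather than taken from the paper.

```python
import numpy as np

rng = np.random.default_rng(0)
d, n, width, lr, steps = 10, 2000, 32, 0.05, 2000

# Single-index model: the target depends on x only through one direction w_star.
w_star = rng.standard_normal(d)
w_star /= np.linalg.norm(w_star)
X = rng.standard_normal((n, d))
y = np.tanh(X @ w_star)                       # illustrative link function

# Shallow (one-hidden-layer) network trained on half mean-squared error.
W = rng.standard_normal((d, width)) / np.sqrt(d)
a = rng.standard_normal(width) / np.sqrt(width)

for _ in range(steps):
    H = np.tanh(X @ W)                        # (n, width)
    err = H @ a - y                           # (n,)
    grad_a = H.T @ err / n
    grad_W = X.T @ ((err[:, None] * a) * (1 - H ** 2)) / n
    a -= lr * grad_a
    W -= lr * grad_W

print(float(np.mean((np.tanh(X @ W) @ a - y) ** 2)))   # final training MSE
```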
arXiv Detail & Related papers (2022-10-27T17:52:58Z)
- Toward an Over-parameterized Direct-Fit Model of Visual Perception [5.4823225815317125]
In this paper, we highlight the difference in parallel and sequential binding mechanisms between simple and complex cells.
A new proposal for abstracting them into space partitioning and composition is developed.
We show how it leads to a dynamic programming (DP)-like approximate nearest-neighbor search based on $\ell_\infty$-optimization.
arXiv Detail & Related papers (2022-10-07T23:54:30Z)
- Low-Rank Constraints for Fast Inference in Structured Models [110.38427965904266]
This work demonstrates a simple approach to reduce the computational and memory complexity of a large class of structured models.
Experiments with neural parameterized structured models for language modeling, polyphonic music modeling, unsupervised grammar induction, and video modeling show that our approach matches the accuracy of standard models at large state spaces.
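A minimal sketch of the low-rank idea on an HMM-style forward recursion (one common structured model), assuming the transition matrix factorises as U @ V; this illustrates the complexity argument only and is not the authors' code.

```python
import numpy as np

rng = np.random.default_rng(0)
n_states, rank, n_steps = 1024, 16, 50

# Low-rank factorisation of a large transition matrix: T ~= U @ V.
U = rng.random((n_states, rank))
V = rng.random((rank, n_states))
# Normalise rows so that U @ V is a valid stochastic matrix.
row_sums = (U @ V).sum(axis=1, keepdims=True)
U = U / row_sums

emission = rng.random((n_steps, n_states))     # toy emission likelihoods
alpha = np.full(n_states, 1.0 / n_states)

for t in range(n_steps):
    # Full model:     alpha @ T          -> O(n_states**2) per step.
    # Low-rank model: (alpha @ U) @ V    -> O(n_states * rank) per step.
    alpha = (alpha @ U) @ V * emission[t]
    alpha /= alpha.sum()

print(alpha.shape)   # (1024,)
```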
arXiv Detail & Related papers (2022-01-08T00:47:50Z)
- Supervised Learning for Non-Sequential Data: A Canonical Polyadic Decomposition Approach [85.12934750565971]
Efficient modelling of feature interactions underpins supervised learning for non-sequential tasks.
To alleviate this issue, it has been proposed to implicitly represent the model parameters as a tensor.
For enhanced expressiveness, we generalize the framework to allow feature mapping to arbitrarily high-dimensional feature vectors.
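A small sketch of scoring with an implicitly represented, CP-factorised weight tensor, assuming a toy polynomial feature map per input variable; names and sizes are illustrative.

```python
import numpy as np

rng = np.random.default_rng(0)
n_features, map_dim, cp_rank = 5, 4, 8   # illustrative sizes

# Rank-R CP factors: one (map_dim x cp_rank) matrix per input feature.
factors = [rng.standard_normal((map_dim, cp_rank)) * 0.1
           for _ in range(n_features)]

def feature_map(x_d):
    """Toy per-feature mapping to a small feature vector (powers of x_d)."""
    return np.array([x_d ** k for k in range(map_dim)])

def cp_score(x):
    """Score = <W, phi(x_1) o ... o phi(x_D)> with W held implicitly in CP
    form, so the full map_dim**n_features weight tensor is never built."""
    prod = np.ones(cp_rank)
    for d in range(n_features):
        prod *= feature_map(x[d]) @ factors[d]   # (cp_rank,)
    return prod.sum()

x = rng.standard_normal(n_features)
print(cp_score(x))
print(map_dim ** n_features, n_features * map_dim * cp_rank)  # 1024 vs 160
```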
arXiv Detail & Related papers (2020-01-27T22:38:40Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of its content (including all information) and is not responsible for any consequences of its use.