Generalization and Overfitting in Matrix Product State Machine Learning Architectures
- URL: http://arxiv.org/abs/2208.04372v1
- Date: Mon, 8 Aug 2022 19:13:34 GMT
- Title: Generalization and Overfitting in Matrix Product State Machine Learning Architectures
- Authors: Artem Strashko, E. Miles Stoudenmire
- Abstract summary: We construct artificial data which can be exactly modeled by an MPS and train models with different numbers of parameters.
We observe model overfitting for one-dimensional data, but find that overfitting is less significant for more complex data, while with MNIST image data we find no signatures of overfitting.
- Score: 0.0
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: While overfitting and, more generally, double descent are ubiquitous in
machine learning, increasing the number of parameters of the most widely used
tensor network, the matrix product state (MPS), has generally led to monotonic
improvement of test performance in previous studies. To better understand the
generalization properties of architectures parameterized by MPS, we construct
artificial data which can be exactly modeled by an MPS and train the models
with different numbers of parameters. We observe model overfitting for
one-dimensional data, but also find that for more complex data overfitting is
less significant, while with MNIST image data we do not find any signatures of
overfitting. We speculate that the generalization properties of MPS depend on the
properties of the data: with one-dimensional data (for which the MPS ansatz is
best suited) MPS is prone to overfitting, while with more complex data, which
cannot be fit by an MPS exactly, overfitting may be much less significant.
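To make the setup in the abstract concrete, the following is a minimal NumPy sketch (not code from the paper; the function names, bond dimension, and bit-string input are illustrative assumptions) of the MPS parameterization whose capacity the authors vary. It builds a random MPS over binary variables and contracts it along a bit string to yield an amplitude; since the parameter count grows with the square of the bond dimension, sweeping that dimension traces out the capacity axis along which overfitting is probed.

```python
import numpy as np

def random_mps(n_sites, phys_dim=2, bond_dim=4, seed=0):
    """Random MPS cores A[i] of shape (left_bond, phys, right_bond); boundary bonds are 1.
    Shapes and the initialization scale are illustrative assumptions, not the paper's."""
    rng = np.random.default_rng(seed)
    cores = []
    for i in range(n_sites):
        dl = 1 if i == 0 else bond_dim
        dr = 1 if i == n_sites - 1 else bond_dim
        cores.append(rng.normal(scale=1.0 / np.sqrt(bond_dim), size=(dl, phys_dim, dr)))
    return cores

def amplitude(cores, bits):
    """Contract the MPS along one bit string to obtain a scalar amplitude."""
    env = np.ones((1, 1))              # 1 x 1 left boundary vector
    for core, b in zip(cores, bits):
        env = env @ core[:, b, :]      # (1, d_l) @ (d_l, d_r) -> (1, d_r)
    return env.item()

mps = random_mps(n_sites=8, bond_dim=4)
print(amplitude(mps, [0, 1, 1, 0, 0, 1, 0, 1]))
# Parameter count is O(n_sites * phys_dim * bond_dim**2), so sweeping bond_dim
# sweeps the model capacity that is varied when probing for overfitting.
```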
Related papers
- Tensor Polynomial Additive Model [40.30621617188693]
TPAM preserves the inherent interpretability of additive models, enabling transparent decision-making and the extraction of meaningful feature values.
It can improve accuracy by up to 30% and compression rate by up to 5 times while maintaining good interpretability.
arXiv Detail & Related papers (2024-06-05T06:23:11Z)
- The Underlying Scaling Laws and Universal Statistical Structure of Complex Datasets [2.07180164747172]
We study universal traits which emerge both in real-world complex datasets, as well as in artificially generated ones.
Our approach is to analogize data to a physical system and employ tools from statistical physics and Random Matrix Theory (RMT) to reveal their underlying structure.
arXiv Detail & Related papers (2023-06-26T18:01:47Z)
- Scaling Pre-trained Language Models to Deeper via Parameter-efficient Architecture [68.13678918660872]
We design a more capable parameter-sharing architecture based on the matrix product operator (MPO).
MPO decomposition can reorganize and factorize the information of a parameter matrix into two parts.
Our architecture shares the central tensor across all layers to reduce the model size (a generic sketch of such a decomposition appears after this list).
arXiv Detail & Related papers (2023-03-27T02:34:09Z)
- Multi-Metric AutoRec for High Dimensional and Sparse User Behavior Data Prediction [10.351592131677018]
We propose a multi-metric AutoRec (MMA) based on the representative AutoRec.
MMA enjoys the multi-metric orientation from a set of dispersed metric spaces, achieving a comprehensive representation of user data.
MMA can outperform seven other state-of-the-art models in predicting unobserved user behavior data.
arXiv Detail & Related papers (2022-12-20T12:28:07Z)
- Towards a mathematical understanding of learning from few examples with nonlinear feature maps [68.8204255655161]
We consider the problem of data classification where the training set consists of just a few data points.
We reveal key relationships between the geometry of an AI model's feature space, the structure of the underlying data distributions, and the model's generalisation capabilities.
arXiv Detail & Related papers (2022-11-07T14:52:58Z)
- RENs: Relevance Encoding Networks [0.0]
This paper proposes relevance encoding networks (RENs): a novel probabilistic VAE-based framework that uses the automatic relevance determination (ARD) prior in the latent space to learn the data-specific bottleneck dimensionality.
We show that the proposed model learns the relevant latent bottleneck dimensionality without compromising the representation and generation quality of the samples.
arXiv Detail & Related papers (2022-05-25T21:53:48Z)
- Learning from few examples with nonlinear feature maps [68.8204255655161]
We explore the phenomenon and reveal key relationships between the dimensionality of an AI model's feature space, the non-degeneracy of data distributions, and the model's generalisation capabilities.
The main thrust of our present analysis is on the influence of nonlinear feature transformations mapping original data into higher- and possibly infinite-dimensional spaces on the resulting model's generalisation capabilities.
arXiv Detail & Related papers (2022-03-31T10:36:50Z)
- Evaluating natural language processing models with generalization metrics that do not need access to any training or testing data [66.11139091362078]
We provide the first model selection results on large pretrained Transformers from Huggingface using generalization metrics.
Despite their niche status, we find that metrics derived from the heavy-tail (HT) perspective are particularly useful in NLP tasks.
arXiv Detail & Related papers (2022-02-06T20:07:35Z)
- Post-mortem on a deep learning contest: a Simpson's paradox and the complementary roles of scale metrics versus shape metrics [61.49826776409194]
We analyze a corpus of models made publicly available for a contest to predict the generalization accuracy of neural network (NN) models.
We identify what amounts to a Simpson's paradox: "scale" metrics perform well overall but perform poorly on sub-partitions of the data.
We present two novel shape metrics, one data-independent and the other data-dependent, which can predict trends in the test accuracy of a series of NNs.
arXiv Detail & Related papers (2021-06-01T19:19:49Z)
- Rank-R FNN: A Tensor-Based Learning Model for High-Order Data Classification [69.26747803963907]
The Rank-R Feedforward Neural Network (FNN) is a tensor-based nonlinear learning model that imposes a Canonical/Polyadic (CP) decomposition on its parameters.
It handles inputs as multilinear arrays, bypassing the need for vectorization, and can thus fully exploit the structural information along every data dimension.
We establish the universal approximation and learnability properties of Rank-R FNN, and we validate its performance on real-world hyperspectral datasets.
arXiv Detail & Related papers (2021-04-11T16:37:32Z)
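As a companion to the MPO-based parameter-sharing entry above ("Scaling Pre-trained Language Models to Deeper via Parameter-efficient Architecture"), here is a generic sketch of factorizing a weight matrix into matrix product operator (MPO) cores with successive SVDs. It is not that paper's code; the function names, the 4x4x4 input/output factorization, and the optional max_bond truncation are assumptions. In an MPO-based parameter-sharing scheme, a large central core could then be shared across layers while the small boundary cores stay layer-specific.

```python
import numpy as np

def mpo_decompose(W, in_dims, out_dims, max_bond=None):
    """Factor a (prod(in_dims) x prod(out_dims)) matrix into MPO cores via SVDs.
    Core k has shape (bond_left, in_dims[k], out_dims[k], bond_right).
    Illustrative only; the truncation policy is an assumption, not the paper's."""
    n = len(in_dims)
    # Reshape to (i1..in, j1..jn), then interleave axes as (i1, j1, i2, j2, ...).
    T = W.reshape(*in_dims, *out_dims)
    T = T.transpose([a for k in range(n) for a in (k, n + k)])
    cores, bond = [], 1
    T = T.reshape(bond, -1)                      # leading axis is the running bond
    for k in range(n - 1):
        M = T.reshape(bond * in_dims[k] * out_dims[k], -1)
        U, S, Vh = np.linalg.svd(M, full_matrices=False)
        keep = len(S) if max_bond is None else min(len(S), max_bond)
        cores.append(U[:, :keep].reshape(bond, in_dims[k], out_dims[k], keep))
        T = np.diag(S[:keep]) @ Vh[:keep]        # carry the remainder to the next site
        bond = keep
    cores.append(T.reshape(bond, in_dims[-1], out_dims[-1], 1))
    return cores

def mpo_to_matrix(cores, in_dims, out_dims):
    """Contract MPO cores back into a dense matrix (to check the factorization)."""
    T = cores[0]
    for c in cores[1:]:
        T = np.tensordot(T, c, axes=([-1], [0]))
    T = T.squeeze(0).squeeze(-1)                 # drop the trivial boundary bonds
    n = len(in_dims)
    perm = [2 * k for k in range(n)] + [2 * k + 1 for k in range(n)]
    return T.transpose(perm).reshape(int(np.prod(in_dims)), int(np.prod(out_dims)))

rng = np.random.default_rng(0)
W = rng.normal(size=(64, 64))                    # e.g. a small 64 x 64 weight matrix
cores = mpo_decompose(W, (4, 4, 4), (4, 4, 4))
print(np.allclose(W, mpo_to_matrix(cores, (4, 4, 4), (4, 4, 4))))  # True (no truncation)
```

Without truncation the factorization is exact, so the reconstruction check prints True; setting max_bond trades reconstruction accuracy for fewer parameters.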