Quantized Fourier and Polynomial Features for more Expressive Tensor
Network Models
- URL: http://arxiv.org/abs/2309.05436v3
- Date: Tue, 12 Mar 2024 10:18:09 GMT
- Title: Quantized Fourier and Polynomial Features for more Expressive Tensor
Network Models
- Authors: Frederiek Wesel, Kim Batselier
- Abstract summary: We exploit the tensor structure present in the features by constraining the model weights to be an underparametrized tensor network.
We show that, for the same number of model parameters, the resulting quantized models have a higher bound on the VC-dimension than their non-quantized counterparts.
- Score: 9.18287948559108
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: In the context of kernel machines, polynomial and Fourier features are
commonly used to provide a nonlinear extension to linear models by mapping the
data to a higher-dimensional space. Unless one considers the dual formulation
of the learning problem, which renders exact large-scale learning infeasible,
the exponential increase of model parameters in the dimensionality of the data
caused by their tensor-product structure prohibits tackling high-dimensional
problems. One possible approach to circumvent this exponential scaling
is to exploit the tensor structure present in the features by constraining the
model weights to be an underparametrized tensor network. In this paper we
quantize, i.e. further tensorize, polynomial and Fourier features. Based on
this feature quantization we propose to quantize the associated model weights,
yielding quantized models. We show that, for the same number of model
parameters, the resulting quantized models have a higher bound on the
VC-dimension than their non-quantized counterparts, at no additional
computational cost while learning from identical features. We verify
experimentally how this additional tensorization regularizes the learning
problem by prioritizing the most salient features in the data and how it
provides models with increased generalization capabilities. We finally
benchmark our approach on large-scale regression tasks, achieving state-of-the-art
results on a laptop computer.
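To make the quantization idea concrete, here is a minimal NumPy sketch (not the authors' code): a pure-power polynomial feature vector of length 2^k factors exactly into k two-element Kronecker factors, and a quantized model contracts these factors against a tensor-train weight tensor with one small core per quantized mode. All shapes, ranks, and names below are illustrative assumptions.

```python
import numpy as np

# Feature quantization for pure-power polynomial features: the vector
# [1, x, ..., x^(2^k - 1)] factors as a Kronecker product of k two-element
# vectors [1, x^(2^j)], since m = sum_j b_j 2^j implies x^m = prod_j x^(b_j 2^j).
def quantized_poly_features(x, k):
    return [np.array([1.0, x ** (2 ** j)]) for j in range(k)]

# Quantized model: contract each two-element factor against one tensor-train
# core of shape (r_left, 2, r_right), for all D inputs with k factors apiece.
def tt_predict(x_vec, cores, k):
    v = np.ones(1)                                   # boundary vector
    c = 0
    for x in x_vec:                                  # D input dimensions ...
        for f in quantized_poly_features(x, k):      # ... k factors each
            G = cores[c]
            v = v @ (f[0] * G[:, 0, :] + f[1] * G[:, 1, :])
            c += 1
    return v.item()                                  # final TT rank is 1

# Sanity check: the Kronecker product of the factors recovers all monomials.
x, k = 0.7, 3
full = np.ones(1)
for f in quantized_poly_features(x, k):
    full = np.kron(f, full)                          # factor j carries digit 2^j
assert np.allclose(full, x ** np.arange(2 ** k))

# Evaluate a random quantized TT model on a 4-dimensional input.
rng = np.random.default_rng(0)
D, r = 4, 3
ranks = [1] + [r] * (D * k - 1) + [1]
cores = [rng.standard_normal((ranks[c], 2, ranks[c + 1])) for c in range(D * k)]
print(tt_predict(rng.standard_normal(D), cores, k))
```

The effect of quantization is visible in the core count: the same feature space is covered by D*k cores with tiny mode size 2 instead of D cores with mode size 2^k, the redistribution the paper links to a higher VC-dimension bound at a fixed parameter budget.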
Related papers
- Scaling and renormalization in high-dimensional regression [72.59731158970894]
This paper presents a succinct derivation of the training and generalization performance of a variety of high-dimensional ridge regression models.
We provide an introduction and review of recent results on these topics, aimed at readers with backgrounds in physics and deep learning.
arXiv Detail & Related papers (2024-05-01T15:59:00Z)
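As a toy numerical companion (a sketch with arbitrary parameter choices, not the paper's analytical derivation), one can trace the empirical training and generalization error of ridge regression as the sample-to-dimension ratio grows:

```python
import numpy as np

# Toy high-dimensional ridge regression: watch the empirical train/test gap
# as the sample-to-dimension ratio n/d grows. All values are arbitrary
# illustrative choices; the paper derives such curves analytically.
rng = np.random.default_rng(0)
d, lam, noise = 200, 1e-2, 0.1
w_true = rng.standard_normal(d) / np.sqrt(d)

for n in (100, 200, 400, 800):
    X = rng.standard_normal((n, d))
    y = X @ w_true + noise * rng.standard_normal(n)
    w_hat = np.linalg.solve(X.T @ X + lam * np.eye(d), X.T @ y)
    X_te = rng.standard_normal((5000, d))
    y_te = X_te @ w_true + noise * rng.standard_normal(5000)
    print(f"n/d={n / d:.1f}  train={np.mean((X @ w_hat - y) ** 2):.4f}"
          f"  test={np.mean((X_te @ w_hat - y_te) ** 2):.4f}")
```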
- Capturing dynamical correlations using implicit neural representations [85.66456606776552]
We develop an artificial intelligence framework which combines a neural network trained to mimic simulated data from a model Hamiltonian with automatic differentiation to recover unknown parameters from experimental data.
In doing so, we illustrate the ability to build and train a differentiable model only once, which then can be applied in real-time to multi-dimensional scattering data.
arXiv Detail & Related papers (2023-04-08T07:55:36Z)
- Theory on variational high-dimensional tensor networks [2.0307382542339485]
We investigate the emergent statistical properties of random high-dimensional tensor-network states and the trainability of tensor networks.
We prove that variational high-dimensional tensor networks suffer from barren plateaus for global loss functions.
Our results pave the way for their future theoretical studies and practical applications.
arXiv Detail & Related papers (2023-03-30T15:26:30Z)
- Scaling Pre-trained Language Models to Deeper via Parameter-efficient Architecture [68.13678918660872]
We design a more capable parameter-sharing architecture based on matrix product operator (MPO) decomposition.
MPO decomposition can reorganize and factorize the information of a parameter matrix into two parts.
Our architecture shares the central tensor across all layers to reduce the model size.
arXiv Detail & Related papers (2023-03-27T02:34:09Z)
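A rough sketch of what such an MPO factorization can look like, assuming a plain SVD-based construction and illustrative mode sizes (the paper's exact scheme may differ):

```python
import numpy as np

# MPO sketch: reshape a (prod(in_dims), prod(out_dims)) weight matrix into
# interleaved input/output modes, then split it into a chain of cores by
# sequential SVDs. In a parameter-sharing scheme, the large central core
# could be shared across layers while the small outer cores stay per-layer.
def mpo_decompose(W, in_dims, out_dims):
    n = len(in_dims)
    T = W.reshape(*in_dims, *out_dims)
    T = T.transpose([p for k in range(n) for p in (k, n + k)])  # i1,j1,i2,j2,...
    cores, r = [], 1
    for k in range(n - 1):
        mat = T.reshape(r * in_dims[k] * out_dims[k], -1)
        U, S, Vt = np.linalg.svd(mat, full_matrices=False)
        r_new = len(S)                      # truncate here for compression
        cores.append(U.reshape(r, in_dims[k], out_dims[k], r_new))
        T, r = S[:, None] * Vt, r_new
    cores.append(T.reshape(r, in_dims[-1], out_dims[-1], 1))
    return cores

# Sanity check: contracting the cores reconstructs the original matrix.
rng = np.random.default_rng(0)
in_dims, out_dims = (2, 4, 2), (3, 2, 3)
W = rng.standard_normal((np.prod(in_dims), np.prod(out_dims)))
cores = mpo_decompose(W, in_dims, out_dims)
full = cores[0]
for G in cores[1:]:
    full = np.tensordot(full, G, axes=([-1], [0]))
full = full.squeeze().transpose(0, 2, 4, 1, 3, 5)   # back to (i1,i2,i3,j1,j2,j3)
assert np.allclose(full.reshape(W.shape), W)
```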
- Low-Rank Tensor Function Representation for Multi-Dimensional Data Recovery [52.21846313876592]
Low-rank tensor function representation (LRTFR) can continuously represent data beyond meshgrid with infinite resolution.
We develop two fundamental concepts for tensor functions, i.e., the tensor function rank and low-rank tensor function factorization.
Experiments substantiate the superiority and versatility of our method as compared with state-of-the-art methods.
arXiv Detail & Related papers (2022-12-01T04:00:38Z)
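A toy illustration of the "beyond meshgrid" idea, with hand-picked rank-2 factor functions standing in for the ones LRTFR would learn:

```python
import numpy as np

# Rank-2 "tensor function": f(x, y, z) = sum_r g_r(x) * h_r(y) * k_r(z).
# Unlike a discrete low-rank tensor, which stores values only on a fixed
# meshgrid, it can be queried at any real coordinate (infinite resolution).
# The factor functions are hand-picked here purely for illustration.
g = [np.sin, np.cos]
h = [np.cos, np.sin]
k = [lambda z: z, lambda z: z ** 2]

def f(x, y, z, rank=2):
    return sum(g[r](x) * h[r](y) * k[r](z) for r in range(rank))

# Values on a coarse 4x4x4 grid ...
t = np.linspace(0.0, 1.0, 4)
grid = f(t[:, None, None], t[None, :, None], t[None, None, :])
print(grid.shape)                         # (4, 4, 4)

# ... and at an arbitrary off-grid point, with no interpolation involved.
print(f(0.123, 0.456, 0.789))
```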
- Interaction Decompositions for Tensor Network Regression [0.0]
We show how to assess the relative importance of different regressors as a function of their degree.
We introduce a new type of tensor network model that is explicitly trained on only a small subset of interaction degrees.
This suggests that standard tensor network models utilize their regressors in an inefficient manner, with the lower degree terms vastly underutilized.
arXiv Detail & Related papers (2022-08-11T20:17:27Z)
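A small sketch of what interaction degrees mean for tensor-product regressors, using an explicit degree-restricted design matrix in place of the paper's tensor network parametrization:

```python
import numpy as np
from itertools import combinations

# The tensor-product features [1, x_1] (x) ... (x) [1, x_D] contain one
# regressor per subset of input variables; the subset size is its interaction
# degree. A degree-restricted model keeps only subsets of size <= max_deg.
def degree_restricted_design(X, max_deg):
    n, D = X.shape
    cols = [np.ones(n)]                              # degree-0 (bias) term
    for deg in range(1, max_deg + 1):
        for idx in combinations(range(D), deg):
            cols.append(np.prod(X[:, list(idx)], axis=1))
    return np.stack(cols, axis=1)

rng = np.random.default_rng(0)
X = rng.standard_normal((200, 5))
y = 1.0 + X[:, 0] * X[:, 2] + 0.1 * rng.standard_normal(200)  # degree-2 signal
A = degree_restricted_design(X, max_deg=2)           # 1 + 5 + C(5,2) = 16 cols
w, *_ = np.linalg.lstsq(A, y, rcond=None)
print(A.shape, round(w[0], 2))                       # bias recovered near 1.0
```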
- Equivariant vector field network for many-body system modeling [65.22203086172019]
Equivariant Vector Field Network (EVFN) is built on a novel equivariant basis and the associated scalarization and vectorization layers.
We evaluate our method on predicting trajectories of simulated Newton mechanics systems with both full and partially observed data.
arXiv Detail & Related papers (2021-10-26T14:26:25Z)
- Large-Scale Learning with Fourier Features and Tensor Decompositions [3.6930948691311007]
We exploit the tensor product structure of deterministic Fourier features, which enables us to represent the model parameters as a low-rank tensor decomposition.
We demonstrate by means of numerical experiments how our low-rank tensor approach obtains the same performance as the corresponding nonparametric model.
arXiv Detail & Related papers (2021-09-03T14:12:53Z)
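A sketch of that idea assuming a CP-format weight tensor and an illustrative cosine basis (the paper's exact features and decomposition may differ):

```python
import numpy as np

# Deterministic Fourier-type features per input dimension. The full feature
# map is their Kronecker product (size M**D), but with rank-R CP weights
# w = sum_r f_{1,r} (x) ... (x) f_{D,r} the model evaluates in O(D*M*R),
# never materializing the exponentially large feature vector.
def fourier_features(x, M):
    return np.cos(np.pi * np.arange(M) * x)          # illustrative basis

def cp_model(x_vec, factors):
    # factors: list of D arrays of shape (M, R), one per input dimension
    prod = np.ones(factors[0].shape[1])
    for x, F in zip(x_vec, factors):
        prod *= fourier_features(x, F.shape[0]) @ F  # per-dimension inner products
    return prod.sum()                                # sum over the R CP terms

rng = np.random.default_rng(0)
D, M, R = 10, 8, 4
factors = [rng.standard_normal((M, R)) / M for _ in range(D)]
print(cp_model(rng.uniform(size=D), factors))
```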
- Low-Rank and Sparse Enhanced Tucker Decomposition for Tensor Completion [3.498620439731324]
We introduce a unified low-rank and sparse enhanced Tucker decomposition model for tensor completion.
Our model possesses a sparse regularization term to promote a sparse core tensor, which is beneficial for tensor data compression.
Notably, our model can deal with different types of real-world data sets, since it exploits the potential periodicity and inherent correlation properties that appear in tensors.
arXiv Detail & Related papers (2020-10-01T12:45:39Z)
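A bare-bones sketch pairing a truncated HOSVD with a soft-thresholding step on the core, as a stand-in for the paper's sparse-regularized Tucker model:

```python
import numpy as np

# Truncated higher-order SVD (HOSVD): factor matrices from mode unfoldings,
# then the core via mode-wise projection. Soft-thresholding the core is a
# simple sparsity-promoting proximal step, standing in for the paper's
# sparse regularization term.
def hosvd(T, ranks):
    U = []
    for n, r in enumerate(ranks):
        unf = np.moveaxis(T, n, 0).reshape(T.shape[n], -1)   # mode-n unfolding
        U.append(np.linalg.svd(unf, full_matrices=False)[0][:, :r])
    core = T
    for n, Un in enumerate(U):            # project each mode onto its factors
        core = np.moveaxis(np.tensordot(Un.T, np.moveaxis(core, n, 0), axes=1), 0, n)
    return core, U

def soft_threshold(G, lam):
    return np.sign(G) * np.maximum(np.abs(G) - lam, 0.0)

rng = np.random.default_rng(0)
T = rng.standard_normal((6, 7, 8))
core, U = hosvd(T, ranks=(3, 3, 3))
sparse_core = soft_threshold(core, lam=0.5)
recon = sparse_core
for n, Un in enumerate(U):                # expand back along each mode
    recon = np.moveaxis(np.tensordot(Un, np.moveaxis(recon, n, 0), axes=1), 0, n)
print(recon.shape, float((sparse_core == 0).mean()))
```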
- Convolutional Tensor-Train LSTM for Spatio-temporal Learning [116.24172387469994]
We propose a higher-order LSTM model that can efficiently learn long-term correlations in the video sequence.
This is accomplished through a novel tensor train module that performs prediction by combining convolutional features across time.
Our approach achieves state-of-the-art performance across a wide range of applications and datasets.
arXiv Detail & Related papers (2020-02-21T05:00:01Z)