Learning with tree tensor networks: complexity estimates and model
selection
- URL: http://arxiv.org/abs/2007.01165v3
- Date: Wed, 19 May 2021 10:15:30 GMT
- Title: Learning with tree tensor networks: complexity estimates and model
selection
- Authors: Bertrand Michel and Anthony Nouy
- Abstract summary: We analyze a complexity-based model selection method for tree tensor networks in an empirical risk minimization framework.
We show that our strategy is (near to) minimax adaptive to a wide range of smoothness classes including Sobolev or Besov spaces.
- Score: 33.10033799230797
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Tree tensor networks, or tree-based tensor formats, are prominent model
classes for the approximation of high-dimensional functions in computational
and data science. They correspond to sum-product neural networks with a sparse
connectivity associated with a dimension tree and widths given by a tuple of
tensor ranks. The approximation power of these models has been proved to be
(near to) optimal for classical smoothness classes. However, in an empirical
risk minimization framework with a limited number of observations, the
dimension tree and ranks should be selected carefully to balance estimation and
approximation errors. We propose and analyze a complexity-based model selection
method for tree tensor networks in an empirical risk minimization framework and
we analyze its performance over a wide range of smoothness classes. Given a
family of model classes associated with different trees, ranks, tensor product
feature spaces and sparsity patterns for sparse tensor networks, a model is
selected (à la Barron, Birgé, Massart) by minimizing a penalized empirical
risk, with a penalty depending on the complexity of the model class and derived
from estimates of the metric entropy of tree tensor networks. This choice of
penalty yields a risk bound for the selected predictor. In a least-squares
setting, after deriving fast rates of convergence of the risk, we show that our
strategy is (near to) minimax adaptive to a wide range of smoothness classes
including Sobolev or Besov spaces (with isotropic, anisotropic or mixed
dominating smoothness) and analytic functions. We discuss the role of sparsity
of the tensor network for obtaining optimal performance in several regimes. In
practice, the amplitude of the penalty is calibrated with a slope heuristics
method. Numerical experiments in a least-squares regression setting illustrate
the performance of the strategy.
Related papers
- The Convex Landscape of Neural Networks: Characterizing Global Optima
and Stationary Points via Lasso Models [75.33431791218302]
Deep Neural Network (DNN) models are widely used, but their training objectives are non-convex.
In this paper we examine the use of convex neural recovery models.
We show that all stationary points of the non-convex training objective can be characterized as the global optima of a subsampled convex program.
arXiv Detail & Related papers (2023-12-19T23:04:56Z) - The Contextual Lasso: Sparse Linear Models via Deep Neural Networks [5.607237982617641]
We develop a new statistical estimator that fits a sparse linear model to the explanatory features such that the sparsity pattern and coefficients vary as a function of the contextual features.
An extensive suite of experiments on real and synthetic data suggests that the learned models, which remain highly transparent, can be sparser than the regular lasso.
arXiv Detail & Related papers (2023-02-02T05:00:29Z) - Robust Implicit Networks via Non-Euclidean Contractions [63.91638306025768]
Implicit neural networks show improved accuracy and significant reduction in memory consumption.
However, they can suffer from ill-posedness and convergence instability.
This paper provides a new framework to design well-posed and robust implicit neural networks.
arXiv Detail & Related papers (2021-06-06T18:05:02Z) - Provable Model-based Nonlinear Bandit and Reinforcement Learning: Shelve
Optimism, Embrace Virtual Curvature [61.22680308681648]
We show that global convergence is statistically intractable even for a one-layer neural net bandit with a deterministic reward.
For both nonlinear bandit and RL, the paper presents a model-based algorithm, Virtual Ascent with Online Model Learner (ViOL).
arXiv Detail & Related papers (2021-02-08T12:41:56Z) - Tensor-Train Networks for Learning Predictive Modeling of
Multidimensional Data [0.0]
A promising strategy is based on tensor networks, which have been very successful in physical and chemical applications.
We show that the weights of a multidimensional regression model can be learned by means of tensor networks, yielding a powerful yet compact representation (a minimal contraction sketch appears after this list).
An algorithm based on alternating least squares has been proposed for approximating the weights in TT-format at reduced computational cost.
arXiv Detail & Related papers (2021-01-22T16:14:38Z) - Sparsely constrained neural networks for model discovery of PDEs [0.0]
We present a modular framework that determines the sparsity pattern of a deep-learning based surrogate using any sparse regression technique.
We show how different network architectures and sparsity estimators improve model discovery accuracy and convergence on several benchmark examples.
arXiv Detail & Related papers (2020-11-09T11:02:40Z) - A Bayesian Perspective on Training Speed and Model Selection [51.15664724311443]
We show that a measure of a model's training speed can be used to estimate its marginal likelihood.
We verify our results in model selection tasks for linear models and for the infinite-width limit of deep neural networks.
Our results suggest a promising new direction towards explaining why neural networks trained with gradient descent are biased towards functions that generalize well.
arXiv Detail & Related papers (2020-10-27T17:56:14Z) - Approximation with Tensor Networks. Part II: Approximation Rates for
Smoothness Classes [0.0]
We study the approximation by tensor networks (TNs) of functions from smoothness classes.
The resulting tool can be interpreted as a feed-forward neural network.
We show that arbitrary Besov functions can be approximated at optimal or near-optimal rates.
arXiv Detail & Related papers (2020-06-30T21:57:42Z) - Measuring Model Complexity of Neural Networks with Curve Activation
Functions [100.98319505253797]
We propose the linear approximation neural network (LANN) to approximate a given deep model with curve activation function.
We experimentally explore the training process of neural networks and detect overfitting.
We find that the $L_1$ and $L_2$ regularizations suppress the increase of model complexity.
arXiv Detail & Related papers (2020-06-16T07:38:06Z) - Belief Propagation Reloaded: Learning BP-Layers for Labeling Problems [83.98774574197613]
We take one of the simplest inference methods, a truncated max-product belief propagation, and add what is necessary to make it a proper component of a deep learning model.
This BP-Layer can be used as the final or an intermediate block in convolutional neural networks (CNNs)
The model is applicable to a range of dense prediction problems, is well-trainable and provides parameter-efficient and robust solutions in stereo, optical flow and semantic segmentation.
arXiv Detail & Related papers (2020-03-13T13:11:35Z)