Implicit Regularization in Deep Tensor Factorization
- URL: http://arxiv.org/abs/2105.01346v1
- Date: Tue, 4 May 2021 07:48:40 GMT
- Title: Implicit Regularization in Deep Tensor Factorization
- Authors: Paolo Milanesi (QARMA), Hachem Kadri (LIS, QARMA, AMU SCI), Stéphane
Ayache (QARMA), Thierry Artières (QARMA)
- Abstract summary: We introduce deep Tucker and TT unconstrained factorizations to deal with the completion task.
Experiments on both synthetic and real data show that gradient descent promotes low-rank solutions.
- Score: 0.0
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Attempts at studying the implicit regularization associated with
gradient descent (GD) have identified matrix completion as a suitable test-bed.
Recent findings suggest that this phenomenon cannot be phrased as a
norm-minimization problem, implying that a paradigm shift is required and that
the dynamics have to be taken into account. In the present work we address the
more general setup of tensor completion by leveraging two popular tensor
factorizations, namely Tucker and TensorTrain (TT). We track relevant
quantities such as the tensor nuclear norm, effective rank, and generalized
singular values, and we introduce deep Tucker and TT unconstrained
factorizations to deal with the completion task. Experiments on both synthetic
and real data show that gradient descent promotes low-rank solutions and
validate the conjecture that the phenomenon has to be addressed from a
dynamical perspective.
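
For intuition, the unconstrained-factorization setup can be sketched in a few
lines. This is a minimal illustration, not the authors' implementation: it fits
a full-rank Tucker parametrization to a subset of entries of a low-rank target
by plain gradient descent and tracks the entropy-based effective rank of a
mode-0 unfolding. PyTorch, the shapes, the learning rate, and the
`effective_rank` helper are my assumptions.

```python
# Minimal sketch (not the authors' code): unconstrained Tucker factorization
# trained by GD on observed entries of a low-rank target, tracking the
# entropy-based effective rank of the mode-0 unfolding.
import torch

torch.manual_seed(0)
dims = (10, 10, 10)
rank = (10, 10, 10)  # full rank: nothing constrains the solution explicitly

# Rank-2 ground truth and a random 30% observation mask.
g = torch.randn(2, 2, 2)
us = [torch.randn(d, 2) for d in dims]
target = torch.einsum('abc,ia,jb,kc->ijk', g, *us)
mask = torch.rand(dims) < 0.3

# Unconstrained Tucker parametrization with small initialization.
core = (0.1 * torch.randn(*rank)).requires_grad_()
factors = [(0.1 * torch.randn(d, r)).requires_grad_() for d, r in zip(dims, rank)]
opt = torch.optim.SGD([core] + factors, lr=0.2)

def effective_rank(t):
    # exp of the entropy of the normalized singular values (Roy & Vetterli).
    s = torch.linalg.svdvals(t.reshape(t.shape[0], -1))
    p = s / s.sum()
    return torch.exp(-(p * torch.log(p + 1e-12)).sum()).item()

for step in range(3001):
    opt.zero_grad()
    recon = torch.einsum('abc,ia,jb,kc->ijk', core, *factors)
    loss = ((recon - target)[mask] ** 2).mean()
    loss.backward()
    opt.step()
    if step % 1000 == 0:
        print(f'step {step}: loss {loss.item():.5f}, '
              f'effective rank {effective_rank(recon.detach()):.2f}')
```

With small initialization the effective rank typically stays close to that of
the target even though the parametrization is full-rank, which is the
qualitative behaviour the abstract describes.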
Related papers
- Implicit Regularization for Tubal Tensor Factorizations via Gradient Descent [4.031100721019478]
We provide a rigorous analysis of implicit regularization in an overparametrized tensor factorization problem beyond the lazy training regime.
We prove the first tensor result of its kind for gradient descent rather than gradient flow.
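
As a rough illustration of the setting (my own construction under the usual
t-product definition of FFT along the tube mode, facewise matrix products, and
inverse FFT; the paper's setup and hyperparameters may differ), an
overparametrized tubal factorization fitted by gradient descent looks like:

```python
# Sketch: overparametrized tubal factorization X ≈ A * B fitted by GD, where
# * is the t-product: FFT along tubes, facewise matmul, inverse FFT.
import torch

def t_product(A, B):
    Af, Bf = torch.fft.fft(A, dim=2), torch.fft.fft(B, dim=2)
    Cf = torch.einsum('ijt,jkt->ikt', Af, Bf)  # facewise products
    return torch.fft.ifft(Cf, dim=2).real

torch.manual_seed(0)
n, m, t = 8, 8, 5
X = t_product(torch.randn(n, 2, t), torch.randn(2, m, t))  # tubal rank 2
mask = torch.rand(n, m, t) < 0.4

r = n  # overparametrized: the factorization imposes no tubal-rank cap
A = (0.01 * torch.randn(n, r, t)).requires_grad_()
B = (0.01 * torch.randn(r, m, t)).requires_grad_()
opt = torch.optim.SGD([A, B], lr=0.5)
for _ in range(3000):
    opt.zero_grad()
    loss = ((t_product(A, B) - X)[mask] ** 2).mean()
    loss.backward()
    opt.step()
```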
arXiv Detail & Related papers (2024-10-21T17:52:01Z)
- Provable Tensor Completion with Graph Information [49.08648842312456]
We introduce a novel model, theory, and algorithm for solving the dynamic graph regularized tensor completion problem.
We develop a comprehensive model simultaneously capturing the low-rank and similarity structure of the tensor.
In terms of theory, we showcase the alignment between the proposed graph smoothness regularization and a weighted tensor nuclear norm.
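
To illustrate how such a graph penalty enters the objective (my sketch, not the
paper's algorithm), a Laplacian built from an assumed node-similarity matrix W
can regularize a mode's factor matrix so that similar rows stay close:

```python
# Sketch: tensor-completion loss with a graph-smoothness term tr(U^T L U)
# on one mode's factor matrix U; W is an assumed node-similarity matrix.
import torch

def graph_laplacian(W):
    return torch.diag(W.sum(dim=1)) - W

def completion_loss(recon, target, mask, U, W, lam):
    fit = ((recon - target)[mask] ** 2).sum()
    smooth = torch.trace(U.T @ graph_laplacian(W) @ U)
    return fit + lam * smooth  # small tr(U^T L U) => neighbors get similar rows
```

Here tr(U^T L U) = ½ Σ_{ij} W_{ij} ||u_i - u_j||², the standard graph
smoothness the entry's nuclear-norm alignment result refers to.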
arXiv Detail & Related papers (2023-10-04T02:55:10Z)
- Towards Training Without Depth Limits: Batch Normalization Without Gradient Explosion [83.90492831583997]
We show that a batch-normalized network can keep the optimal signal propagation properties while avoiding exploding gradients in depth.
We use a Multi-Layer Perceptron (MLP) with linear activations and batch normalization that provably has bounded gradients at any depth.
We also design an activation shaping scheme that empirically achieves the same properties for certain non-linear activations.
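
A quick empirical probe of this claim (my construction, using the
linear-activation MLP described above) is to measure how input-gradient norms
scale with depth in a batch-normalized linear network:

```python
# Probe (illustrative): input-gradient norms vs. depth for a linear MLP with
# batch normalization after every linear layer.
import torch
import torch.nn as nn

def make_mlp(width, depth):
    layers = []
    for _ in range(depth):
        layers += [nn.Linear(width, width, bias=False), nn.BatchNorm1d(width)]
    return nn.Sequential(*layers)

torch.manual_seed(0)
x = torch.randn(128, 64, requires_grad=True)
for depth in (10, 50, 100):
    net = make_mlp(64, depth)
    net(x).pow(2).mean().backward()
    print(f'depth {depth}: input-grad norm {x.grad.norm().item():.3e}')
    x.grad = None
```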
arXiv Detail & Related papers (2023-10-03T12:35:02Z)
- Model based Multi-agent Reinforcement Learning with Tensor Decompositions [52.575433758866936]
This paper investigates generalisation to unexplored state-action pairs by modelling the transition and reward functions as tensors of low CP-rank.
Experiments on synthetic MDPs show that using tensor decompositions in a model-based reinforcement learning algorithm can lead to much faster convergence if the true transition and reward functions are indeed of low rank.
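
The modelling step can be sketched as follows (my illustration; state and
action counts, the rank, and the fitting loss are placeholders): the
transition table P[s, a, s'] is represented by a rank-R CP decomposition and
fitted to observed transition frequencies.

```python
# Sketch: transition tensor P[s, a, s'] as a low CP-rank model
# sum_r u_r ⊗ v_r ⊗ w_r, fitted to empirical transition frequencies.
import torch

S, A, R = 20, 4, 3
U = (0.1 * torch.randn(S, R)).requires_grad_()
V = (0.1 * torch.randn(A, R)).requires_grad_()
W = (0.1 * torch.randn(S, R)).requires_grad_()

def cp(U, V, W):
    return torch.einsum('ir,jr,kr->ijk', U, V, W)

# counts[s, a, s'] would come from environment interaction; random stand-in.
counts = torch.rand(S, A, S)
freq = counts / counts.sum(dim=2, keepdim=True)
opt = torch.optim.Adam([U, V, W], lr=0.01)
for _ in range(2000):
    opt.zero_grad()
    loss = ((cp(U, V, W) - freq) ** 2).mean()
    loss.backward()
    opt.step()
```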
arXiv Detail & Related papers (2021-10-27T15:36:25Z)
- Implicit Regularization in Tensor Factorization [17.424619189180675]
Implicit regularization in deep learning is perceived as a tendency of gradient-based optimization to fit training data with predictors of minimal "complexity".
We argue that tensor rank may pave the way to explaining both the implicit regularization of neural networks and the properties of real-world data that translate it into generalization.
arXiv Detail & Related papers (2021-02-19T15:10:26Z)
- Learning Quantized Neural Nets by Coarse Gradient Method for Non-linear Classification [3.158346511479111]
We propose a class of STEs satisfying certain monotonicity conditions, and consider their applications to the training of a two-linear-layer network with quantized activation functions.
We establish performance guarantees for the proposed STEs by showing that the corresponding coarse gradient methods converge to the global minimum.
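
The straight-through pattern the entry refers to looks roughly like this (an
illustrative clipped-identity STE of my choosing; the paper analyzes a broader
class of monotone surrogates):

```python
# Illustrative STE: forward applies the quantizer (sign), backward substitutes
# a clipped-identity "coarse" gradient for the true (zero a.e.) one.
import torch

class SignSTE(torch.autograd.Function):
    @staticmethod
    def forward(ctx, x):
        ctx.save_for_backward(x)
        return torch.sign(x)

    @staticmethod
    def backward(ctx, grad_out):
        x, = ctx.saved_tensors
        return grad_out * (x.abs() <= 1).to(grad_out.dtype)

x = torch.randn(4, requires_grad=True)
SignSTE.apply(x).sum().backward()
print(x.grad)  # 1 where |x| <= 1, else 0
```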
arXiv Detail & Related papers (2020-11-23T07:50:09Z)
- Low-Rank and Sparse Enhanced Tucker Decomposition for Tensor Completion [3.498620439731324]
We introduce a unified low-rank and sparse enhanced Tucker decomposition model for tensor completion.
Our model possesses a sparse regularization term to promote a sparse core tensor, which is beneficial for tensor data compression.
Notably, our model can deal with different types of real-world data sets, since it exploits the potential periodicity and inherent correlation properties that appear in tensors.
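
In loss form, the modelling idea amounts to adding an l1 penalty on the Tucker
core to the completion objective (a sketch; `lam` and the fit term are
illustrative, and the arguments match the Tucker setup sketched after the
abstract above):

```python
# Sketch: Tucker completion objective with an l1 term promoting a sparse core.
# recon, target, mask, core come from a standard Tucker parametrization.
def sparse_tucker_loss(recon, target, mask, core, lam):
    fit = ((recon - target)[mask] ** 2).sum()
    return fit + lam * core.abs().sum()  # sparse core => compressible tensor
```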
arXiv Detail & Related papers (2020-10-01T12:45:39Z)
- Uncertainty quantification for nonconvex tensor completion: Confidence intervals, heteroscedasticity and optimality [92.35257908210316]
We study the problem of estimating a low-rank tensor given incomplete and corrupted observations.
We find that it attains unimprovable rates of $\ell_2$ accuracy.
arXiv Detail & Related papers (2020-06-15T17:47:13Z)
- Revisiting Initialization of Neural Networks [72.24615341588846]
We propose a rigorous estimation of the global curvature of weights across layers by approximating and controlling the norm of their Hessian matrix.
Our experiments on Word2Vec and the MNIST/CIFAR image classification tasks confirm that tracking the Hessian norm is a useful diagnostic tool.
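
Tracking such a Hessian norm is typically done with Hessian-vector products; a
generic power-iteration diagnostic (my sketch, not the authors' estimator)
looks like:

```python
# Sketch: estimate the spectral norm of the loss Hessian w.r.t. `params`
# via power iteration on Hessian-vector products (no Hessian materialized).
import torch

def hessian_spectral_norm(loss, params, iters=20):
    grads = torch.autograd.grad(loss, params, create_graph=True)
    flat = torch.cat([g.reshape(-1) for g in grads])
    v = torch.randn_like(flat)
    v /= v.norm()
    est = flat.new_zeros(())
    for _ in range(iters):
        hv = torch.autograd.grad(flat @ v, params, retain_graph=True)
        hv = torch.cat([h.reshape(-1) for h in hv])
        est = hv.norm()
        v = hv / (est + 1e-12)
    return est.item()
```

Usage would be, e.g., `loss = criterion(model(x), y)` followed by
`hessian_spectral_norm(loss, list(model.parameters()))`.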
arXiv Detail & Related papers (2020-04-20T18:12:56Z)
- Tensor denoising and completion based on ordinal observations [11.193504036335503]
We consider the problem of low-rank tensor estimation from possibly incomplete, ordinal-valued observations.
We propose a multi-linear cumulative link model, develop a rank-constrained M-estimator, and obtain theoretical accuracy guarantees.
We show that the proposed estimator is minimax optimal under the class of low-rank models.
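
The cumulative link model can be sketched as follows (my illustration with a
logit link; the paper's estimator is rank-constrained and its link function
may differ): each latent low-rank tensor entry theta is mapped to ordinal
level probabilities through ordered cutpoints.

```python
# Sketch: cumulative link (ordinal logit) log-likelihood of level y given a
# latent tensor entry theta and increasing cutpoints b_0 < ... < b_{L-2}.
import torch

def ordinal_log_prob(theta, y, cuts):
    # P(Y <= l) = sigmoid(b_l - theta); P(Y = l) = cdf[l+1] - cdf[l].
    cdf = torch.sigmoid(cuts - theta)
    cdf = torch.cat([cdf.new_zeros(1), cdf, cdf.new_ones(1)])
    return torch.log(cdf[y + 1] - cdf[y] + 1e-12)

theta = torch.tensor(0.3)
cuts = torch.tensor([-1.0, 0.0, 1.0])  # four ordinal levels
print(torch.exp(ordinal_log_prob(theta, 2, cuts)))  # P(Y = 2)
```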
arXiv Detail & Related papers (2020-02-16T07:09:56Z)
- Supervised Learning for Non-Sequential Data: A Canonical Polyadic Decomposition Approach [85.12934750565971]
Efficient modelling of feature interactions underpins supervised learning for non-sequential tasks, but representing all interactions explicitly makes the number of parameters grow exponentially with the interaction order.
To alleviate this issue, it has been proposed to implicitly represent the model parameters as a tensor.
For enhanced expressiveness, we generalize the framework to allow feature mapping to arbitrarily high-dimensional feature vectors.
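
The factorized trick this entry describes can be sketched in a few lines
(dimensions and rank are placeholders): with W = Σ_r u_r ⊗ v_r ⊗ w_r, the
degree-3 interaction score <W, x ⊗ x ⊗ x> collapses into inner products, so
the full weight tensor is never materialized.

```python
# Sketch: degree-3 feature-interaction score with a CP-decomposed weight
# tensor; cost O(dR) per example instead of O(d^3).
import torch

d, R = 16, 5
U, V, W = (torch.randn(d, R) for _ in range(3))

def score(x):
    # <sum_r u_r⊗v_r⊗w_r, x⊗x⊗x> = sum_r (u_r·x)(v_r·x)(w_r·x)
    return ((x @ U) * (x @ V) * (x @ W)).sum()

print(score(torch.randn(d)))
```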
arXiv Detail & Related papers (2020-01-27T22:38:40Z)