Residual Tensor Train: a Flexible and Efficient Approach for Learning
Multiple Multilinear Correlations
- URL: http://arxiv.org/abs/2108.08659v1
- Date: Thu, 19 Aug 2021 12:47:16 GMT
- Title: Residual Tensor Train: a Flexible and Efficient Approach for Learning
Multiple Multilinear Correlations
- Authors: Yiwei Chen, Yu Pan, Daoyi Dong
- Abstract summary: In this paper, we present a novel Residual Tensor Train (ResTT) which integrates the merits of TT and residual structure.
In particular, we prove that the fully-connected layer in neural networks and the Volterra series can be taken as special cases of ResTT.
We prove that such a rule is much more relaxed than that of TT, which means ResTT can easily address the vanishing and exploding gradient problem.
- Score: 4.754987078078158
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: The Tensor Train (TT) approach has been successfully applied in the modelling of
the multilinear interaction of features. Nevertheless, the existing models lack
flexibility and generalizability, as they only model a single type of
high-order correlation. In practice, multiple multilinear correlations may
exist within the features. In this paper, we present a novel Residual Tensor
Train (ResTT) which integrates the merits of TT and residual structure to
capture the multilinear feature correlations, from low to higher orders, within
the same model. In particular, we prove that the fully-connected layer in
neural networks and the Volterra series can be taken as special cases of ResTT.
Furthermore, we derive the rule for weight initialization that stabilizes the
training of ResTT based on a mean-field analysis. We prove that such a rule is
much more relaxed than that of TT, which means ResTT can easily address the
vanishing and exploding gradient problem that exists in the current TT models.
Numerical experiments demonstrate that ResTT outperforms the state-of-the-art
tensor network approaches, and is competitive with the benchmark deep learning
models on MNIST and Fashion-MNIST datasets.
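The abstract describes ResTT only at a high level, so the following NumPy sketch illustrates the general idea rather than the paper's exact construction: one tensor-train core per input feature, with an additive residual path at every contraction step so that low-order correlations are carried forward alongside the highest-order multilinear term. The function names, core shapes, identity residual maps, and the 1/sqrt(m*r) scale (meant only to echo the role of the mean-field initialization rule, not to reproduce it) are all illustrative assumptions.

```python
import numpy as np

def restt_forward(features, cores, res_maps):
    """Illustrative residual tensor-train contraction (not the paper's exact model).

    features : list of d feature vectors, each of shape (m,)
    cores    : list of d TT cores, cores[k] of shape (r, m, r)
    res_maps : list of d residual matrices of shape (r, r)
    """
    state = np.ones(cores[0].shape[0])                    # trivial left boundary vector
    for x, G, R in zip(features, cores, res_maps):
        tt_term = np.einsum('i,imj,m->j', state, G, x)    # multilinear (TT) update
        state = tt_term + R @ state                       # residual path keeps lower-order terms alive
    return state                                          # a final linear readout would give the prediction

# toy usage with variance-controlled random cores
d, m, r = 4, 8, 5
feats = [np.random.randn(m) for _ in range(d)]
cores = [np.random.randn(r, m, r) / np.sqrt(m * r) for _ in range(d)]  # scale keeps activations O(1)
res_maps = [np.eye(r) for _ in range(d)]
print(restt_forward(feats, cores, res_maps).shape)        # (5,)
```

Setting every residual map to zero collapses this sketch to a plain TT contraction, in which only the order-d interaction reaches the output; the residual path is what lets the lower-order correlations coexist with it in one model.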
Related papers
- A Momentum-Incorporated Non-Negative Latent Factorization of Tensors
Model for Dynamic Network Representation [0.0]
A large-scale dynamic network (LDN) is a source of data in many big data-related applications.
A latent factorization of tensors (LFT) model can efficiently extract the time-varying patterns of such a network.
LFT models based on stochastic gradient descent (SGD) solvers are often limited by their training schemes and suffer from poor tail convergence.
This paper proposes a novel nonlinear LFT model (MNNL) based on momentum-incorporated SGD to make training unconstrained and compatible with general training schemes.
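As a rough illustration of the ingredients named above (a latent factorization of a tensor, momentum-incorporated SGD, and the non-negativity in the paper's title), the sketch below fits a toy CP-style factorization of a 3-way tensor using momentum updates followed by a projection onto non-negative values. It is not the MNNL algorithm; the CP format, hyperparameters, and projection step are assumptions.

```python
import numpy as np

def momentum_nonneg_factorization(T, rank=4, lr=0.05, beta=0.9, iters=200):
    """Toy non-negative CP factorization of a 3-way tensor with momentum SGD."""
    I, J, K = T.shape
    A = np.abs(np.random.randn(I, rank))
    B = np.abs(np.random.randn(J, rank))
    C = np.abs(np.random.randn(K, rank))
    vel = [np.zeros_like(A), np.zeros_like(B), np.zeros_like(C)]
    for _ in range(iters):
        T_hat = np.einsum('ir,jr,kr->ijk', A, B, C)       # current reconstruction
        E = (T_hat - T) / T.size                          # mean-squared-error residual
        grads = [np.einsum('ijk,jr,kr->ir', E, B, C),
                 np.einsum('ijk,ir,kr->jr', E, A, C),
                 np.einsum('ijk,ir,jr->kr', E, A, B)]
        for F, v, g in zip((A, B, C), vel, grads):
            v *= beta
            v += (1 - beta) * g                           # momentum accumulation
            F -= lr * v                                   # descent step
            np.maximum(F, 0.0, out=F)                     # project back to the non-negative orthant
    return A, B, C

A, B, C = momentum_nonneg_factorization(np.abs(np.random.randn(6, 5, 4)), rank=3)
```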
arXiv Detail & Related papers (2023-05-04T12:30:53Z)
- ES-dRNN: A Hybrid Exponential Smoothing and Dilated Recurrent Neural
Network Model for Short-Term Load Forecasting [1.4502611532302039]
Short-term load forecasting (STLF) is challenging due to complex time series (TS) that exhibit multiple seasonal patterns.
This paper proposes a novel hybrid hierarchical deep learning model that deals with multiple seasonality.
It combines exponential smoothing (ES) and a recurrent neural network (RNN)
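The actual ES-dRNN couples exponential smoothing with dilated recurrent cells and models several seasonal periods at once; the sketch below only wires the two named ingredients together in the simplest way, with a single smoothing coefficient and a vanilla RNN cell, so every shape and constant here is a placeholder.

```python
import numpy as np

def exp_smooth(y, alpha=0.3):
    """Simple exponential smoothing of a 1-D series (the 'ES' ingredient)."""
    level = np.empty_like(y)
    level[0] = y[0]
    for t in range(1, len(y)):
        level[t] = alpha * y[t] + (1 - alpha) * level[t - 1]
    return level

def es_rnn_step(h, x, Wx, Wh, b):
    """One step of a plain (non-dilated) RNN cell (the 'RNN' ingredient)."""
    return np.tanh(Wx @ x + Wh @ h + b)

# normalize the series by its smoothed level, then run the RNN on the ratios
y = np.abs(np.random.randn(100)) + 1.0                    # stand-in for a load series
ratios = y / exp_smooth(y)                                # ES-normalized inputs
h = np.zeros(8)
Wx, Wh, b = np.random.randn(8, 1), np.random.randn(8, 8) / 8.0, np.zeros(8)
for r in ratios:
    h = es_rnn_step(h, np.array([r]), Wx, Wh, b)          # h summarizes the normalized history
```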
arXiv Detail & Related papers (2021-12-05T19:38:42Z)
- Multi-Tensor Network Representation for High-Order Tensor Completion [25.759851542474447]
This work studies the problem of completing high-dimensional data (referred to as tensors) from partially observed samples.
We consider that a tensor is a superposition of multiple low-rank components.
In this paper, we propose a fundamental tensor decomposition framework: Multi-Tensor Network decomposition (MTNR)
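MTNR itself combines several tensor-network formats into one decomposition; as a much-simplified stand-in for the "superposition of multiple low-rank components" idea, the sketch below writes the target tensor as a sum of CP-format components and evaluates a completion loss only on the observed entries. The CP format and squared-error objective are assumptions made for illustration.

```python
import numpy as np

def cp_component(A, B, C):
    """One low-rank component: sum of rank-1 terms a_r (outer) b_r (outer) c_r."""
    return np.einsum('ir,jr,kr->ijk', A, B, C)

def superposition(components):
    """The modeled tensor is a sum of several low-rank components."""
    return sum(cp_component(*factors) for factors in components)

def masked_completion_loss(T_obs, mask, components):
    """Completion objective: match the model to the observed entries only."""
    diff = (superposition(components) - T_obs) * mask
    return 0.5 * np.sum(diff ** 2)

# toy setup: two rank-3 components over a 5x6x7 tensor, 30% of entries observed
shape, R = (5, 6, 7), 3
comps = [tuple(np.random.randn(s, R) for s in shape) for _ in range(2)]
mask = (np.random.rand(*shape) < 0.3).astype(float)
T_obs = superposition(comps) * mask                       # pretend these are the known entries
print(masked_completion_loss(T_obs, mask, comps))          # ~0 for the generating components
```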
arXiv Detail & Related papers (2021-09-09T03:50:19Z)
- MLCTR: A Fast Scalable Coupled Tensor Completion Based on Multi-Layer
Non-Linear Matrix Factorization [3.6978630614152013]
This paper focuses on the embedding learning aspect of the tensor completion problem and proposes a new multi-layer neural network architecture for factorization and completion (MLCTR)
The network architecture offers multiple advantages: a series of low-rank matrix factorization building blocks to minimize overfitting, interleaved transfer functions in each layer for non-linearity, and by-pass connections that reduce the diminishing-gradient problem and allow deeper networks.
Our algorithm is highly efficient for imputing missing values in the EPS data.
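The summary above names three architectural ingredients; the sketch below is a hypothetical forward pass that uses them literally: stacked low-rank factorization blocks, a transfer function inside each layer, and a by-pass (skip) connection around each block. It is not the MLCTR embedding network, and all shapes are placeholders.

```python
import numpy as np

def low_rank_block(H, U, V):
    """One building block: project to rank r and back (U: (d, r), V: (r, d))."""
    return H @ U @ V

def stacked_low_rank_forward(X, blocks):
    """Low-rank blocks with a nonlinearity per layer and a by-pass connection."""
    H = X
    for U, V in blocks:
        H = np.tanh(low_rank_block(H, U, V)) + H          # skip path eases gradient flow in deep stacks
    return H

X = np.random.randn(32, 16)                               # 32 embeddings of width 16
blocks = [(np.random.randn(16, 4) / 4.0, np.random.randn(4, 16) / 4.0) for _ in range(3)]
H = stacked_low_rank_forward(X, blocks)
```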
arXiv Detail & Related papers (2021-09-04T03:08:34Z)
- Closed-form Continuous-Depth Models [99.40335716948101]
Continuous-depth neural models rely on advanced numerical differential equation solvers.
We present a new family of models, termed Closed-form Continuous-depth (CfC) networks, that are simple to describe and at least one order of magnitude faster.
arXiv Detail & Related papers (2021-06-25T22:08:51Z)
- On the Memory Mechanism of Tensor-Power Recurrent Models [25.83531612758211]
We investigate the memory mechanism of tensor-power (TP) recurrent models.
We show that a large degree p is an essential condition to achieve the long memory effect.
The new model is expected to benefit from the long memory effect in a stable manner.
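For concreteness, a degree-p tensor-power recurrence can be sketched as follows: the hidden state is raised to its p-fold outer power and contracted with a weight tensor. The tanh nonlinearity and this particular contraction pattern are assumptions for illustration, not the exact formulation analyzed in the paper.

```python
import numpy as np

def tensor_power_step(W, h, p=2):
    """One tensor-power recurrent update of degree p.
    W has p + 1 axes of size d; h has shape (d,)."""
    z = h
    for _ in range(p - 1):
        z = np.multiply.outer(z, h)                       # build the p-fold outer power of h
    return np.tanh(np.tensordot(W, z, axes=p))            # contract the last p axes of W with z

d, p = 4, 2
W = np.random.randn(*([d] * (p + 1))) / d
h = np.random.randn(d)
h_next = tensor_power_step(W, h, p)                       # shape (d,)
```

With p = 1 this reduces to an ordinary linear recurrence, which matches the summary's point that a large degree p is what enables the long memory effect.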
arXiv Detail & Related papers (2021-03-02T07:07:47Z)
- Provable Model-based Nonlinear Bandit and Reinforcement Learning: Shelve
Optimism, Embrace Virtual Curvature [61.22680308681648]
We show that global convergence is statistically intractable even for one-layer neural net bandit with a deterministic reward.
For both nonlinear bandit and RL, the paper presents a model-based algorithm, Virtual Ascent with Online Model Learner (ViOL)
arXiv Detail & Related papers (2021-02-08T12:41:56Z)
- Improving the Reconstruction of Disentangled Representation Learners via Multi-Stage Modeling [54.94763543386523]
Current autoencoder-based disentangled representation learning methods achieve disentanglement by penalizing the (aggregate) posterior to encourage statistical independence of the latent factors.
We present a novel multi-stage modeling approach where the disentangled factors are first learned using a penalty-based disentangled representation learning method.
Then, the low-quality reconstruction is improved with another deep generative model that is trained to model the missing correlated latent variables.
arXiv Detail & Related papers (2020-10-25T18:51:15Z)
- A Fully Tensorized Recurrent Neural Network [48.50376453324581]
We introduce a "fully tensorized" RNN architecture which jointly encodes the separate weight matrices within each recurrent cell.
This approach reduces model size by several orders of magnitude, while still maintaining similar or better performance compared to standard RNNs.
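The fully tensorized cell encodes a recurrent cell's weight matrices jointly in tensor-train form; the sketch below shows only the basic ingredient: storing a single weight matrix as two TT(-matrix) cores and rebuilding it when needed. The two-core depth and all shapes are illustrative assumptions, and a practical implementation would contract the cores against the input directly instead of materializing the full matrix.

```python
import numpy as np

def tt_matrix(core1, core2):
    """Rebuild a weight matrix from two TT(-matrix) cores.
    core1: (n1, m1, r), core2: (r, n2, m2) -> W of shape (n1*n2, m1*m2)."""
    W = np.einsum('iap,pjb->ijab', core1, core2)          # (n1, n2, m1, m2)
    n1, n2, m1, m2 = W.shape
    return W.reshape(n1 * n2, m1 * m2)

def tt_rnn_step(h, x, cx1, cx2, ch1, ch2, b):
    """One recurrent step with both weight matrices stored as TT cores."""
    return np.tanh(tt_matrix(cx1, cx2) @ x + tt_matrix(ch1, ch2) @ h + b)

# hidden and input size 6 = 2*3, TT rank 4 (all assumed for the demo)
h, x = np.zeros(6), np.random.randn(6)
cx1, cx2 = np.random.randn(2, 2, 4), np.random.randn(4, 3, 3)
ch1, ch2 = np.random.randn(2, 2, 4), np.random.randn(4, 3, 3)
h = tt_rnn_step(h, x, cx1, cx2, ch1, ch2, np.zeros(6))
```

For this tiny demo the cores are not actually smaller than the dense 6x6 matrix; the several-orders-of-magnitude saving quoted in the summary comes from realistic layer sizes, where the cores' parameter count (n1*m1*r + r*n2*m2) is far below the dense count (n1*n2*m1*m2).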
arXiv Detail & Related papers (2020-10-08T18:24:12Z)
- Convolutional Tensor-Train LSTM for Spatio-temporal Learning [116.24172387469994]
We propose a higher-order LSTM model that can efficiently learn long-term correlations in the video sequence.
This is accomplished through a novel tensor train module that performs prediction by combining convolutional features across time.
Our results achieve state-of-the-art performance in a wide range of applications and datasets.
arXiv Detail & Related papers (2020-02-21T05:00:01Z)
- Kernel and Rich Regimes in Overparametrized Models [69.40899443842443]
We show that gradient descent on overparametrized multilayer networks can induce rich implicit biases that are not RKHS norms.
We also demonstrate this transition empirically for more complex matrix factorization models and multilayer non-linear networks.
arXiv Detail & Related papers (2020-02-20T15:43:02Z)