Initialization Matters: Regularizing Manifold-informed Initialization
for Neural Recommendation Systems
- URL: http://arxiv.org/abs/2106.04993v1
- Date: Wed, 9 Jun 2021 11:26:18 GMT
- Title: Initialization Matters: Regularizing Manifold-informed Initialization
for Neural Recommendation Systems
- Authors: Yinan Zhang, Boyang Li, Yong Liu, Hao Wang, Chunyan Miao
- Abstract summary: We propose a new scheme for user embeddings called Laplacian Eigenmaps with Popularity-based Regularization for Isolated Data (LEPORID)
LEPORID endows the embeddings with information regarding multi-scale neighborhood structures on the data manifold and performs adaptive regularization to compensate for high embedding variance on the tail of the data distribution.
We show that existing neural systems initialized with LEPORID often perform on par with or better than KNN.
- Score: 47.49065927541129
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Proper initialization is crucial to the optimization and the generalization
of neural networks. However, most existing neural recommendation systems
initialize the user and item embeddings randomly. In this work, we propose a
new initialization scheme for user and item embeddings called Laplacian
Eigenmaps with Popularity-based Regularization for Isolated Data (LEPORID).
LEPORID endows the embeddings with information regarding multi-scale
neighborhood structures on the data manifold and performs adaptive
regularization to compensate for high embedding variance on the tail of the
data distribution. Exploiting matrix sparsity, LEPORID embeddings can be
computed efficiently. We evaluate LEPORID in a wide range of neural
recommendation models. In contrast to the recent surprising finding that the
simple K-nearest-neighbor (KNN) method often outperforms neural recommendation
systems, we show that existing neural systems initialized with LEPORID often
perform on par with or better than KNN. To maximize the effects of the
initialization, we propose the Dual-Loss Residual Recommendation (DLR2)
network, which, when initialized with LEPORID, substantially outperforms both
traditional and state-of-the-art neural recommender systems.
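As a rough illustration of how such an initialization can be computed, the sketch below builds Laplacian-Eigenmaps-style item embeddings from a sparse interaction matrix, adding a popularity-dependent diagonal penalty to the graph Laplacian. The graph construction, the exact form of the penalty, and all names and hyperparameters (laplacian_item_init, alpha, dim) are illustrative assumptions, not the paper's exact procedure.

```python
# Minimal sketch of a Laplacian-Eigenmaps-style embedding initialization with a
# popularity-dependent penalty, assuming a sparse binary user-item interaction
# matrix R (n_users x n_items). All names and choices here are illustrative.
import numpy as np
import scipy.sparse as sp
from scipy.sparse.linalg import eigsh

def laplacian_item_init(R, dim=64, alpha=1.0):
    # Item-item co-occurrence graph; drop self-loops on the diagonal.
    W = (R.T @ R).astype(np.float64).tocsr()
    W = W - sp.diags(W.diagonal())

    # Item popularity (interaction counts); rare "tail" items get a larger
    # diagonal penalty, which damps the variance of their embeddings.
    popularity = np.asarray(R.sum(axis=0)).ravel()
    penalty = alpha / np.maximum(popularity, 1.0)

    # Regularized graph Laplacian L = D + diag(penalty) - W (one plausible form
    # of popularity-based regularization, not necessarily the paper's).
    degrees = np.asarray(W.sum(axis=1)).ravel()
    L = sp.diags(degrees + penalty) - W

    # Eigenvectors of the sparse Laplacian with the smallest eigenvalues capture
    # neighborhood structure on the data manifold and serve as the initial item
    # embeddings; shift-invert mode (sigma=0) exploits the sparsity of L.
    _, vecs = eigsh(L.tocsc(), k=dim, sigma=0, which="LM")
    return vecs

# Example: item embeddings from a synthetic binarized interaction matrix;
# user embeddings could be built analogously from the user-user graph R @ R.T.
R = (sp.random(1000, 500, density=0.02, format="csr") > 0).astype(np.float64)
item_embeddings = laplacian_item_init(R, dim=32)
```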
Related papers
- Linear-Time Graph Neural Networks for Scalable Recommendations [50.45612795600707]
The key task of recommender systems is to forecast users' future behaviors based on previous user-item interactions.
Recent years have witnessed a rising interest in leveraging Graph Neural Networks (GNNs) to boost the prediction performance of recommender systems.
We propose a Linear-Time Graph Neural Network (LTGNN) to scale up GNN-based recommender systems to achieve comparable scalability as classic MF approaches.
arXiv Detail & Related papers (2024-02-21T17:58:10Z)
- Acceleration techniques for optimization over trained neural network ensembles [1.0323063834827415]
We study optimization problems where the objective function is modeled through feedforward neural networks with rectified linear unit activation.
We present a mixed-integer linear program based on existing popular big-$M$ formulations for optimizing over a single neural network.
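For reference, a generic big-$M$ encoding of a single ReLU unit $y = \max(0, w^\top x + b)$ looks as follows, assuming a valid bound $M \ge |w^\top x + b|$ and a binary activity indicator $z$; this is the textbook formulation, and the paper's ensemble model adds further structure on top of it:

$$y \ge w^\top x + b, \quad y \le w^\top x + b + M(1-z), \quad y \le Mz, \quad y \ge 0, \quad z \in \{0,1\}.$$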
arXiv Detail & Related papers (2021-12-13T20:50:54Z)
- Neuron Campaign for Initialization Guided by Information Bottleneck Theory [31.44355490646638]
Initialization plays a critical role in the training of deep neural networks (DNNs).
We use Information Bottleneck (IB) theory to explain the generalization of DNNs.
Experiments on MNIST dataset show that our method can lead to a better generalization performance with faster convergence.
arXiv Detail & Related papers (2021-08-14T13:19:43Z)
- A novel Deep Neural Network architecture for non-linear system identification [78.69776924618505]
We present a novel Deep Neural Network (DNN) architecture for non-linear system identification.
Inspired by fading memory systems, we introduce inductive bias (on the architecture) and regularization (on the loss function).
This architecture allows for automatic complexity selection based solely on available data.
arXiv Detail & Related papers (2021-06-06T10:06:07Z)
- Multi-Sample Online Learning for Spiking Neural Networks based on Generalized Expectation Maximization [42.125394498649015]
Spiking Neural Networks (SNNs) capture some of the efficiency of biological brains by processing information through binary, dynamic neural activations.
This paper proposes to leverage multiple compartments that sample independent spiking signals while sharing synaptic weights.
The key idea is to use these signals to obtain more accurate statistical estimates of the log-likelihood training criterion, as well as of its gradient.
arXiv Detail & Related papers (2021-02-05T16:39:42Z)
- Neural Representations in Hybrid Recommender Systems: Prediction versus Regularization [8.384351067134999]
We define the neural representation for prediction (NRP) framework and apply it to autoencoder-based recommendation systems.
We also apply the NRP framework to a direct neural network structure which predicts the ratings without reconstructing the user and item information.
The results confirm that neural representations are better for prediction than regularization and show that the NRP framework, combined with the direct neural network structure, outperforms the state-of-the-art methods in the prediction task.
arXiv Detail & Related papers (2020-10-12T23:12:49Z)
- Improving predictions of Bayesian neural nets via local linearization [79.21517734364093]
We argue that the Gauss-Newton approximation should be understood as a local linearization of the underlying Bayesian neural network (BNN).
Because we use this linearized model for posterior inference, we should also predict using this modified model instead of the original one.
We refer to this modified predictive as "GLM predictive" and show that it effectively resolves common underfitting problems of the Laplace approximation.
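In generic notation (symbols here are ours, not necessarily the paper's), the linearization replaces the network $f(x;\theta)$ around the MAP estimate $\theta_*$ by

$$f_{\mathrm{lin}}(x;\theta) = f(x;\theta_*) + J_{\theta_*}(x)\,(\theta - \theta_*),$$

where $J_{\theta_*}(x)$ is the Jacobian of $f$ with respect to $\theta$ at $\theta_*$; the GLM predictive then averages this linearized model, rather than the original network, over the Laplace posterior.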
arXiv Detail & Related papers (2020-08-19T12:35:55Z)
- Persistent Neurons [4.061135251278187]
We propose a trajectory-based strategy that optimizes the learning task using information from previous solutions.
Persistent neurons can be regarded as a method with gradient-informed bias, where individual updates are corrupted by deterministic error terms.
We evaluate the full and partial persistent model and show it can be used to boost the performance on a range of NN structures.
arXiv Detail & Related papers (2020-07-02T22:36:49Z)
- Revisiting Initialization of Neural Networks [72.24615341588846]
We propose a rigorous estimation of the global curvature of weights across layers by approximating and controlling the norm of their Hessian matrix.
Our experiments on Word2Vec and the MNIST/CIFAR image classification tasks confirm that tracking the Hessian norm is a useful diagnostic tool.
arXiv Detail & Related papers (2020-04-20T18:12:56Z)
- MSE-Optimal Neural Network Initialization via Layer Fusion [68.72356718879428]
Deep neural networks achieve state-of-the-art performance for a range of classification and inference tasks.
The use of gradient descent combined with the non-convexity of the underlying optimization renders learning susceptible to initialization effects.
We propose fusing neighboring layers of deeper networks that are trained with random variables.
arXiv Detail & Related papers (2020-01-28T18:25:15Z)