Quaternion Factorization Machines: A Lightweight Solution to Intricate
Feature Interaction Modelling
- URL: http://arxiv.org/abs/2104.01716v1
- Date: Mon, 5 Apr 2021 00:02:36 GMT
- Title: Quaternion Factorization Machines: A Lightweight Solution to Intricate
Feature Interaction Modelling
- Authors: Tong Chen, Hongzhi Yin, Xiangliang Zhang, Zi Huang, Yang Wang, Meng
Wang
- Abstract summary: The factorization machine (FM) is capable of automatically learning high-order interactions among features to make predictions without the need for manual feature engineering.
We propose the quaternion factorization machine (QFM) and quaternion neural factorization machine (QNFM) for sparse predictive analytics.
- Score: 76.89779231460193
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: As a well-established approach, the factorization machine (FM) is capable of
automatically learning high-order interactions among features to make
predictions without the need for manual feature engineering. With the prominent
development of deep neural networks (DNNs), there is a recent and ongoing trend
of enhancing the expressiveness of FM-based models with DNNs. However, although
these DNN-based FM variants obtain better results, the performance gain comes
at the price of an enormous number (usually millions) of extra model
parameters on top of the plain FM. Consequently, the heavy parameterization
impedes the real-life practicality of those deep models, especially efficient
deployment on resource-constrained IoT and edge devices. In this paper, we move
beyond the traditional real space where most deep FM-based models are defined,
and seek solutions from quaternion representations within the hypercomplex
space. Specifically, we propose the quaternion factorization machine (QFM) and
quaternion neural factorization machine (QNFM), which are two novel lightweight
and memory-efficient quaternion-valued models for sparse predictive analytics.
By introducing a new take on FM-based models grounded in quaternion algebra,
our models not only enable expressive inter-component
feature interactions, but also significantly reduce the parameter size due to
lower degrees of freedom in the hypercomplex Hamilton product compared with
real-valued matrix multiplication. Extensive experimental results on three
large-scale datasets demonstrate that QFM achieves a 4.36% performance
improvement over the plain FM without introducing any extra parameters, while
QNFM outperforms all baselines with up to two orders of magnitude fewer
parameters than state-of-the-art peer methods.
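For context, the models above build on the standard second-order FM predictor (the textbook formulation, not necessarily this paper's exact notation), in which every feature x_i carries a latent vector v_i and pairwise interactions are scored by inner products:

```latex
\hat{y}(\mathbf{x}) = w_0 + \sum_{i=1}^{n} w_i x_i
  + \sum_{i=1}^{n} \sum_{j=i+1}^{n} \langle \mathbf{v}_i, \mathbf{v}_j \rangle \, x_i x_j
```

Per the abstract, QFM replaces these real-valued latent factors and their inner products with quaternion-valued counterparts.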
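To make the degrees-of-freedom argument concrete, here is a minimal sketch (an illustrative assumption, not the authors' implementation) of the Hamilton product: a single quaternion weight carries 4 free parameters yet maps a 4-dimensional input to a 4-dimensional output, whereas an unconstrained real-valued linear map of the same shape needs 4 x 4 = 16 parameters.

```python
# Minimal sketch of the Hamilton product of two quaternions (not the paper's code).
import numpy as np

def hamilton_product(p, q):
    """Hamilton product of p = a + bi + cj + dk and q = w + xi + yj + zk."""
    a, b, c, d = p
    w, x, y, z = q
    return np.array([
        a * w - b * x - c * y - d * z,  # real part
        a * x + b * w + c * z - d * y,  # i component
        a * y - b * z + c * w + d * x,  # j component
        a * z + b * y - c * x + d * w,  # k component
    ])

rng = np.random.default_rng(0)
weight = rng.normal(size=4)    # quaternion weight: 4 free parameters
feature = rng.normal(size=4)   # a feature embedding viewed as a quaternion
out = hamilton_product(weight, feature)
print(out.shape)               # (4,): same output size as a 4x4 real matrix,
                               # which would need 16 free parameters instead of 4
```

This fourfold parameter reduction per weight is the source of the memory savings that the abstract attributes to the hypercomplex Hamilton product.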
Related papers
- SMILE: Zero-Shot Sparse Mixture of Low-Rank Experts Construction From Pre-Trained Foundation Models [85.67096251281191]
We present an innovative approach to model fusion called zero-shot Sparse MIxture of Low-rank Experts (SMILE) construction.
SMILE allows for the upscaling of source models into an MoE model without extra data or further training.
We conduct extensive experiments across diverse scenarios, such as image classification and text generation tasks, using full fine-tuning and LoRA fine-tuning.
arXiv Detail & Related papers (2024-08-19T17:32:15Z)
- Temperature Distribution Prediction in Laser Powder Bed Fusion using Transferable and Scalable Graph Neural Networks [0.0]
This study presents novel predictive models using Graph Neural Networks (GNNs) for simulating thermal dynamics in Laser Powder Bed Fusion (L-PBF) processes.
The proposed models capture the complexity of the heat transfer process in L-PBF while significantly reducing computational costs.
arXiv Detail & Related papers (2024-07-18T18:14:47Z)
- Neural Quantum State Study of Fracton Models [3.8068573698649826]
Fracton models host unconventional topological orders in three and higher dimensions.
We establish neural quantum states (NQS) as new tools to study phase transitions in these models.
Our work demonstrates the remarkable potential of NQS in studying complicated three-dimensional problems.
arXiv Detail & Related papers (2024-06-17T15:58:09Z)
- Capturing dynamical correlations using implicit neural representations [85.66456606776552]
We develop an artificial intelligence framework which combines a neural network trained to mimic simulated data from a model Hamiltonian with automatic differentiation to recover unknown parameters from experimental data.
In doing so, we illustrate the ability to build and train a differentiable model only once, which then can be applied in real-time to multi-dimensional scattering data.
arXiv Detail & Related papers (2023-04-08T07:55:36Z)
- Boosting Factorization Machines via Saliency-Guided Mixup [125.15872106335692]
We present MixFM, inspired by Mixup, to generate auxiliary training data that boosts factorization machines (FMs).
We also put forward a novel factorization machine powered by Saliency-guided Mixup (denoted SMFM).
arXiv Detail & Related papers (2022-06-17T09:49:00Z)
- On the Influence of Enforcing Model Identifiability on Learning dynamics of Gaussian Mixture Models [14.759688428864159]
We propose a technique for extracting submodels from singular models.
Our method enforces model identifiability during training.
We show how the method can be applied to more complex models like deep neural networks.
arXiv Detail & Related papers (2022-06-17T07:50:22Z)
- Multi-fidelity Hierarchical Neural Processes [79.0284780825048]
Multi-fidelity surrogate modeling reduces the computational cost by fusing different simulation outputs.
We propose Multi-fidelity Hierarchical Neural Processes (MF-HNP), a unified neural latent variable model for multi-fidelity surrogate modeling.
We evaluate MF-HNP on epidemiology and climate modeling tasks, achieving competitive performance in terms of accuracy and uncertainty estimation.
arXiv Detail & Related papers (2022-06-10T04:54:13Z)
- MoEfication: Conditional Computation of Transformer Models for Efficient Inference [66.56994436947441]
Transformer-based pre-trained language models can achieve superior performance on most NLP tasks due to large parameter capacity, but also lead to huge computation cost.
We explore accelerating large-model inference through conditional computation based on the sparse activation phenomenon.
We propose to transform a large model into its mixture-of-experts (MoE) version with equal model size, namely MoEfication.
arXiv Detail & Related papers (2021-10-05T02:14:38Z)
- Neural Closure Models for Dynamical Systems [35.000303827255024]
We develop a novel methodology to learn non-Markovian closure parameterizations for low-fidelity models.
New "neural closure models" augment low-fidelity models with neural delay differential equations (nDDEs)
We show that using non-Markovian over Markovian closures improves long-term accuracy and requires smaller networks.
arXiv Detail & Related papers (2020-12-27T05:55:33Z)
- Neural network with data augmentation in multi-objective prediction of multi-stage pump [16.038015881697593]
A neural network (NN) model is built and compared against the quadratic response surface model (RSF), the radial basis Gaussian response surface model (RBF), and the Kriging model (KRG).
The accuracy of the head and power predicted by the four models is analyzed against CFD simulation values.
A neural network model based on data augmentation (NNDA) is proposed because simulation is costly and data are scarce in the mechanical simulation field.
arXiv Detail & Related papers (2020-02-04T11:23:42Z)
This list is automatically generated from the titles and abstracts of the papers listed on this site.
This site does not guarantee the quality of its content (including all information) and is not responsible for any consequences.