Beyond Fully-Connected Layers with Quaternions: Parameterization of Hypercomplex Multiplications with $1/n$ Parameters
- URL: http://arxiv.org/abs/2102.08597v1
- Date: Wed, 17 Feb 2021 06:16:58 GMT
- Title: Beyond Fully-Connected Layers with Quaternions: Parameterization of Hypercomplex Multiplications with $1/n$ Parameters
- Authors: Aston Zhang, Yi Tay, Shuai Zhang, Alvin Chan, Anh Tuan Luu, Siu Cheung Hui, Jie Fu
- Abstract summary: We propose parameterizing hypercomplex multiplications, allowing models to learn multiplication rules from data regardless of whether such rules are predefined.
Our method not only subsumes the Hamilton product, but also learns to operate on any arbitrary nD hypercomplex space.
- Score: 71.09633069060342
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Recent works have demonstrated reasonable success of representation learning
in hypercomplex space. Specifically, "fully-connected layers with quaternions"
(4D hypercomplex numbers), which replace the real-valued matrix multiplications in
fully-connected layers with Hamilton products of quaternions, enjoy parameter
savings, using only $1/4$ of the learnable parameters, while achieving comparable
performance in various applications. However, one key caveat is that
hypercomplex spaces exist only at a few predefined dimensions (4D, 8D, and
16D), which restricts the flexibility of models that leverage hypercomplex
multiplications. To this end, we propose parameterizing hypercomplex
multiplications, allowing models to learn the multiplication rules from data
regardless of whether such rules are predefined. As a result, our method not
only subsumes the Hamilton product but also learns to operate on hypercomplex
spaces of any dimension $n$, providing greater architectural flexibility with
$1/n$ of the learnable parameters of the fully-connected layer counterpart.
Experiments applying the approach to LSTM and Transformer models on natural
language inference, machine translation, text style transfer, and subject-verb
agreement demonstrate its architectural flexibility and effectiveness.
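For intuition, the proposed parameterized hypercomplex multiplication (PHM) layer replaces the dense weight $W \in \mathbb{R}^{k \times d}$ of a fully-connected layer with a sum of $n$ Kronecker products, $W = \sum_{i=1}^{n} A_i \otimes S_i$, where each $A_i \in \mathbb{R}^{n \times n}$ encodes a learned multiplication rule and each $S_i \in \mathbb{R}^{(k/n) \times (d/n)}$ is a learned weight block. The layer therefore stores $n^3 + kd/n$ parameters instead of $kd$, i.e. roughly $1/n$ of the dense counterpart. The PyTorch sketch below is a minimal illustration of this construction, not the authors' released implementation; the class name `PHMLinear`, the initialization scale, and the bias handling are placeholder choices.

```python
import torch
import torch.nn as nn


class PHMLinear(nn.Module):
    """Minimal sketch of a parameterized hypercomplex multiplication layer.

    The dense weight of nn.Linear is replaced by W = sum_i kron(A_i, S_i):
    A holds n learnable (n x n) "multiplication rule" matrices and S holds
    n learnable (out/n x in/n) weight blocks, so the layer stores roughly
    in*out/n weights (plus an n^3 term) instead of in*out.
    """

    def __init__(self, n: int, in_features: int, out_features: int):
        super().__init__()
        assert in_features % n == 0 and out_features % n == 0
        self.n = n
        self.A = nn.Parameter(torch.randn(n, n, n) * 0.1)
        self.S = nn.Parameter(
            torch.randn(n, out_features // n, in_features // n) * 0.1
        )
        self.bias = nn.Parameter(torch.zeros(out_features))

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # Assemble the full (out_features x in_features) weight on the fly.
        W = sum(torch.kron(self.A[i], self.S[i]) for i in range(self.n))
        return x @ W.t() + self.bias


# With n=4 and suitably fixed A_i, this construction recovers the Hamilton
# product of quaternions; learning the A_i generalizes it to arbitrary n.
layer = PHMLinear(n=4, in_features=64, out_features=64)
y = layer(torch.randn(8, 64))  # -> shape (8, 64)
total = sum(p.numel() for p in layer.parameters())
print(total)  # 1152 weights (4^3 + 4*16*16 + 64) vs. 4160 for nn.Linear(64, 64)
```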
Related papers
- Provable Benefits of Complex Parameterizations for Structured State Space Models [51.90574950170374]
Structured state space models (SSMs) are linear dynamical systems adhering to a specified structure.
In contrast to typical neural network modules, whose parameterizations are real, SSMs often use complex parameterizations.
This paper takes a step towards explaining the benefits of complex parameterizations for SSMs by establishing formal gaps between real and complex diagonal SSMs.
arXiv Detail & Related papers (2024-10-17T22:35:50Z)
- SHMamba: Structured Hyperbolic State Space Model for Audio-Visual Question Answering [5.016335384639901]
The multi-modal input of Audio-Visual Question Answering (AVQA) makes feature extraction and fusion more challenging.
We propose SHMamba, a Structured Hyperbolic State Space Model, to integrate the advantages of hyperbolic geometry and state space models.
Our method outperforms all current major methods and is better suited to practical application scenarios.
arXiv Detail & Related papers (2024-06-14T08:43:31Z)
- A Hyper-Transformer model for Controllable Pareto Front Learning with Split Feasibility Constraints [2.07180164747172]
We develop a hyper-transformer (Hyper-Trans) model for controllable Pareto front learning (CPFL) with split feasibility constraints (SFC).
In computational experiments, the Hyper-Trans model yields smaller MED errors than the Hyper-MLP model.
arXiv Detail & Related papers (2024-02-04T10:21:03Z)
- Parameter Efficient Fine-tuning via Cross Block Orchestration for Segment Anything Model [81.55141188169621]
We equip PEFT with a cross-block orchestration mechanism to enable the adaptation of the Segment Anything Model (SAM) to various downstream scenarios.
We propose an intra-block enhancement module, which introduces a linear projection head whose weights are generated from a hyper-complex layer.
Our proposed approach consistently improves the segmentation performance significantly on novel scenarios with only around 1K additional parameters.
arXiv Detail & Related papers (2023-11-28T11:23:34Z)
- Solving High-Dimensional PDEs with Latent Spectral Models [74.1011309005488]
We present Latent Spectral Models (LSM), an efficient and precise solver for high-dimensional PDEs.
Inspired by classical spectral methods in numerical analysis, we design a neural spectral block to solve PDEs in the latent space.
LSM achieves consistent state-of-the-art results, with an average relative gain of 11.5% across seven benchmarks.
arXiv Detail & Related papers (2023-01-30T04:58:40Z)
- Scalable One-Pass Optimisation of High-Dimensional Weight-Update Hyperparameters by Implicit Differentiation [0.0]
We develop an approximate hypergradient-based hyperparameter optimiser.
It requires only one training episode, with no restarts.
We also provide a motivating argument for convergence to the true hypergradient.
arXiv Detail & Related papers (2021-10-20T09:57:57Z)
- Lightweight Convolutional Neural Networks By Hypercomplex Parameterization [10.420215908252425]
We define the parameterization of hypercomplex convolutional layers to develop lightweight and efficient large-scale convolutional models.
Our method grasps the convolution rules and the filter organization directly from data.
We demonstrate the versatility of this approach to multiple domains of application by performing experiments on various image datasets and audio datasets.
arXiv Detail & Related papers (2021-10-08T14:57:19Z)
- HyperNP: Interactive Visual Exploration of Multidimensional Projection Hyperparameters [61.354362652006834]
HyperNP is a scalable method that allows for real-time interactive exploration of projection methods by training neural network approximations.
We evaluate HyperNP across three datasets in terms of performance and speed.
arXiv Detail & Related papers (2021-06-25T17:28:14Z)
- Quaternion Factorization Machines: A Lightweight Solution to Intricate Feature Interaction Modelling [76.89779231460193]
A factorization machine (FM) is capable of automatically learning high-order interactions among features to make predictions without the need for manual feature engineering.
We propose the quaternion factorization machine (QFM) and quaternion neural factorization machine (QNFM) for sparse predictive analytics.
arXiv Detail & Related papers (2021-04-05T00:02:36Z)
- A General Framework for Hypercomplex-valued Extreme Learning Machines [2.055949720959582]
This paper aims to establish a framework for extreme learning machines (ELMs) on general hypercomplex algebras.
We show how to operate in these algebras through real-valued linear algebra operations.
Experiments highlight the excellent performance of hypercomplex-valued ELMs in treating high-dimensional data.
arXiv Detail & Related papers (2021-01-15T15:22:05Z)