THG: Transformer with Hyperbolic Geometry
- URL: http://arxiv.org/abs/2106.07350v1
- Date: Tue, 1 Jun 2021 14:09:33 GMT
- Title: THG: Transformer with Hyperbolic Geometry
- Authors: Zhe Liu and Yibin Xu
- Abstract summary: Most "X-former" models only address the quadratic time and memory complexity of self-attention.
We propose a novel Transformer with Hyperbolic Geometry (THG) model, which takes advantage of both Euclidean space and hyperbolic space.
- Score: 8.895324519034057
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Transformer model architectures have become an indispensable staple in deep
learning lately for their effectiveness across a range of tasks. Recently, a
surge of "X-former" models have been proposed which improve upon the original
Transformer architecture. However, most of these variants make changes only
around the quadratic time and memory complexity of self-attention, i.e. the dot
product between the query and the key. Moreover, they are computed solely
in Euclidean space. In this work, we propose a novel Transformer with
Hyperbolic Geometry (THG) model, which takes advantage of both Euclidean
space and hyperbolic space. THG improves the linear transformations of
self-attention, which are applied to the input sequence to obtain the query and
the key, using the proposed hyperbolic linear transformation. Extensive
experiments on sequence labeling, machine reading comprehension, and
classification tasks demonstrate the effectiveness and generalizability of our
model. The results also show that THG can alleviate overfitting.
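The abstract describes a "hyperbolic linear" replacing the query and key projections but does not spell out its formulation, so the following is a minimal sketch assuming the common Poincaré-ball construction (a Möbius matrix-vector product applied via the exponential and logarithmic maps at the origin). The class name HyperbolicLinear, the curvature parameter c, and the toy dimensions are illustrative assumptions, not the paper's code.

```python
# Hedged sketch of a "hyperbolic linear" layer, assuming the common
# Poincare-ball formulation: a Euclidean linear map applied in the
# tangent space at the origin (Mobius matrix-vector product).
import torch
import torch.nn as nn


class HyperbolicLinear(nn.Module):
    """Linear map applied in the tangent space of the Poincare ball."""

    def __init__(self, in_dim: int, out_dim: int, c: float = 1.0):
        super().__init__()
        self.weight = nn.Parameter(torch.empty(out_dim, in_dim))
        nn.init.xavier_uniform_(self.weight)
        self.c = c  # ball curvature magnitude (curvature = -c); assumed hyperparameter

    def _log0(self, x: torch.Tensor) -> torch.Tensor:
        # Map a point on the ball to the tangent space at the origin.
        sqrt_c = self.c ** 0.5
        norm = x.norm(dim=-1, keepdim=True).clamp_min(1e-7)
        return torch.atanh((sqrt_c * norm).clamp(max=1 - 1e-5)) * x / (sqrt_c * norm)

    def _exp0(self, v: torch.Tensor) -> torch.Tensor:
        # Map a tangent vector at the origin back onto the ball.
        sqrt_c = self.c ** 0.5
        norm = v.norm(dim=-1, keepdim=True).clamp_min(1e-7)
        return torch.tanh(sqrt_c * norm) * v / (sqrt_c * norm)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # Mobius matrix-vector product: exp_0(W @ log_0(x)).
        return self._exp0(self._log0(x) @ self.weight.t())


# Toy usage: swap the hyperbolic projection in for the query and key only,
# as the abstract suggests; values and the dot-product scores stay Euclidean.
d_model = 64
x = torch.randn(2, 10, d_model) * 0.05     # (batch, seq, dim), scaled to lie inside the unit ball
q = HyperbolicLinear(d_model, d_model)(x)  # hyperbolic query projection
k = HyperbolicLinear(d_model, d_model)(x)  # hyperbolic key projection
scores = torch.softmax(q @ k.transpose(-2, -1) / d_model ** 0.5, dim=-1)
```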
Related papers
- Hypformer: Exploring Efficient Hyperbolic Transformer Fully in Hyperbolic Space [47.4014545166959]
We introduce Hypformer, a novel hyperbolic Transformer based on the Lorentz model of hyperbolic geometry.
We develop a linear self-attention mechanism in hyperbolic space, enabling hyperbolic Transformer to process billion-scale graph data and long-sequence inputs for the first time.
arXiv Detail & Related papers (2024-07-01T13:44:38Z) - EulerFormer: Sequential User Behavior Modeling with Complex Vector Attention [88.45459681677369]
We propose a novel transformer variant with complex vector attention, named EulerFormer.
It provides a unified theoretical framework to formulate both semantic difference and positional difference.
It is more robust to semantic variations and possesses superior theoretical properties in principle.
arXiv Detail & Related papers (2024-03-26T14:18:43Z) - Do Efficient Transformers Really Save Computation? [32.919672616480135]
We focus on the capabilities and limitations of efficient Transformers, specifically the Sparse Transformer and the Linear Transformer.
Our results show that while these models are expressive enough to solve general DP tasks, contrary to expectations, they require a model size that scales with the problem size.
We identify a class of DP problems for which these models can be more efficient than the standard Transformer.
arXiv Detail & Related papers (2024-02-21T17:00:56Z) - Hiformer: Heterogeneous Feature Interactions Learning with Transformers
for Recommender Systems [27.781785405875084]
We propose to leverage a Transformer-based architecture with attention layers to automatically capture feature interactions.
We identify two key challenges for applying the vanilla Transformer architecture to web-scale recommender systems.
arXiv Detail & Related papers (2023-11-10T05:57:57Z) - Sliceformer: Make Multi-head Attention as Simple as Sorting in
Discriminative Tasks [32.33355192614434]
We propose an effective and efficient surrogate of the Transformer, called Sliceformer.
Our Sliceformer replaces the classic multi-head attention (MHA) mechanism with an extremely simple "slicing-sorting" operation.
Our Sliceformer achieves comparable or better performance with lower memory cost and faster speed than the Transformer and its variants.
arXiv Detail & Related papers (2023-10-26T14:43:07Z) - Emergent Agentic Transformer from Chain of Hindsight Experience [96.56164427726203]
We show, for the first time, that a simple transformer-based model performs competitively with both temporal-difference and imitation-learning-based approaches.
arXiv Detail & Related papers (2023-05-26T00:43:02Z) - Fourier Transformer: Fast Long Range Modeling by Removing Sequence
Redundancy with FFT Operator [24.690247474891958]
Fourier Transformer is able to significantly reduce computational costs while retaining the ability to inherit weights from various large pretrained models.
Our model achieves state-of-the-art performances among all transformer-based models on the long-range modeling benchmark LRA.
For generative seq-to-seq tasks including CNN/DailyMail and ELI5, by inheriting the BART weights our model outperforms the standard BART.
arXiv Detail & Related papers (2023-05-24T12:33:06Z) - Full Stack Optimization of Transformer Inference: a Survey [58.55475772110702]
Transformer models achieve superior accuracy across a wide range of applications.
The amount of compute and bandwidth required for inference of recent Transformer models is growing at a significant rate.
There has been an increased focus on making Transformer models more efficient.
arXiv Detail & Related papers (2023-02-27T18:18:13Z) - Stable, Fast and Accurate: Kernelized Attention with Relative Positional
Encoding [63.539333383965726]
We propose a novel way to accelerate attention calculation for Transformers with relative positional encoding (RPE).
Based upon the observation that relative positional encoding forms a Toeplitz matrix, we mathematically show that kernelized attention with RPE can be calculated efficiently using the Fast Fourier Transform (FFT); a sketch of the underlying Toeplitz-times-vector trick is given after this list.
arXiv Detail & Related papers (2021-06-23T17:51:26Z) - Finetuning Pretrained Transformers into RNNs [81.72974646901136]
Transformers have outperformed recurrent neural networks (RNNs) in natural language generation.
A linear-complexity recurrent variant has proven well suited for autoregressive generation.
This work aims to convert a pretrained transformer into its efficient recurrent counterpart.
arXiv Detail & Related papers (2021-03-24T10:50:43Z)
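As a sketch of the Toeplitz/FFT observation cited in the kernelized-attention entry above: a Toeplitz matrix (such as one built from relative-positional biases) can be multiplied with a vector in O(n log n) by embedding it in a circulant matrix and using the FFT. The function name and toy check below are illustrative assumptions and do not reproduce that paper's full kernelized-attention algorithm.

```python
# Hedged sketch: fast Toeplitz matrix-vector product via circulant
# embedding and the FFT, the generic identity behind the RPE trick.
import numpy as np


def toeplitz_matvec_fft(first_col: np.ndarray, first_row: np.ndarray, x: np.ndarray) -> np.ndarray:
    """Compute T @ x, where T is Toeplitz with the given first column and row."""
    n = len(x)
    # Embed T in a (2n)x(2n) circulant matrix whose first column is
    # [first_col, 0, reversed tail of first_row].
    c = np.concatenate([first_col, [0.0], first_row[1:][::-1]])
    # A circulant matvec is a circular convolution, computed with the FFT.
    y = np.fft.ifft(np.fft.fft(c) * np.fft.fft(np.concatenate([x, np.zeros(n)])))
    return y[:n].real


# Check against the dense O(n^2) product for a random Toeplitz matrix.
n = 8
first_col = np.random.randn(n)                                         # entries T[i, 0]
first_row = np.concatenate([[first_col[0]], np.random.randn(n - 1)])   # entries T[0, j]
T = np.array([[first_col[i - j] if i >= j else first_row[j - i] for j in range(n)]
              for i in range(n)])
x = np.random.randn(n)
assert np.allclose(T @ x, toeplitz_matvec_fft(first_col, first_row, x))
```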