Hypformer: Exploring Efficient Hyperbolic Transformer Fully in Hyperbolic Space
- URL: http://arxiv.org/abs/2407.01290v1
- Date: Mon, 1 Jul 2024 13:44:38 GMT
- Title: Hypformer: Exploring Efficient Hyperbolic Transformer Fully in Hyperbolic Space
- Authors: Menglin Yang, Harshit Verma, Delvin Ce Zhang, Jiahong Liu, Irwin King, Rex Ying
- Abstract summary: We introduce Hypformer, a novel hyperbolic Transformer based on the Lorentz model of hyperbolic geometry.
We develop a linear self-attention mechanism in hyperbolic space, enabling the hyperbolic Transformer to process billion-scale graph data and long-sequence inputs for the first time.
- Score: 47.4014545166959
- License: http://creativecommons.org/licenses/by-nc-nd/4.0/
- Abstract: Hyperbolic geometry has shown significant potential for modeling complex structured data, particularly data with underlying tree-like and hierarchical structures. Despite the impressive performance of various hyperbolic neural networks across numerous domains, research on adapting the Transformer to hyperbolic space remains limited. Previous attempts have mainly focused on modifying self-attention modules in the Transformer. However, these efforts have fallen short of developing a complete hyperbolic Transformer. This stems primarily from: (i) the absence of well-defined modules in hyperbolic space, including linear transformation layers, LayerNorm layers, activation functions, dropout operations, etc.; and (ii) the quadratic time complexity of the existing hyperbolic self-attention module w.r.t. the number of input tokens, which hinders its scalability. To address these challenges, we propose Hypformer, a novel hyperbolic Transformer based on the Lorentz model of hyperbolic geometry. In Hypformer, we introduce two foundational blocks that define the essential modules of the Transformer in hyperbolic space. Furthermore, we develop a linear self-attention mechanism in hyperbolic space, enabling the hyperbolic Transformer to process billion-scale graph data and long-sequence inputs for the first time. Our experimental results confirm the effectiveness and efficiency of Hypformer across various datasets, demonstrating its potential as an effective and scalable solution for large-scale data representation and large models.
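The abstract describes two ingredients: Lorentz-model counterparts of the standard Transformer modules, and a softmax-free self-attention whose cost is linear in the number of tokens. The sketch below (PyTorch) illustrates both ideas under common Lorentz-model conventions, namely applying a Euclidean map to the space-like coordinates and recomputing the time-like coordinate so that points stay on the hyperboloid, and using an ELU+1 feature map for kernelized attention. The function names, the feature map, and the lift-and-reproject construction are illustrative assumptions, not the paper's exact formulation.
```python
import torch
import torch.nn.functional as F


def to_lorentz(x_space, k=1.0):
    # Lift space-like coordinates onto the hyperboloid <x, x>_L = -1/k by
    # recomputing the time-like coordinate (a standard Lorentz-model trick;
    # the paper's exact construction may differ).
    time = torch.sqrt(1.0 / k + (x_space ** 2).sum(dim=-1, keepdim=True))
    return torch.cat([time, x_space], dim=-1)


def hyperbolic_linear(x, weight, k=1.0):
    # Hypothetical Lorentz linear layer: transform the space-like part with an
    # ordinary matrix, then re-lift the result onto the hyperboloid.
    return to_lorentz(x[..., 1:] @ weight.T, k)


def hyperbolic_linear_attention(q, k_, v, curv=1.0, eps=1e-6):
    # Softmax-free (kernelized) attention on the space-like parts. The d x d
    # key-value summary is built once, so the cost is O(N * d^2) in the number
    # of tokens N instead of the O(N^2 * d) of vanilla softmax attention.
    qs, ks, vs = (t[..., 1:] for t in (q, k_, v))
    qs, ks = F.elu(qs) + 1, F.elu(ks) + 1                    # positive feature map
    kv = ks.transpose(-2, -1) @ vs                           # (d, d) summary
    z = qs @ ks.sum(dim=-2, keepdim=True).transpose(-2, -1) + eps  # normalizer
    return to_lorentz((qs @ kv) / z, curv)


# Toy usage: 5 tokens with 4-dimensional space-like features.
x = to_lorentz(torch.randn(5, 4))
w = torch.randn(4, 4)
q, k_, v = (hyperbolic_linear(x, w) for _ in range(3))
print(hyperbolic_linear_attention(q, k_, v).shape)  # torch.Size([5, 5])
```
The efficiency argument is visible in the code: `ks.transpose(-2, -1) @ vs` builds a single d x d key-value summary, so the cost grows linearly with the token count rather than quadratically as in softmax attention.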
Related papers
- SGFormer: Single-Layer Graph Transformers with Approximation-Free Linear Complexity [74.51827323742506]
We evaluate the necessity of adopting multi-layer attentions in Transformers on graphs.
We show that multi-layer attention can be reduced to one-layer propagation, with the same capability for representation learning.
It suggests a new technical path for building powerful and efficient Transformers on graphs.
arXiv Detail & Related papers (2024-09-13T17:37:34Z)
- Hyper-Transformer for Amodal Completion [82.4118011026855]
Amodal object completion is a complex task that involves predicting the invisible parts of an object based on visible segments and background information.
We introduce a novel framework named the Hyper-Transformer Amodal Network (H-TAN)
This framework utilizes a hyper transformer equipped with a dynamic convolution head to directly learn shape priors and accurately predict amodal masks.
arXiv Detail & Related papers (2024-05-30T11:11:54Z)
- Beyond Scaling Laws: Understanding Transformer Performance with Associative Memory [11.3128832831327]
Increasing the size of a Transformer model does not always lead to enhanced performance.
Improved generalization ability occurs as the model memorizes the training samples.
We present a theoretical framework that sheds light on the memorization process and performance dynamics of transformer-based language models.
arXiv Detail & Related papers (2024-05-14T15:48:36Z)
- Accelerating hyperbolic t-SNE [7.411478341945197]
This paper introduces the first acceleration structure for hyperbolic embeddings, building upon a polar quadtree (a toy sketch of such a structure follows this list).
We demonstrate that it computes embeddings of similar quality in significantly less time.
arXiv Detail & Related papers (2024-01-23T12:59:40Z)
- Curve Your Attention: Mixed-Curvature Transformers for Graph Representation Learning [77.1421343649344]
We propose a generalization of Transformers towards operating entirely on the product of constant curvature spaces.
We also provide a kernelized approach to non-Euclidean attention, which enables our model to run in time and memory cost linear in the number of nodes and edges.
arXiv Detail & Related papers (2023-09-08T02:44:37Z)
- DIFFormer: Scalable (Graph) Transformers Induced by Energy Constrained Diffusion [66.21290235237808]
We introduce an energy constrained diffusion model which encodes a batch of instances from a dataset into evolutionary states.
We provide rigorous theory that implies closed-form optimal estimates for the pairwise diffusion strength among arbitrary instance pairs.
Experiments highlight the wide applicability of our model as a general-purpose encoder backbone with superior performance in various tasks.
arXiv Detail & Related papers (2023-01-23T15:18:54Z)
- Complex Hyperbolic Knowledge Graph Embeddings with Fast Fourier Transform [29.205221688430733]
The choice of geometric space for knowledge graph (KG) embeddings can have significant effects on the performance of KG completion tasks.
Recent explorations of complex hyperbolic geometry have further improved hyperbolic embeddings for capturing a variety of hierarchical structures.
This paper aims to utilize the representation capacity of the complex hyperbolic geometry in multi-relational KG embeddings.
arXiv Detail & Related papers (2022-11-07T15:46:00Z)
- THG: Transformer with Hyperbolic Geometry [8.895324519034057]
"X-former" models make changes only around the quadratic time and memory complexity of self-attention.
We propose a novel Transformer with Hyperbolic Geometry (THG) model, which take the advantage of both Euclidean space and Hyperbolic space.
arXiv Detail & Related papers (2021-06-01T14:09:33Z)
- Fully Hyperbolic Neural Networks [63.22521652077353]
We propose a fully hyperbolic framework to build hyperbolic networks based on the Lorentz model.
We show that our method yields better performance when building both shallow and deep networks.
arXiv Detail & Related papers (2021-05-31T03:36:49Z)
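As noted in the hyperbolic t-SNE entry above, that work accelerates embedding computation with a polar quadtree. The toy sketch below (plain Python, hypothetical names) shows only the insert-and-subdivide logic of such a spatial index over Poincaré-disk points in polar coordinates; it splits cells at naive midpoints rather than balancing them by hyperbolic area, and it omits the Barnes-Hut-style force approximation that the actual acceleration structure supports.
```python
import math
import random
from dataclasses import dataclass, field


@dataclass
class PolarCell:
    # A quadtree cell over polar coordinates (r, theta) of points in the
    # Poincare disk: r in [0, 1), theta in [0, 2*pi).
    r_lo: float
    r_hi: float
    th_lo: float
    th_hi: float
    points: list = field(default_factory=list)
    children: list = field(default_factory=list)

    def contains(self, p):
        r, th = p
        return self.r_lo <= r < self.r_hi and self.th_lo <= th < self.th_hi


def insert(cell, p, capacity=4):
    # Insert a point; a leaf splits into four children once it exceeds capacity.
    if not cell.children:
        cell.points.append(p)
        if len(cell.points) > capacity:
            subdivide(cell, capacity)
        return
    for child in cell.children:
        if child.contains(p):
            insert(child, p, capacity)
            return


def subdivide(cell, capacity):
    # Naive midpoint split in r and theta; the paper instead balances splits
    # by hyperbolic area, which concentrates cells toward the disk boundary.
    rm = 0.5 * (cell.r_lo + cell.r_hi)
    tm = 0.5 * (cell.th_lo + cell.th_hi)
    cell.children = [
        PolarCell(rl, rh, tl, th)
        for (rl, rh) in ((cell.r_lo, rm), (rm, cell.r_hi))
        for (tl, th) in ((cell.th_lo, tm), (tm, cell.th_hi))
    ]
    pts, cell.points = cell.points, []
    for q in pts:
        insert(cell, q, capacity)


# Toy usage: index 1000 random points given in polar coordinates.
root = PolarCell(0.0, 1.0, 0.0, 2 * math.pi)
for _ in range(1000):
    insert(root, (random.random(), random.random() * 2 * math.pi))
print(len(root.children))  # 4 once the root has been subdivided
```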