Spherical Position Encoding for Transformers
- URL: http://arxiv.org/abs/2310.04454v1
- Date: Wed, 4 Oct 2023 09:28:59 GMT
- Title: Spherical Position Encoding for Transformers
- Authors: Eren Unlu
- Abstract summary: We introduce the notion of "geotokens", which are input elements for transformer architectures.
Unlike natural language, the sequential position is not important for the model, but the geographical coordinates are.
We formulate a position encoding mechanism based on the RoPE architecture, adjusted for spherical coordinates.
- Score: 0.0
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Position encoding is the primary mechanism that induces a notion of sequential
order for input tokens in transformer architectures. Even though the formulation
in the original transformer paper has yielded plausible performance for
general-purpose language understanding and generation, several newer
frameworks, such as Rotary Position Embedding (RoPE), have been proposed for
further enhancement. In this paper, we introduce the notion of "geotokens":
input elements for transformer architectures, each representing information
related to a geographical location. Unlike natural language, the sequential
position is not important for the model, but the geographical coordinates are.
In order to induce the concept of relative position for such a setting and
maintain the proportion between physical distance and distance in embedding
space, we formulate a position encoding mechanism based on the RoPE
architecture, adjusted for spherical coordinates.
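To make the mechanism concrete, here is a minimal sketch in the spirit of the abstract: the standard pairwise RoPE rotation, driven by latitude and longitude instead of the token index. The even split of dimension pairs between the two angles, the frequency schedule, and the name `spherical_rope` are illustrative assumptions rather than the paper's exact formulation.

```python
import numpy as np

def spherical_rope(x: np.ndarray, lat: float, lon: float, base: float = 10000.0) -> np.ndarray:
    """Rotate feature pairs of x by angles derived from spherical coordinates.

    x: embedding of dimension d; the first d/2 dims form pairs rotated by
    latitude, the last d/2 form pairs rotated by longitude (assumed split).
    lat, lon: coordinates in radians.
    """
    d = x.shape[-1]
    assert d % 4 == 0, "need d divisible by 4: two coordinate groups of 2-dim pairs"
    half = d // 2
    out = x.copy()
    for angle, start in ((lat, 0), (lon, half)):
        n_pairs = half // 2
        # RoPE-style geometric frequency schedule across the pairs in this group
        freqs = base ** (-np.arange(n_pairs) / n_pairs)
        theta = angle * freqs
        cos, sin = np.cos(theta), np.sin(theta)
        x1 = x[..., start:start + half:2]
        x2 = x[..., start + 1:start + half:2]
        out[..., start:start + half:2] = x1 * cos - x2 * sin
        out[..., start + 1:start + half:2] = x1 * sin + x2 * cos
    return out
```

Because every pair is rotated by an angle linear in the coordinates, the dot product between a query and a key encoded this way depends only on the latitude and longitude differences, mirroring the relative-position property RoPE gives token indices.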
Related papers
- Improving Transformers using Faithful Positional Encoding [55.30212768657544]
We propose a new positional encoding method for a neural network architecture called the Transformer.
Unlike the standard sinusoidal positional encoding, our approach has a guarantee of not losing information about the positional order of the input sequence.
arXiv Detail & Related papers (2024-05-15T03:17:30Z)
- Geotokens and Geotransformers [0.0]
This paper presents geotokens, input components for transformers, each linked to a specific geographical location.
Unlike typical language sequences, the order of these tokens is not as vital as the geographical coordinates themselves.
arXiv Detail & Related papers (2024-03-23T22:02:56Z)
- LGFCTR: Local and Global Feature Convolutional Transformer for Image Matching [8.503217766507584]
A novel convolutional transformer is proposed to capture both local contexts and global structures.
A universal FPN-like framework captures global structures in both the self-encoder and the cross-decoder via transformers.
A novel regression-based sub-pixel refinement module exploits the whole fine-grained window features for fine-level positional deviation regression.
arXiv Detail & Related papers (2023-11-29T12:06:19Z)
- GTA: A Geometry-Aware Attention Mechanism for Multi-View Transformers [63.41460219156508]
We argue that existing positional encoding schemes are suboptimal for 3D vision tasks.
We propose a geometry-aware attention mechanism that encodes the geometric structure of tokens as relative transformation.
We show that our attention, called Geometric Transform Attention (GTA), improves the learning efficiency and performance of state-of-the-art transformer-based novel view synthesis (NVS) models.
arXiv Detail & Related papers (2023-10-16T13:16:09Z)
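GTA's core idea, injecting relative geometric transformations directly into attention, can be roughly illustrated as below. This toy uses scalar 2D orientations and block-diagonal rotations over feature pairs; the pose parameterization and all names are simplifying assumptions, not the paper's implementation, which handles full multi-view camera geometry.

```python
import numpy as np

def rot2d(theta: float) -> np.ndarray:
    c, s = np.cos(theta), np.sin(theta)
    return np.array([[c, -s], [s, c]])

def geometry_aware_attention(q, k, v, angles):
    """Toy geometry-aware attention: token j's key/value are mapped into
    token i's frame via the relative rotation R(angles[i] - angles[j])
    before the usual scaled dot-product attention."""
    n, d = q.shape
    assert d % 2 == 0
    out = np.zeros_like(q)
    for i in range(n):
        scores = np.empty(n)
        for j in range(n):
            R = rot2d(angles[i] - angles[j])  # relative transform j -> i
            # rotate each 2-dim feature pair of the key into i's frame
            kj = (k[j].reshape(-1, 2) @ R.T).ravel()
            scores[j] = q[i] @ kj / np.sqrt(d)
        w = np.exp(scores - scores.max())
        w /= w.sum()
        for j in range(n):
            R = rot2d(angles[i] - angles[j])
            out[i] += w[j] * (v[j].reshape(-1, 2) @ R.T).ravel()
    return out
```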
- Linearized Relative Positional Encoding [43.898057545832366]
Relative positional encoding is widely used in vanilla and linear transformers to represent positional information.
We unify a variety of existing linear relative positional encoding approaches under a canonical form.
We further propose a family of linear relative positional encoding algorithms via unitary transformation.
arXiv Detail & Related papers (2023-07-18T13:56:43Z)
- LCPFormer: Towards Effective 3D Point Cloud Analysis via Local Context Propagation in Transformers [60.51925353387151]
We propose a novel module named Local Context Propagation (LCP) to exploit the message passing between neighboring local regions.
We use the overlap points of adjacent local regions as intermediaries, then re-weight the features of these shared points from different local regions before passing them to the next layers.
The proposed method is applicable to different tasks and outperforms various transformer-based methods in benchmarks including 3D shape classification and dense prediction tasks.
arXiv Detail & Related papers (2022-10-23T15:43:01Z)
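The propagation step LCPFormer's summary describes can be sketched crudely as follows; the uniform averaging stands in for the paper's learned re-weighting, and the names are illustrative.

```python
import numpy as np

def propagate_local_context(region_feats, region_idx, n_points):
    """Toy Local Context Propagation: a point shared by several overlapping
    regions gets its features blended across those regions before the next
    layer; a point seen by only one region keeps its features unchanged.

    region_feats[r]: (m_r, d) features, one row per point of region r.
    region_idx[r]:   (m_r,) global indices of those points.
    """
    d = region_feats[0].shape[1]
    acc = np.zeros((n_points, d))
    cnt = np.zeros(n_points)
    for feats, idx in zip(region_feats, region_idx):
        acc[idx] += feats      # indices are unique within one region
        cnt[idx] += 1
    blended = acc / np.maximum(cnt, 1)[:, None]
    # write the blended features of shared points back into every region
    return [blended[idx] for idx in region_idx]
```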
- Learnable Fourier Features for Multi-Dimensional Spatial Positional Encoding [96.9752763607738]
We propose a novel positional encoding method based on learnable Fourier features.
Our experiments show that our learnable feature representation for multi-dimensional positional encoding outperforms existing methods.
arXiv Detail & Related papers (2021-06-05T04:40:18Z)
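The learnable-Fourier-feature construction can be sketched as follows; the shapes, the Gaussian initialization, and the single-hidden-layer MLP follow the usual learnable-Fourier-feature recipe and are assumptions here, not necessarily the paper's exact configuration.

```python
import numpy as np

rng = np.random.default_rng(0)

class LearnableFourierPE:
    """Positional encoding for multi-dimensional positions x in R^m:
    learnable Fourier features [cos(xW), sin(xW)] fed to a small MLP.
    W, W1, W2 would be trained together with the rest of the network."""

    def __init__(self, m: int, n_freqs: int, d_out: int, hidden: int = 64):
        self.W = rng.normal(size=(m, n_freqs))              # frequency matrix
        self.W1 = rng.normal(scale=0.1, size=(2 * n_freqs, hidden))
        self.W2 = rng.normal(scale=0.1, size=(hidden, d_out))

    def __call__(self, x: np.ndarray) -> np.ndarray:
        proj = x @ self.W                                   # (batch, n_freqs)
        feats = np.concatenate([np.cos(proj), np.sin(proj)], axis=-1)
        feats /= np.sqrt(feats.shape[-1])                   # keep feature scale stable
        return np.maximum(feats @ self.W1, 0.0) @ self.W2   # ReLU MLP

# e.g. a 128-dim encoding for 2-D spatial positions:
# pe = LearnableFourierPE(m=2, n_freqs=32, d_out=128)
# pe(np.array([[0.1, 0.7]])).shape  # -> (1, 128)
```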
- RoFormer: Enhanced Transformer with Rotary Position Embedding [9.01819510933327]
We propose a novel method named Rotary Position Embedding (RoPE) to effectively leverage the positional information.
RoPE encodes the absolute position with a rotation matrix and meanwhile incorporates the explicit relative position dependency in self-attention formulation.
We evaluate the enhanced transformer with rotary position embedding, also called RoFormer, on various long text classification benchmark datasets.
arXiv Detail & Related papers (2021-04-20T09:54:06Z)
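The relative-position property that RoFormer's summary mentions is easy to verify numerically; a minimal single-frequency, 2-dimensional check (illustrative, not RoFormer's full per-pair schedule):

```python
import numpy as np

def rotate(x: np.ndarray, pos: int, theta: float = 0.1) -> np.ndarray:
    """RoPE rotation of a 2-dim feature pair placed at integer position pos."""
    a = pos * theta
    c, s = np.cos(a), np.sin(a)
    return np.array([x[0] * c - x[1] * s, x[0] * s + x[1] * c])

q, k = np.array([1.0, 2.0]), np.array([0.5, -1.0])
# dot product with q at position 7 and k at position 3 ...
lhs = rotate(q, 7) @ rotate(k, 3)
# ... equals the dot product with only the relative offset 7 - 3 applied
rhs = rotate(q, 4) @ k
assert np.isclose(lhs, rhs)  # attention sees only relative positions
```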
- Transformers Solve the Limited Receptive Field for Monocular Depth Prediction [82.90445525977904]
We propose TransDepth, an architecture which benefits from both convolutional neural networks and transformers.
This is the first paper which applies transformers to pixel-wise prediction problems involving continuous labels.
arXiv Detail & Related papers (2021-03-22T18:00:13Z)
- A Rotation-Invariant Framework for Deep Point Cloud Analysis [132.91915346157018]
We introduce a new low-level purely rotation-invariant representation to replace common 3D Cartesian coordinates as the network inputs.
Also, we present a network architecture to embed these representations into features, encoding local relations between points and their neighbors, and the global shape structure.
We evaluate our method on multiple point cloud analysis tasks, including shape classification, part segmentation, and shape retrieval.
arXiv Detail & Related papers (2020-03-16T14:04:45Z)
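As a hedged illustration of what the rotation-invariant framework's low-level input can look like: the features below (distances to nearest neighbors and to the neighborhood centroid) are a common recipe assumed for this sketch, not the paper's exact representation.

```python
import numpy as np

def rotation_invariant_features(points: np.ndarray, k: int = 8) -> np.ndarray:
    """Replace raw xyz with per-point features that any rotation leaves
    unchanged: distances to the k nearest neighbors and to their centroid."""
    n = len(points)
    d2 = ((points[:, None, :] - points[None, :, :]) ** 2).sum(-1)
    feats = []
    for i in range(n):
        nn = np.argsort(d2[i])[1:k + 1]   # k nearest neighbors, skipping self
        centroid = points[nn].mean(axis=0)
        feats.append(np.concatenate([
            np.sqrt(d2[i, nn]),                      # point-to-neighbor distances
            [np.linalg.norm(points[i] - centroid)],  # point-to-centroid distance
        ]))
    return np.stack(feats)                # (n, k + 1), invariant to rotations
```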
This list is automatically generated from the titles and abstracts of the papers on this site.
The site does not guarantee the quality of the information presented and is not responsible for any consequences arising from its use.