Geotokens and Geotransformers
- URL: http://arxiv.org/abs/2403.15940v1
- Date: Sat, 23 Mar 2024 22:02:56 GMT
- Title: Geotokens and Geotransformers
- Authors: Eren Unlu
- Abstract summary: This paper presents geotokens, input components for transformers, each linked to a specific geographical location.
Unlike typical language sequences, for these tokens the order is not as vital as the geographical coordinates themselves.
- Score: 0.0
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: In transformer architectures, position encoding primarily provides a sense of sequence for input tokens. While the original transformer paper's method has shown satisfactory results in general language processing tasks, there have been new proposals, such as Rotary Position Embedding (RoPE), for further improvement. This paper presents geotokens, input components for transformers, each linked to a specific geographical location. Unlike typical language sequences, for these tokens the order is not as vital as the geographical coordinates themselves. To represent relative position in this context and to keep a balance between real-world distance and distance in the embedding space, we design a position encoding approach drawing from the RoPE structure but tailored for spherical coordinates.
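As a rough illustration of the idea (not the paper's exact formulation), the sketch below applies a RoPE-style rotation to a geotoken embedding, with the rotation angles driven by latitude and longitude instead of a sequence index. The function names, frequency schedule, and the split of feature pairs between the two angles are assumptions made for this example.

```python
# Illustrative sketch only: a RoPE-style rotary encoding where the rotation
# angles come from a token's spherical coordinates (latitude, longitude)
# rather than its position in a sequence.
import numpy as np

def rotate_pairs(x: np.ndarray, angles: np.ndarray) -> np.ndarray:
    """Rotate consecutive (even, odd) feature pairs of x by the given angles."""
    x1, x2 = x[..., 0::2], x[..., 1::2]
    cos, sin = np.cos(angles), np.sin(angles)
    out = np.empty_like(x)
    out[..., 0::2] = x1 * cos - x2 * sin
    out[..., 1::2] = x1 * sin + x2 * cos
    return out

def spherical_rope(x: np.ndarray, lat: float, lon: float, base: float = 10000.0) -> np.ndarray:
    """Apply a rotary encoding to one geotoken embedding x (dim d, d % 4 == 0).

    Half of the feature pairs are rotated by angles proportional to latitude,
    the other half by angles proportional to longitude, so the dot product of
    two encoded tokens reflects their latitude/longitude offsets rather than
    any sequence order.
    """
    d = x.shape[-1]
    n_pairs = d // 2
    freqs = base ** (-np.arange(n_pairs // 2) / (n_pairs // 2))  # per-pair frequencies
    angles = np.concatenate([lat * freqs, lon * freqs])          # one angle per pair
    return rotate_pairs(x, angles)

# Usage: two geotokens at nearby coordinates (in radians)
q = spherical_rope(np.random.randn(64), lat=0.71, lon=0.51)
k = spherical_rope(np.random.randn(64), lat=0.70, lon=0.50)
score = q @ k  # attention logit now depends on the geographic offset
```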
Related papers
- PosFormer: Recognizing Complex Handwritten Mathematical Expression with Position Forest Transformer [51.260384040953326]
Handwritten Mathematical Expression Recognition (HMER) has wide applications in human-machine interaction scenarios.
We propose a position forest transformer (PosFormer) for HMER, which jointly optimizes two tasks: expression recognition and position recognition.
PosFormer consistently outperforms the state-of-the-art methods, with 2.03%/1.22%/2, 1.83%, and 4.62% gains on benchmark datasets.
arXiv Detail & Related papers (2024-07-10T15:42:58Z) - Learning SO(3)-Invariant Semantic Correspondence via Local Shape Transform [62.27337227010514]
We introduce a novel self-supervised Rotation-Invariant 3D correspondence learner with Local Shape Transform, dubbed RIST.
RIST learns to establish dense correspondences between shapes even under challenging intra-class variations and arbitrary orientations.
RIST demonstrates state-of-the-art performances on 3D part label transfer and semantic keypoint transfer given arbitrarily rotated point cloud pairs.
arXiv Detail & Related papers (2024-04-17T08:09:25Z) - LGFCTR: Local and Global Feature Convolutional Transformer for Image
Matching [8.503217766507584]
A novel convolutional transformer is proposed to capture both local contexts and global structures.
A universal FPN-like framework captures global structures in both the self-encoder and the cross-decoder via transformers.
A novel regression-based sub-pixel refinement module exploits the whole fine-grained window features for fine-level positional deviation regression.
arXiv Detail & Related papers (2023-11-29T12:06:19Z) - GTA: A Geometry-Aware Attention Mechanism for Multi-View Transformers [63.41460219156508]
We argue that existing positional encoding schemes are suboptimal for 3D vision tasks.
We propose a geometry-aware attention mechanism that encodes the geometric structure of tokens as relative transformation.
We show that our attention, called Geometric Transform Attention (GTA), improves learning efficiency and performance of state-of-the-art transformer-based NVS models.
arXiv Detail & Related papers (2023-10-16T13:16:09Z) - Spherical Position Encoding for Transformers [0.0]
We introduce the notion of "geotokens" which are input elements for transformer architectures.
Unlike natural language, the sequential position is not important for the model, but the geographical coordinates are.
We formulate a position encoding mechanism based on RoPE architecture which is adjusted for spherical coordinates.
arXiv Detail & Related papers (2023-10-04T09:28:59Z) - LCPFormer: Towards Effective 3D Point Cloud Analysis via Local Context Propagation in Transformers [60.51925353387151]
We propose a novel module named Local Context Propagation (LCP) to exploit the message passing between neighboring local regions.
We use the overlap points of adjacent local regions as intermediaries, then re-weight the features of these shared points from different local regions before passing them to the next layers.
The proposed method is applicable to different tasks and outperforms various transformer-based methods in benchmarks including 3D shape classification and dense prediction tasks.
arXiv Detail & Related papers (2022-10-23T15:43:01Z) - RePFormer: Refinement Pyramid Transformer for Robust Facial Landmark Detection [131.1478251760399]
We formulate the facial landmark detection task as refining landmark queries along pyramid memories.
Specifically, a pyramid transformer head (PTH) is introduced to build both the relations among landmarks and the heterogeneous relations between landmarks and cross-scale contexts.
A dynamic landmark refinement (DLR) module is designed to decompose the landmark regression into an end-to-end refinement procedure.
arXiv Detail & Related papers (2022-07-08T14:12:26Z) - Dynamic Position Encoding for Transformers [18.315954297959617]
Recurrent models have been dominating the field of neural machine translation (NMT) for the past few years.
Transformers could fail to properly encode sequential/positional information due to their non-recurrent nature.
We propose a novel architecture with new position embeddings depending on the input text to address this shortcoming.
arXiv Detail & Related papers (2022-04-18T03:08:48Z) - RoFormer: Enhanced Transformer with Rotary Position Embedding [9.01819510933327]
We propose a novel method named Rotary Position Embedding (RoPE) to effectively leverage positional information.
RoPE encodes the absolute position with a rotation matrix and at the same time incorporates explicit relative position dependency into the self-attention formulation.
We evaluate the enhanced transformer with rotary position embedding, also called RoFormer, on various long text classification benchmark datasets.
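For reference, a minimal sketch of the rotary mechanism described above: queries and keys are rotated by angles proportional to their absolute positions, so attention scores depend only on relative offsets. The helper name and frequency schedule here are illustrative, not RoFormer's exact implementation.

```python
# Minimal sketch of the RoPE property: rotating queries and keys by angles
# proportional to their absolute positions makes their dot product a function
# of the relative offset (m - n) only.
import numpy as np

def rope(x: np.ndarray, pos: int, base: float = 10000.0) -> np.ndarray:
    d = x.shape[-1]
    freqs = base ** (-np.arange(d // 2) / (d // 2))
    ang = pos * freqs
    x1, x2 = x[0::2], x[1::2]
    out = np.empty_like(x)
    out[0::2] = x1 * np.cos(ang) - x2 * np.sin(ang)
    out[1::2] = x1 * np.sin(ang) + x2 * np.cos(ang)
    return out

q, k = np.random.randn(2, 32)
# Same relative offset (2) on both sides, so the scores match:
print(np.isclose(rope(q, 5) @ rope(k, 3), rope(q, 7) @ rope(k, 5)))  # True
```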
arXiv Detail & Related papers (2021-04-20T09:54:06Z) - A Rotation-Invariant Framework for Deep Point Cloud Analysis [132.91915346157018]
We introduce a new low-level purely rotation-invariant representation to replace common 3D Cartesian coordinates as the network inputs.
Also, we present a network architecture to embed these representations into features, encoding local relations between points and their neighbors, and the global shape structure.
We evaluate our method on multiple point cloud analysis tasks, including shape classification, part segmentation, and shape retrieval.
arXiv Detail & Related papers (2020-03-16T14:04:45Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the information presented and is not responsible for any consequences.