GridPE: Unifying Positional Encoding in Transformers with a Grid Cell-Inspired Framework
- URL: http://arxiv.org/abs/2406.07049v2
- Date: Sat, 14 Sep 2024 11:35:50 GMT
- Title: GridPE: Unifying Positional Encoding in Transformers with a Grid Cell-Inspired Framework
- Authors: Boyang Li, Yulin Wu, Nuoxian Huang, Wenjia Zhang
- Abstract summary: We introduce a novel positional encoding scheme inspired by Fourier analysis and the latest findings in computational neuroscience regarding grid cells.
We derive an optimal grid scale ratio for multi-dimensional Euclidean spaces based on principles of biological efficiency.
Our theoretical analysis shows that GridPE provides a unifying framework for positional encoding in arbitrary high-dimensional spaces.
- Score: 6.192516215592685
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Understanding spatial location and relationships is a fundamental capability for modern artificial intelligence systems. Insights from human spatial cognition provide valuable guidance in this domain. Neuroscientific discoveries have highlighted the role of grid cells as a fundamental neural component for spatial representation, including distance computation, path integration, and scale discernment. In this paper, we introduce a novel positional encoding scheme inspired by Fourier analysis and the latest findings in computational neuroscience regarding grid cells. Assuming that grid cells encode spatial position through a summation of Fourier basis functions, we demonstrate the translational invariance of the grid representation during inner product calculations. Additionally, we derive an optimal grid scale ratio for multi-dimensional Euclidean spaces based on principles of biological efficiency. Utilizing these computational principles, we have developed a Grid-cell inspired Positional Encoding technique, termed GridPE, for encoding locations within high-dimensional spaces. We integrated GridPE into the Pyramid Vision Transformer architecture. Our theoretical analysis shows that GridPE provides a unifying framework for positional encoding in arbitrary high-dimensional spaces. Experimental results demonstrate that GridPE significantly enhances the performance of transformers, underscoring the importance of incorporating neuroscientific insights into the design of artificial intelligence systems.
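As a concrete illustration of the translational-invariance claim in the abstract, the sketch below encodes positions as sums of complex Fourier basis functions and checks that the inner product of two position codes depends only on their displacement. This is a minimal NumPy sketch under assumed wave vectors and an arbitrary scale ratio; it is not the authors' GridPE implementation or its Pyramid Vision Transformer integration.

```python
import numpy as np

def grid_fourier_encoding(pos, wave_vectors):
    """Encode a position (shape [d]) as a sum-of-Fourier-basis code:
    one complex feature e^{i k . pos} per wave vector k (shape [m, d])."""
    phases = wave_vectors @ pos          # [m] phases k . pos
    return np.exp(1j * phases)           # [m] complex Fourier features

# Illustrative 2D wave vectors over a few scales and orientations
# (the scale ratio here is a placeholder, not the paper's derived optimum).
scales = np.array([1.0, 1.4, 2.0])
angles = np.linspace(0.0, np.pi, 4, endpoint=False)
ks = np.array([[s * np.cos(a), s * np.sin(a)] for s in scales for a in angles])

x = np.array([0.3, -1.1])
y = np.array([2.0, 0.7])
shift = np.array([5.0, -3.0])

# <g(y), g(x)> = sum_k e^{i k . (x - y)} depends only on the displacement,
# so shifting both positions by the same vector leaves it unchanged.
ip_original = np.vdot(grid_fourier_encoding(y, ks), grid_fourier_encoding(x, ks))
ip_shifted = np.vdot(grid_fourier_encoding(y + shift, ks),
                     grid_fourier_encoding(x + shift, ks))
assert np.allclose(ip_original, ip_shifted)
```

In a transformer, such complex features would typically be mapped to real-valued vectors (e.g., concatenated cosine and sine components) before interacting with token embeddings; the geometric scale progression above is only a stand-in for the biologically derived ratio discussed in the paper.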
Related papers
- A Grid Cell-Inspired Structured Vector Algebra for Cognitive Maps [4.498459787490856]
The entorhinal-hippocampal formation is the mammalian brain's navigation system, encoding both physical and abstract spaces via grid cells.
Here, we propose a mechanistic model for versatile information processing in the entorhinal-hippocampal formation, inspired by continuous attractor networks (CANs) and Vector Symbolic Architectures (VSAs).
The novel grid-cell VSA model employs a spatially structured encoding scheme with 3D modules mimicking the discrete scales and orientations of grid cell modules.
arXiv Detail & Related papers (2025-03-11T16:45:52Z)
- Toward Relative Positional Encoding in Spiking Transformers [52.62008099390541]
Spiking neural networks (SNNs) are bio-inspired networks that model how neurons in the brain communicate through discrete spikes.
In this paper, we introduce an approximate method for relative positional encoding (RPE) in Spiking Transformers.
arXiv Detail & Related papers (2025-01-28T06:42:37Z)
- Attending to Topological Spaces: The Cellular Transformer [37.84207797241944]
Topological Deep Learning seeks to enhance the predictive performance of neural network models by harnessing topological structures in input data.
We introduce the Cellular Transformer (CT), a novel architecture that generalizes graph-based transformers to cell complexes.
CT achieves state-of-the-art performance, but it does so without the need for more complex enhancements.
arXiv Detail & Related papers (2024-05-23T01:48:32Z)
- HyPE-GT: where Graph Transformers meet Hyperbolic Positional Encodings [19.78896931593813]
We introduce an efficient framework that injects a set of learnable positional encodings into the Transformer.
This approach empowers us to explore diverse options for selecting the optimal PEs for specific downstream tasks.
We repurpose these positional encodings to mitigate the impact of over-smoothing in deep Graph Neural Networks (GNNs).
arXiv Detail & Related papers (2023-12-11T18:00:27Z)
- Self-Supervised Learning of Representations for Space Generates Multi-Modular Grid Cells [16.208253624969142]
The mammalian lineage has developed striking spatial representations.
One important spatial representation is the Nobel Prize-winning grid cell.
Grid cells represent self-location, a local and aperiodic quantity.
arXiv Detail & Related papers (2023-11-04T03:59:37Z)
- Affine-Consistent Transformer for Multi-Class Cell Nuclei Detection [76.11864242047074]
We propose a novel Affine-Consistent Transformer (AC-Former), which directly yields a sequence of nucleus positions.
We introduce an Adaptive Affine Transformer (AAT) module, which can automatically learn the key spatial transformations to warp original images for local network training.
Experimental results demonstrate that the proposed method significantly outperforms existing state-of-the-art algorithms on various benchmarks.
arXiv Detail & Related papers (2023-10-22T02:27:02Z)
- GTA: A Geometry-Aware Attention Mechanism for Multi-View Transformers [63.41460219156508]
We argue that existing positional encoding schemes are suboptimal for 3D vision tasks.
We propose a geometry-aware attention mechanism that encodes the geometric structure of tokens as relative transformation.
We show that our attention, called Geometric Transform Attention (GTA), improves learning efficiency and performance of state-of-the-art transformer-based NVS models.
arXiv Detail & Related papers (2023-10-16T13:16:09Z)
- DAGrid: Directed Accumulator Grid [13.188564605481544]
We present the Directed Accumulator Grid (DAGrid), which allows geometric-preserving filtering in neural networks.
We show that DAGrid achieves a 70.8% reduction in network parameter size and a 96.8% decrease in FLOPs.
It also achieves improvements of 4.4% in the average Dice score and 8.2% in the Dice score of the left ventricular mass.
arXiv Detail & Related papers (2023-06-05T04:33:32Z)
- PhysFormer: Facial Video-based Physiological Measurement with Temporal Difference Transformer [55.936527926778695]
Recent deep learning approaches focus on mining subtle rPPG clues using convolutional neural networks with limited spatio-temporal receptive fields.
In this paper, we propose PhysFormer, an end-to-end video-transformer-based architecture.
arXiv Detail & Related papers (2021-11-23T18:57:11Z)
- Graph Neural Networks with Learnable Structural and Positional Representations [83.24058411666483]
A major issue with arbitrary graphs is the absence of canonical positional information of nodes.
We introduce positional encodings (PE) of nodes and inject them into the input layer, like in Transformers.
We observe a performance increase for molecular datasets, from 2.87% up to 64.14% when considering learnable PE for both GNN classes.
arXiv Detail & Related papers (2021-10-15T05:59:15Z)
- Learnable Fourier Features for Multi-Dimensional Spatial Positional Encoding [96.9752763607738]
We propose a novel positional encoding method based on learnable Fourier features.
Our experiments show that our learnable feature representation for multi-dimensional positional encoding outperforms existing methods.
arXiv Detail & Related papers (2021-06-05T04:40:18Z)
- Spatial Dependency Networks: Neural Layers for Improved Generative Image Modeling [79.15521784128102]
We introduce a novel neural network for building image generators (decoders) and apply it to variational autoencoders (VAEs)
In our spatial dependency networks (SDNs), feature maps at each level of a deep neural net are computed in a spatially coherent way.
We show that augmenting the decoder of a hierarchical VAE by spatial dependency layers considerably improves density estimation.
arXiv Detail & Related papers (2021-03-16T07:01:08Z)
- Grid Cells Are Ubiquitous in Neural Networks [0.0]
Grid cells are believed to play an important role in both spatial and non-spatial cognition tasks.
A recent study observed the emergence of grid cells in an LSTM trained for path integration.
arXiv Detail & Related papers (2020-03-07T01:40:56Z)