Two Stones Hit One Bird: Bilevel Positional Encoding for Better Length Extrapolation
- URL: http://arxiv.org/abs/2401.16421v2
- Date: Mon, 17 Jun 2024 06:18:13 GMT
- Title: Two Stones Hit One Bird: Bilevel Positional Encoding for Better Length Extrapolation
- Authors: Zhenyu He, Guhao Feng, Shengjie Luo, Kai Yang, Liwei Wang, Jingjing Xu, Zhi Zhang, Hongxia Yang, Di He
- Abstract summary: We develop a new positional encoding method called Bilevel Positional Encoding (BiPE).
Theoretical analysis shows this disentanglement of positional information makes learning more effective.
Our BiPE has superior length extrapolation capabilities across a wide range of tasks in diverse text modalities.
- Score: 69.68831888599476
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: In this work, we leverage the intrinsic segmentation of language sequences and design a new positional encoding method called Bilevel Positional Encoding (BiPE). For each position, our BiPE blends an intra-segment encoding and an inter-segment encoding. The intra-segment encoding identifies the locations within a segment and helps the model capture the semantic information therein via absolute positional encoding. The inter-segment encoding specifies the segment index, models the relationships between segments, and aims to improve extrapolation capabilities via relative positional encoding. Theoretical analysis shows this disentanglement of positional information makes learning more effective. The empirical results also show that our BiPE has superior length extrapolation capabilities across a wide range of tasks in diverse text modalities.
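The abstract does not spell out concrete encodings, so the following is only a minimal sketch of the bilevel idea, assuming sinusoidal absolute encodings for the intra-segment offsets and an ALiBi-style relative bias over segment indices; the segmentation, the slope value, and the helper names are illustrative, not the paper's exact design.
```python
import numpy as np

def sinusoidal_pe(positions, d_model):
    """Standard sinusoidal absolute positional encoding for the given positions."""
    pe = np.zeros((len(positions), d_model))
    div = np.exp(-np.log(10000.0) * np.arange(0, d_model, 2) / d_model)
    pe[:, 0::2] = np.sin(positions[:, None] * div)
    pe[:, 1::2] = np.cos(positions[:, None] * div)
    return pe

def bilevel_positions(segment_ids):
    """Split each absolute position into (intra-segment offset, segment index)."""
    segment_ids = np.asarray(segment_ids)
    intra = np.zeros_like(segment_ids)
    for s in np.unique(segment_ids):
        mask = segment_ids == s
        intra[mask] = np.arange(mask.sum())
    return intra, segment_ids

def bipe_like_inputs(token_emb, segment_ids, alibi_slope=0.5):
    """Return (embeddings with intra-segment APE, inter-segment attention bias)."""
    intra, seg = bilevel_positions(segment_ids)
    x = token_emb + sinusoidal_pe(intra, token_emb.shape[-1])  # intra-segment: absolute PE
    rel = np.abs(seg[:, None] - seg[None, :])                  # inter-segment distance
    bias = -alibi_slope * rel                                  # ALiBi-style relative bias
    return x, bias

# Toy usage: 6 tokens forming two "sentences" (segments 0 and 1).
emb = np.random.randn(6, 16)
x, bias = bipe_like_inputs(emb, segment_ids=[0, 0, 0, 1, 1, 1])
print(x.shape, bias.shape)  # (6, 16) (6, 6)
```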
Related papers
- T2S-GPT: Dynamic Vector Quantization for Autoregressive Sign Language Production from Text [59.57676466961787]
We propose a novel dynamic vector quantization (DVA-VAE) model that can adjust the encoding length based on the information density in sign language.
Experiments conducted on the PHOENIX14T dataset demonstrate the effectiveness of our proposed method.
We propose a new large German sign language dataset, PHOENIX-News, which contains 486 hours of sign language videos, audio, and transcription texts.
arXiv Detail & Related papers (2024-06-11T10:06:53Z) - U-Net v2: Rethinking the Skip Connections of U-Net for Medical Image Segmentation [14.450329809640422]
We introduce U-Net v2, a new robust and efficient U-Net variant for medical image segmentation.
It aims to augment the infusion of semantic information into low-level features while simultaneously refining high-level features with finer details.
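As a loose illustration of the "semantics into low-level features, detail into high-level features" idea, the sketch below resizes all encoder levels to one target resolution and fuses them element-wise before the skip connection; the fusion choice, channel counts, and shapes are assumptions, not the module proposed in U-Net v2.
```python
import torch
import torch.nn.functional as F

def fuse_skip(features, target_level):
    """Fuse multi-scale encoder features into one skip connection: every level is
    resized to the target level's resolution and combined element-wise, so low-level
    maps pick up semantics and high-level maps pick up detail. Illustrative only."""
    h, w = features[target_level].shape[-2:]
    resized = [F.interpolate(f, size=(h, w), mode="bilinear", align_corners=False)
               for f in features]
    fused = resized[0]
    for r in resized[1:]:
        fused = fused * r  # element-wise (Hadamard) fusion, assumed for this sketch
    return fused

# Toy encoder pyramid: 4 levels with the same channel count, halving resolution.
feats = [torch.randn(1, 32, 64 // 2**i, 64 // 2**i) for i in range(4)]
print(fuse_skip(feats, target_level=2).shape)  # torch.Size([1, 32, 16, 16])
```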
arXiv Detail & Related papers (2023-11-29T16:35:24Z) - Linguistically Motivated Sign Language Segmentation [51.06873383204105]
We consider two kinds of segmentation: segmentation into individual signs and segmentation into phrases.
Our method is motivated by linguistic cues observed in sign language corpora.
We replace the predominant IO tagging scheme with BIO tagging to account for continuous signing.
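A small, self-contained illustration (not taken from the paper) of why BIO tagging matters for continuous signing: with IO tags, two signs produced back-to-back merge into one span, while BIO keeps the boundary.
```python
def spans_to_bio(n_frames, spans):
    """Label frames with B (begin), I (inside), O (outside) for (start, end) spans."""
    tags = ["O"] * n_frames
    for start, end in spans:  # end is exclusive
        tags[start] = "B"
        for t in range(start + 1, end):
            tags[t] = "I"
    return tags

def spans_to_io(n_frames, spans):
    """IO tagging: inside vs. outside only; adjacent spans become indistinguishable."""
    tags = ["O"] * n_frames
    for start, end in spans:
        for t in range(start, end):
            tags[t] = "I"
    return tags

# Two signs produced back-to-back (frames 2-4 and 5-7) with no pause in between.
spans = [(2, 5), (5, 8)]
print(spans_to_io(10, spans))   # ['O','O','I','I','I','I','I','I','O','O'] -> one blob
print(spans_to_bio(10, spans))  # ['O','O','B','I','I','B','I','I','O','O'] -> boundary kept
```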
arXiv Detail & Related papers (2023-10-21T10:09:34Z) - The Locality and Symmetry of Positional Encodings [9.246374019271938]
We conduct a systematic study of positional encodings in Bidirectional Masked Language Models (BERT-style).
We uncover the core function of PEs by identifying two common properties, Locality and Symmetry.
We quantify the weakness of current PEs by introducing two new probing tasks, on which current PEs perform poorly.
arXiv Detail & Related papers (2023-10-19T16:15:15Z) - CONFLATOR: Incorporating Switching Point based Rotatory Positional Encodings for Code-Mixed Language Modeling [10.26356931263957]
We introduce CONFLATOR: a neural language modeling approach for code-mixed languages.
We show that rotatory positional encodings along with switching point information yield the best results.
CONFLATOR outperforms the state-of-the-art on two tasks based on code-mixed Hindi and English.
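The exact CONFLATOR formulation is not given in this summary; the sketch below shows one plausible way to combine rotary encodings with switching-point information, rotating by the distance from the most recent language switch instead of the absolute position. The helper names and this particular combination are assumptions, not the paper's method.
```python
import numpy as np

def rope_rotate(x, positions, base=10000.0):
    """Apply rotary position embedding: rotate feature pairs by position-dependent angles."""
    d = x.shape[-1]
    inv_freq = base ** (-np.arange(0, d, 2) / d)
    angles = np.asarray(positions)[:, None] * inv_freq  # (seq, d/2)
    cos, sin = np.cos(angles), np.sin(angles)
    x1, x2 = x[:, 0::2], x[:, 1::2]
    out = np.empty_like(x)
    out[:, 0::2] = x1 * cos - x2 * sin
    out[:, 1::2] = x1 * sin + x2 * cos
    return out

def positions_since_switch(lang_ids):
    """Distance of each token from the most recent language switching point."""
    pos, last_switch = [], 0
    for i, lang in enumerate(lang_ids):
        if i > 0 and lang != lang_ids[i - 1]:
            last_switch = i
        pos.append(i - last_switch)
    return pos

# Toy Hindi-English code-mixed sequence: 0 = Hindi token, 1 = English token.
lang_ids = [0, 0, 1, 1, 1, 0]
q = np.random.randn(len(lang_ids), 8)
q_rot = rope_rotate(q, positions_since_switch(lang_ids))
print(positions_since_switch(lang_ids))  # [0, 1, 0, 1, 2, 0]
print(q_rot.shape)                       # (6, 8)
```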
arXiv Detail & Related papers (2023-09-11T07:02:13Z) - Deeply Interleaved Two-Stream Encoder for Referring Video Segmentation [87.49579477873196]
We first design a two-stream encoder to extract CNN-based visual features and transformer-based linguistic features hierarchically.
A vision-language mutual guidance (VLMG) module is inserted into the encoder multiple times to promote the hierarchical and progressive fusion of multi-modal features.
In order to promote the temporal alignment between frames, we propose a language-guided multi-scale dynamic filtering (LMDF) module.
arXiv Detail & Related papers (2022-03-30T01:06:13Z) - Representation and Correlation Enhanced Encoder-Decoder Framework for Scene Text Recognition [10.496558786568672]
We propose a Representation and Correlation Enhanced Encoder-Decoder Framework (RCEED) to address these deficiencies and break the performance bottleneck.
In the encoder module, local visual feature, global context feature, and position information are aligned and fused to generate a small-size comprehensive feature map.
In the decoder module, two methods are utilized to enhance the correlation between scene and text feature space.
arXiv Detail & Related papers (2021-06-13T10:36:56Z) - Rethinking Positional Encoding in Language Pre-training [111.2320727291926]
We show that in absolute positional encoding, the addition operation applied on positional embeddings and word embeddings brings mixed correlations.
We propose a new positional encoding method called Transformer with Untied Positional Encoding (TUPE).
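TUPE's core change can be read as computing word-to-word and position-to-position attention terms with separate ("untied") projections and summing them, instead of adding position embeddings into word embeddings; the toy score computation below sketches that idea with assumed shapes and scaling.
```python
import numpy as np

def tupe_style_scores(x, pos_emb, wq, wk, uq, uk):
    """Unnormalized attention scores: word-to-word and position-to-position
    correlations from separate projections, summed with no mixed terms."""
    d = wq.shape[1]
    word_term = (x @ wq) @ (x @ wk).T              # word-word correlation
    pos_term = (pos_emb @ uq) @ (pos_emb @ uk).T   # position-position correlation
    return (word_term + pos_term) / np.sqrt(2 * d)

# Toy setup: 5 tokens, model width 16, head width 8 (all sizes assumed).
seq, d_model, d_head = 5, 16, 8
rng = np.random.default_rng(0)
x = rng.normal(size=(seq, d_model))
pos = rng.normal(size=(seq, d_model))  # absolute position embeddings
wq, wk, uq, uk = (rng.normal(size=(d_model, d_head)) for _ in range(4))
print(tupe_style_scores(x, pos, wq, wk, uq, uk).shape)  # (5, 5)
```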
arXiv Detail & Related papers (2020-06-28T13:11:02Z) - Bi-Decoder Augmented Network for Neural Machine Translation [108.3931242633331]
We propose a novel Bi-Decoder Augmented Network (BiDAN) for the neural machine translation task.
Since each decoder transforms the representations of the input text into its corresponding language, jointly training with two target ends gives the shared encoder the potential to produce a language-independent semantic space.
arXiv Detail & Related papers (2020-01-14T02:05:14Z)
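As a rough sketch of the bi-decoder setup, the snippet below wires one shared encoder to two language-specific decoders and trains them jointly with a summed loss; the GRU backbone, the dimensions, and the unshifted teacher-forcing targets are simplifications, not BiDAN's actual architecture.
```python
import torch
import torch.nn as nn

class BiDecoderSeq2Seq(nn.Module):
    """One shared encoder feeding two language-specific decoders, trained jointly
    so the encoder is pushed toward a language-independent representation."""
    def __init__(self, src_vocab, tgt_vocab_a, tgt_vocab_b, d=256):
        super().__init__()
        self.src_emb = nn.Embedding(src_vocab, d)
        self.encoder = nn.GRU(d, d, batch_first=True)
        self.emb_a, self.emb_b = nn.Embedding(tgt_vocab_a, d), nn.Embedding(tgt_vocab_b, d)
        self.dec_a = nn.GRU(d, d, batch_first=True)
        self.dec_b = nn.GRU(d, d, batch_first=True)
        self.out_a = nn.Linear(d, tgt_vocab_a)
        self.out_b = nn.Linear(d, tgt_vocab_b)

    def forward(self, src, tgt_a, tgt_b):
        _, h = self.encoder(self.src_emb(src))  # shared sentence representation
        logits_a = self.out_a(self.dec_a(self.emb_a(tgt_a), h)[0])
        logits_b = self.out_b(self.dec_b(self.emb_b(tgt_b), h)[0])
        return logits_a, logits_b

# Joint loss over both target ends (teacher forcing, toy batch).
model = BiDecoderSeq2Seq(1000, 1200, 1100)
src = torch.randint(0, 1000, (2, 7))
tgt_a, tgt_b = torch.randint(0, 1200, (2, 6)), torch.randint(0, 1100, (2, 6))
logits_a, logits_b = model(src, tgt_a, tgt_b)
loss = nn.functional.cross_entropy(logits_a.flatten(0, 1), tgt_a.flatten()) \
     + nn.functional.cross_entropy(logits_b.flatten(0, 1), tgt_b.flatten())
print(loss.item())
```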
This list is automatically generated from the titles and abstracts of the papers in this site.