DAPE: Data-Adaptive Positional Encoding for Length Extrapolation
- URL: http://arxiv.org/abs/2405.14722v6
- Date: Tue, 05 Nov 2024 08:22:00 GMT
- Title: DAPE: Data-Adaptive Positional Encoding for Length Extrapolation
- Authors: Chuanyang Zheng, Yihang Gao, Han Shi, Minbin Huang, Jingyao Li, Jing Xiong, Xiaozhe Ren, Michael Ng, Xin Jiang, Zhenguo Li, Yu Li
- Abstract summary: Positional encoding plays a crucial role in transformers, significantly impacting model performance and generalization length.
We propose a Data-Adaptive Positional Encoding (DAPE) method, which enhances model performance at the trained length and in length generalization.
We successfully train the model on sequence length 128 and achieve better performance at evaluation sequence length 8192 than other static positional encoding methods.
- Score: 60.18239094672938
- Abstract: Positional encoding plays a crucial role in transformers, significantly impacting model performance and length generalization. Prior research has introduced absolute positional encoding (APE) and relative positional encoding (RPE) to distinguish token positions in given sequences. However, both APE and RPE remain fixed after model training regardless of input data, limiting their adaptability and flexibility. Hence, we expect that the desired positional encoding should be data-adaptive and dynamically adjustable according to the given attention. In this paper, we propose a Data-Adaptive Positional Encoding (DAPE) method, which dynamically and semantically adjusts based on input context and learned fixed priors. Experimental validation on real-world datasets (Arxiv, Books3, and CHE) demonstrates that DAPE enhances model performance at the trained length and in length generalization, where the improvements are statistically significant. The model visualization suggests that our model can keep both local and anti-local information. Finally, we successfully train the model on sequence length 128 and achieve better performance at evaluation sequence length 8192 than other static positional encoding methods, revealing the benefit of the adaptive positional encoding method.
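The core idea in the abstract, a positional bias that is conditioned on the input rather than fixed, can be illustrated with a minimal sketch. This is a hypothetical parameterization for illustration only, not the paper's exact architecture: a small per-pair MLP reads the content attention logits together with a static relative-position bias and produces an input-dependent adjustment.

```python
import numpy as np

def softmax(x, axis=-1):
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def dape_attention(q, k, v, static_bias, w1, w2):
    """Single-head attention with a data-adaptive positional bias.

    The static bias is adjusted per input by a small MLP that sees both
    the attention logits and the static bias for each query-key pair.
    Shapes and the MLP parameterization are assumptions for this sketch.
    """
    d = q.shape[-1]
    logits = q @ k.T / np.sqrt(d)              # (T, T) content logits
    # Per-pair features: stack content logit and static bias -> (T, T, 2)
    feats = np.stack([logits, static_bias], axis=-1)
    hidden = np.maximum(feats @ w1, 0.0)       # (T, T, h), ReLU MLP
    adaptive_bias = (hidden @ w2).squeeze(-1)  # (T, T) adjusted bias
    return softmax(logits + adaptive_bias) @ v
```

Because the bias is a function of the logits, it changes with the input sequence, which is the property the paper argues enables better length generalization than a bias frozen after training.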
Related papers
- Generalized Face Forgery Detection via Adaptive Learning for Pre-trained Vision Transformer [54.32283739486781]
We present a Forgery-aware Adaptive Vision Transformer (FA-ViT) under the adaptive learning paradigm.
FA-ViT achieves 93.83% and 78.32% AUC scores on Celeb-DF and DFDC datasets in the cross-dataset evaluation.
arXiv Detail & Related papers (2023-09-20T06:51:11Z) - Condition-Invariant Semantic Segmentation [77.10045325743644]
We implement Condition-Invariant Semantic (CISS) on the current state-of-the-art domain adaptation architecture.
Our method achieves the second-best performance on the normal-to-adverse Cityscapes→ACDC benchmark.
CISS is shown to generalize well to domains unseen during training, such as BDD100K-night and ACDC-night.
arXiv Detail & Related papers (2023-05-27T03:05:07Z) - Improving Position Encoding of Transformers for Multivariate Time Series Classification [5.467400475482668]
We propose a new absolute position encoding method dedicated to time series data, called time Absolute Position Encoding (tAPE).
We then propose a novel multivariate time series classification (MTSC) model named ConvTran, which combines tAPE/eRPE with convolution-based input encoding to improve the position and data embedding of time series data.
arXiv Detail & Related papers (2023-05-26T05:30:04Z) - Adaptive Spot-Guided Transformer for Consistent Local Feature Matching [64.30749838423922]
We propose Adaptive Spot-Guided Transformer (ASTR) for local feature matching.
ASTR models the local consistency and scale variations in a unified coarse-to-fine architecture.
arXiv Detail & Related papers (2023-03-29T12:28:01Z) - Transformers for End-to-End InfoSec Tasks: A Feasibility Study [6.847381178288385]
We implement transformer models for two distinct InfoSec data formats - specifically URLs and PE files.
We show that our URL transformer model requires a different training approach to reach high performance levels.
We demonstrate that this approach performs comparably to well-established malware detection models on benchmark PE file datasets.
arXiv Detail & Related papers (2022-12-05T23:50:46Z) - Back to the Source: Diffusion-Driven Test-Time Adaptation [77.4229736436935]
Test-time adaptation harnesses test inputs to improve accuracy of a model trained on source data when tested on shifted target data.
We instead update the target data, by projecting all test inputs toward the source domain with a generative diffusion model.
arXiv Detail & Related papers (2022-07-07T17:14:10Z) - Adaptive Fine-Tuning of Transformer-Based Language Models for Named Entity Recognition [0.0]
The current standard approach for fine-tuning language models includes a fixed number of training epochs and a linear learning rate schedule.
In this paper, we introduce adaptive fine-tuning, which is an alternative approach that uses early stopping and a custom learning rate schedule.
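The adaptive fine-tuning idea described above, replacing a fixed epoch count with early stopping under a custom learning-rate schedule, can be sketched generically. The function names below are placeholders, not the paper's API; the patience-based stopping rule is a common instantiation of early stopping, assumed here for illustration.

```python
def adaptive_fine_tune(train_step, evaluate, schedule,
                       max_epochs=50, patience=3):
    """Fine-tune until validation loss stops improving.

    train_step(lr=...) runs one epoch at the given learning rate,
    evaluate() returns a validation loss, and schedule(epoch) supplies
    the custom learning rate. All three are caller-provided stand-ins.
    """
    best, best_epoch = float("inf"), 0
    for epoch in range(max_epochs):
        train_step(lr=schedule(epoch))
        loss = evaluate()
        if loss < best:
            best, best_epoch = loss, epoch
        elif epoch - best_epoch >= patience:
            break  # no improvement for `patience` epochs: stop early
    return best
```

The stopping criterion makes the number of epochs data-dependent, which is the contrast the paper draws with the standard fixed-epoch, linear-schedule recipe.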
arXiv Detail & Related papers (2022-02-05T19:20:03Z) - Lifelong Unsupervised Domain Adaptive Person Re-identification with Coordinated Anti-forgetting and Adaptation [127.6168183074427]
We propose a new task, Lifelong Unsupervised Domain Adaptive (LUDA) person ReID.
This is challenging because it requires the model to continuously adapt to unlabeled data of the target environments.
We design an effective scheme for this task, dubbed CLUDA-ReID, where the anti-forgetting is harmoniously coordinated with the adaptation.
arXiv Detail & Related papers (2021-12-13T13:19:45Z) - Adaptive L2 Regularization in Person Re-Identification [0.9195729979000402]
We introduce an adaptive L2 regularization mechanism in the setting of person re-identification.
Experiments on the Market-1501, DukeMTMC-reID and MSMT17 datasets validate the effectiveness of our framework.
arXiv Detail & Related papers (2020-07-15T17:50:34Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of this information and is not responsible for any consequences of its use.