Relative Position Prediction as Pre-training for Text Encoders
- URL: http://arxiv.org/abs/2202.01145v1
- Date: Wed, 2 Feb 2022 17:13:31 GMT
- Title: Relative Position Prediction as Pre-training for Text Encoders
- Authors: Rickard Brüel-Gabrielsson, Chris Scarvelis
- Abstract summary: We argue that a position-centric perspective is more general and useful.
We adapt the relative position encoding paradigm in NLP to create relative labels for self-supervised learning.
- Score: 0.0
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Meaning is defined by the company it keeps. However, company is two-fold:
It's based on the identity of tokens and also on their position (topology). We
argue that a position-centric perspective is more general and useful. The
classic MLM and CLM objectives in NLP are easily phrased as position
predictions over the whole vocabulary. Adapting the relative position encoding
paradigm in NLP to create relative labels for self-supervised learning, we seek
to show superior pre-training judged by performance on downstream tasks.
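The abstract does not spell out how the relative labels are built. As a rough illustration of phrasing a masked-token objective as a prediction over relative positions, the sketch below constructs, for a single masked token, a target distribution over the relative offsets at which the same token identity occurs elsewhere in the context. The function name and this particular labeling scheme are assumptions for illustration, not the paper's exact recipe.

```python
import torch

def relative_position_target(token_ids: torch.Tensor, masked_idx: int, max_offset: int) -> torch.Tensor:
    """Illustrative relative-position label for one masked token (not the paper's exact recipe).

    The target is a distribution over relative offsets in [-max_offset, max_offset],
    placing mass on every offset at which the masked token's identity reappears
    in the unmasked context.
    """
    target = torch.zeros(2 * max_offset + 1)
    original_id = token_ids[masked_idx]
    for pos in range(token_ids.shape[0]):
        offset = pos - masked_idx
        if pos != masked_idx and abs(offset) <= max_offset and token_ids[pos] == original_id:
            target[offset + max_offset] = 1.0
    if target.sum() > 0:
        target = target / target.sum()  # normalize into a distribution over offsets
    return target

# Toy example: the token at index 3 (id 40) also occurs at index 6, i.e. at offset +3.
tokens = torch.tensor([12, 7, 7, 40, 9, 3, 40, 5])
print(relative_position_target(tokens, masked_idx=3, max_offset=4))
```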
Related papers
- Eliminating Position Bias of Language Models: A Mechanistic Approach [119.34143323054143]
Position bias has proven to be a prevalent issue in modern language models (LMs).
Our mechanistic analysis attributes the position bias to two components employed in nearly all state-of-the-art LMs: causal attention and relative positional encodings.
By eliminating position bias, models achieve better performance and reliability in downstream tasks, including LM-as-a-judge, retrieval-augmented QA, molecule generation, and math reasoning.
arXiv Detail & Related papers (2024-07-01T09:06:57Z)
Attention Instruction: Amplifying Attention in the Middle via Prompting [35.07098912195063]
Language models still suffer from position bias and have difficulty in accessing and using the middle part of the context.
We examine the relative position awareness of LLMs and the feasibility of mitigating disproportional attention through prompting.
arXiv Detail & Related papers (2024-06-24T19:35:11Z)
Contextual Position Encoding: Learning to Count What's Important [42.038277620194]
We propose a new position encoding method, Contextual Position Encoding (CoPE).
CoPE allows positions to be conditioned on context by incrementing position on certain tokens determined by the model (see the sketch after this entry).
We show that CoPE can solve the selective copy, counting and Flip-Flop tasks where popular position embeddings fail.
arXiv Detail & Related papers (2024-05-29T02:57:15Z)
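Based only on the description above, a minimal sketch of CoPE's position computation might look as follows: a sigmoid gate decides whether each earlier token "counts," and the contextual position of a key relative to a query is the running sum of those gates. The shapes, the gating form, and the omission of the subsequent position-embedding interpolation step are assumptions for illustration.

```python
import torch

def cope_positions(q: torch.Tensor, k: torch.Tensor) -> torch.Tensor:
    """Sketch of contextual positions: p[i, j] = sum of gates g[i, m] for j <= m <= i,
    where g[i, m] = sigmoid(q_i . k_m) decides whether token m is counted.
    (Assumed reading of the abstract; the full method also interpolates learned
    position embeddings at these fractional positions, which is omitted here.)"""
    seq_len = q.shape[0]
    causal = torch.tril(torch.ones(seq_len, seq_len))   # attend only to current/previous tokens
    gates = torch.sigmoid(q @ k.T) * causal             # (seq_len, seq_len) gate per (query, key) pair
    # Reverse cumulative sum along keys gives the backward-looking count for each pair.
    positions = torch.flip(torch.cumsum(torch.flip(gates, dims=[1]), dim=1), dims=[1]) * causal
    return positions

# Toy usage: 5 tokens with 8-dimensional query/key vectors.
q, k = torch.randn(5, 8), torch.randn(5, 8)
print(cope_positions(q, k))
```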
Tsetlin Machine Embedding: Representing Words Using Logical Expressions [10.825099126920028]
We introduce a Tsetlin Machine-based autoencoder that learns logical clauses in a self-supervised manner.
The clauses consist of contextual words like "black," "cup," and "hot" to define other words like "coffee."
We evaluate our embedding approach on several intrinsic and extrinsic benchmarks, outperforming GloVe on six classification tasks.
arXiv Detail & Related papers (2023-01-02T15:02:45Z)
Location-Aware Self-Supervised Transformers [74.76585889813207]
We propose to pretrain networks for semantic segmentation by predicting the relative location of image parts.
We control the difficulty of the task by masking a subset of the reference patch features that are visible to the query (see the sketch after this entry).
Our experiments show that this location-aware pretraining leads to representations that transfer competitively to several challenging semantic segmentation benchmarks.
arXiv Detail & Related papers (2022-12-05T16:24:29Z)
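A rough sketch of the relative-location pretext task described above: each query patch scores the (partially masked) reference patches and is trained to pick the grid position it came from. The dimensions, dot-product scoring, and masking scheme below are assumptions for illustration, not the paper's exact design.

```python
import torch
import torch.nn as nn

class RelativeLocationHead(nn.Module):
    """Sketch: a query patch predicts its position on the reference grid;
    masking some reference features controls task difficulty (illustrative only)."""
    def __init__(self, dim: int = 384):
        super().__init__()
        self.proj_q = nn.Linear(dim, dim)
        self.proj_r = nn.Linear(dim, dim)

    def forward(self, query_feats, ref_feats, ref_mask):
        # query_feats: (B, Nq, D) query-patch features; ref_feats: (B, Nr, D) reference-grid features
        # ref_mask: (B, Nr) with 1 = visible, 0 = masked-out reference feature
        q = self.proj_q(query_feats)
        r = self.proj_r(ref_feats) * ref_mask.unsqueeze(-1)
        return q @ r.transpose(1, 2)  # (B, Nq, Nr) logits over reference grid positions

# Toy usage: 4 query patches locate themselves on a 14x14 = 196-cell reference grid.
head = RelativeLocationHead()
logits = head(torch.randn(2, 4, 384), torch.randn(2, 196, 384), torch.ones(2, 196))
loss = nn.functional.cross_entropy(logits.flatten(0, 1), torch.randint(0, 196, (8,)))
print(loss.item())
```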
The Curious Case of Absolute Position Embeddings [65.13827063579728]
Transformer language models encode the notion of word order using positional information.
In natural language, it is not absolute position that matters, but relative position, and the extent to which APEs can capture this type of information has not been investigated.
We observe that models trained with APE over-rely on positional information to the point that they break down when subjected to sentences with shifted position information (see the probe sketch after this entry).
arXiv Detail & Related papers (2022-10-23T00:00:04Z)
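One simple way to probe the shift sensitivity described above (an illustrative protocol, not necessarily the paper's) is to feed the same input with position ids offset by a constant and compare a masked-token prediction. Hugging Face `transformers` with a BERT checkpoint is assumed here.

```python
import torch
from transformers import BertTokenizer, BertForMaskedLM

# Load an APE-based model (BERT uses learned absolute position embeddings).
tokenizer = BertTokenizer.from_pretrained("bert-base-uncased")
model = BertForMaskedLM.from_pretrained("bert-base-uncased").eval()

inputs = tokenizer("The capital of France is [MASK].", return_tensors="pt")
seq_len = inputs["input_ids"].shape[1]
mask_idx = (inputs["input_ids"] == tokenizer.mask_token_id).nonzero()[0, 1]

for shift in (0, 50):  # shift position ids by a constant; predictions may change for large shifts
    position_ids = torch.arange(shift, shift + seq_len).unsqueeze(0)
    with torch.no_grad():
        logits = model(**inputs, position_ids=position_ids).logits
    top_id = logits[0, mask_idx].argmax().item()
    print(f"shift={shift}: top prediction = {tokenizer.decode([top_id])}")
```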
An Empirical Study on Leveraging Position Embeddings for Target-oriented Opinion Words Extraction [13.765146062545048]
Target-oriented opinion words extraction (TOWE) is a new subtask of target-oriented sentiment analysis.
We show that BiLSTM-based models can effectively encode position information into word representations.
We also adapt a graph convolutional network (GCN) to enhance word representations by incorporating syntactic information.
arXiv Detail & Related papers (2021-09-02T22:49:45Z)
R$^2$-Net: Relation of Relation Learning Network for Sentence Semantic Matching [58.72111690643359]
We propose a Relation of Relation Learning Network (R2-Net) for sentence semantic matching.
We first employ BERT to encode the input sentences from a global perspective.
Then a CNN-based encoder is designed to capture keyword and phrase information from a local perspective.
To fully leverage labels for better relation information extraction, we introduce a self-supervised relation of relation classification task.
arXiv Detail & Related papers (2020-12-16T13:11:30Z)
Adversarial Transfer Learning for Punctuation Restoration [58.2201356693101]
Adversarial multi-task learning is introduced to learn task-invariant knowledge for punctuation prediction.
Experiments are conducted on the IWSLT2011 dataset.
arXiv Detail & Related papers (2020-04-01T06:19:56Z)
Lexical Sememe Prediction using Dictionary Definitions by Capturing Local Semantic Correspondence [94.79912471702782]
Sememes, defined as the minimum semantic units of human languages, have been proven useful in many NLP tasks.
We propose a Sememe Correspondence Pooling (SCorP) model, which captures the local semantic correspondence between dictionary definitions and sememes in order to predict sememes.
We evaluate our model and baseline methods on HowNet, a well-known sememe knowledge base, and find that our model achieves state-of-the-art performance.
arXiv Detail & Related papers (2020-01-16T17:30:36Z)