An Iterative Contextualization Algorithm with Second-Order Attention
- URL: http://arxiv.org/abs/2103.02190v1
- Date: Wed, 3 Mar 2021 05:34:50 GMT
- Title: An Iterative Contextualization Algorithm with Second-Order Attention
- Authors: Diego Maupomé and Marie-Jean Meurs
- Abstract summary: We show how to combine the representations of words that make up a sentence into a cohesive whole.
Our algorithm starts with a presumably erroneous value of the context, and adjusts this value with respect to the tokens at hand.
Our models report strong results in several well-known text classification tasks.
- Score: 0.40611352512781856
- License: http://creativecommons.org/licenses/by-sa/4.0/
- Abstract: Combining the representations of the words that make up a sentence into a
cohesive whole is difficult, since it needs to account for the order of words,
and to establish how the words present relate to each other. The solution we
propose consists in iteratively adjusting the context. Our algorithm starts
with a presumably erroneous value of the context, and adjusts this value with
respect to the tokens at hand. In order to achieve this, representations of
words are built combining their symbolic embedding with a positional encoding
into single vectors. The algorithm then iteratively weighs and aggregates these
vectors using our novel second-order attention mechanism. Our models report
strong results in several well-known text classification tasks.
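The abstract does not spell out the form of the second-order attention, so the following is only a minimal sketch of the iterative scheme it describes: token vectors (assumed to already combine a symbolic embedding with a positional encoding) are repeatedly weighted and aggregated to refine an initially crude context vector, with a bilinear score standing in for the second-order mechanism.

```python
import numpy as np

def iterative_contextualization(token_vecs, num_iters=3, seed=0):
    """Minimal sketch of the iterative contextualization idea.

    token_vecs: (n, d) array of word representations, assumed to already
    combine a symbolic embedding with a positional encoding.
    The bilinear matrix W is a placeholder for the paper's second-order
    attention, whose exact form the abstract does not give.
    """
    rng = np.random.default_rng(seed)
    n, d = token_vecs.shape
    W = rng.normal(scale=1.0 / np.sqrt(d), size=(d, d))  # assumed scoring parameters
    context = token_vecs.mean(axis=0)   # initial, "presumably erroneous" context value
    for _ in range(num_iters):
        scores = token_vecs @ W @ context        # second-order (context-token) interaction
        weights = np.exp(scores - scores.max())
        weights /= weights.sum()                 # softmax over tokens
        context = weights @ token_vecs           # weighted aggregation -> refined context
    return context
```

The number of iterations, the mean-pooled initialization, and the bilinear parameterization are illustrative choices, not the paper's.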
Related papers
- Span-Aggregatable, Contextualized Word Embeddings for Effective Phrase Mining [0.22499166814992438]
We show that when target phrases reside inside noisy context, representing the full sentence with a single dense vector is not sufficient for effective phrase retrieval.
We show that aggregating contextualized word embeddings over the target span is much more effective for phrase mining, yet requires considerable compute to obtain useful span representations.
arXiv Detail & Related papers (2024-05-12T12:08:05Z)
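As a rough illustration of the contrast drawn in the entry above, assuming contextualized token vectors from some encoder are already available, a candidate phrase can be represented by pooling only its own span rather than the whole sentence (names and mean-pooling are illustrative choices, not the paper's method):

```python
import numpy as np

def sentence_vector(token_vecs):
    # Single dense vector for the whole sentence (the baseline criticized above).
    return token_vecs.mean(axis=0)

def span_vector(token_vecs, start, end):
    # Hypothetical span-level aggregation: pool only the tokens of the phrase.
    return token_vecs[start:end].mean(axis=0)

# With noisy surrounding context, matching a query phrase against
# span_vector(...) rather than sentence_vector(...) keeps the phrase signal intact.
tokens = np.random.default_rng(0).normal(size=(12, 8))  # 12 contextualized token vectors
phrase_vec = span_vector(tokens, 3, 6)                  # tokens 3..5 form the target phrase
```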
- Sub-Sentence Encoder: Contrastive Learning of Propositional Semantic Representations [102.05351905494277]
The sub-sentence encoder is a contrastively learned contextual embedding model for fine-grained semantic representation of text.
We show that sub-sentence encoders have the same inference cost and space complexity as sentence encoders.
arXiv Detail & Related papers (2023-11-07T20:38:30Z)
- Linear-Time Modeling of Linguistic Structure: An Order-Theoretic Perspective [97.57162770792182]
Tasks that model the relation between pairs of tokens in a string are a vital part of understanding natural language.
We show that these exhaustive comparisons can be avoided, and, moreover, the complexity can be reduced to linear by casting the relation between tokens as a partial order over the string.
Our method predicts real numbers for each token in a string in parallel and sorts the tokens accordingly, resulting in total orders of the tokens in the string.
arXiv Detail & Related papers (2023-05-24T11:47:35Z)
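A small sketch of the order-theoretic idea above, with arbitrary per-token scores standing in for the model's predictions: each token gets a real number, and sorting induces a total order from which pairwise relations can be read off without O(n^2) comparisons.

```python
import numpy as np

def total_order(token_scores):
    """Sort tokens by a predicted real number to obtain a total order."""
    order = np.argsort(token_scores)      # token indices sorted by score
    rank = np.empty_like(order)
    rank[order] = np.arange(len(order))   # rank[i] = position of token i in the order
    return rank

def precedes(rank, i, j):
    # Token i is ordered before token j iff it comes earlier in the induced order.
    return rank[i] < rank[j]

scores = np.array([0.3, -1.2, 2.0, 0.1])  # hypothetical per-token predictions
rank = total_order(scores)
```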
- What Are You Token About? Dense Retrieval as Distributions Over the Vocabulary [68.77983831618685]
We propose to interpret the vector representations produced by dual encoders by projecting them into the model's vocabulary space.
We show that the resulting projections contain rich semantic information, and we draw a connection between them and sparse retrieval.
arXiv Detail & Related papers (2022-12-20T16:03:25Z)
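A minimal sketch of the projection described above, assuming access to the encoder's vocabulary embedding matrix; variable names are illustrative, not the paper's API:

```python
import numpy as np

def vocab_distribution(dense_vec, vocab_embeddings):
    """Project a dual-encoder vector into vocabulary space and normalize.

    dense_vec: (d,) query or passage representation.
    vocab_embeddings: (V, d) token embedding matrix of the same model.
    """
    logits = vocab_embeddings @ dense_vec           # similarity to every vocabulary token
    logits -= logits.max()
    probs = np.exp(logits) / np.exp(logits).sum()   # softmax -> distribution over the vocabulary
    return probs

# Inspecting the highest-probability tokens gives a human-readable view of the vector.
```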
- Evaluating Various Tokenizers for Arabic Text Classification [4.110108749051656]
We introduce three new tokenization algorithms for Arabic and compare them to three other baselines using unsupervised evaluations.
Our experiments show that the performance of such tokenization algorithms depends on the size of the dataset, type of the task, and the amount of morphology that exists in the dataset.
arXiv Detail & Related papers (2021-06-14T16:05:58Z)
- WOVe: Incorporating Word Order in GloVe Word Embeddings [0.0]
Defining a word as a vector makes it easy for machine learning algorithms to understand a text and extract information from it.
Word vector representations have been used in many applications such as word synonyms, word analogy, syntactic parsing, and many others.
arXiv Detail & Related papers (2021-05-18T15:28:20Z)
- Match-Ignition: Plugging PageRank into Transformer for Long-form Text Matching [66.71886789848472]
We propose a novel hierarchical noise filtering model, namely Match-Ignition, to tackle both the effectiveness and the efficiency problems.
The basic idea is to plug the well-known PageRank algorithm into the Transformer to identify and filter noisy information at both the sentence and word levels.
Noisy sentences are usually easy to detect, because the sentence is the basic unit of a long-form text, so we directly use PageRank to filter such information.
arXiv Detail & Related papers (2021-01-16T10:34:03Z)
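A toy sketch of the sentence-level filtering step above, assuming a precomputed sentence similarity matrix serves as the graph; the word-level filtering inside the Transformer is not shown, and the keep ratio is an illustrative parameter:

```python
import numpy as np

def pagerank_scores(sim, damping=0.85, iters=50):
    """Run PageRank over a sentence similarity graph; sim is (n, n), non-negative."""
    n = sim.shape[0]
    trans = sim / sim.sum(axis=1, keepdims=True)   # row-normalized transition matrix
    scores = np.full(n, 1.0 / n)
    for _ in range(iters):
        scores = (1 - damping) / n + damping * trans.T @ scores
    return scores

def keep_salient(sentences, sim, keep_ratio=0.5):
    # Drop the lowest-ranked (likely noisy) sentences before long-form matching.
    scores = pagerank_scores(sim)
    k = max(1, int(len(sentences) * keep_ratio))
    top = np.argsort(-scores)[:k]
    return [sentences[i] for i in sorted(top)]
```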
- Text Information Aggregation with Centrality Attention [86.91922440508576]
We propose a new way of obtaining aggregation weights, called eigen-centrality self-attention.
We build a fully-connected graph for all the words in a sentence, then compute the eigen-centrality as the attention score of each word.
arXiv Detail & Related papers (2020-11-16T13:08:48Z)
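A compact sketch of the eigen-centrality idea above: build a fully connected similarity graph over the words, take the principal eigenvector via power iteration, and use it as aggregation weights (the actual model's graph construction may differ):

```python
import numpy as np

def eigen_centrality_weights(word_vecs, iters=100):
    """Use eigen-centrality of a fully connected word graph as attention weights."""
    sim = word_vecs @ word_vecs.T            # pairwise similarities as edge weights
    sim = np.maximum(sim, 0.0)               # keep edge weights non-negative
    np.fill_diagonal(sim, 0.0)
    v = np.full(sim.shape[0], 1.0)
    for _ in range(iters):                   # power iteration -> principal eigenvector
        v = sim @ v
        v /= np.linalg.norm(v) + 1e-12
    return v / (v.sum() + 1e-12)             # normalize to attention weights

def aggregate(word_vecs):
    w = eigen_centrality_weights(word_vecs)
    return w @ word_vecs                     # centrality-weighted sentence vector
```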
- Research on Annotation Rules and Recognition Algorithm Based on Phrase Window [4.334276223622026]
We propose labeling rules based on phrase windows and design corresponding phrase recognition algorithms.
The labeling rules use phrases as the minimum unit, divide sentences into seven nestable phrase types, and mark the grammatical dependencies between phrases.
The corresponding algorithm, drawing on the idea of identifying target regions in images, can find the start and end positions of various phrases in a sentence.
arXiv Detail & Related papers (2020-07-07T00:19:47Z)
- Word Rotator's Distance [50.67809662270474]
A key principle in assessing textual similarity is measuring the degree of semantic overlap between two texts by considering word alignment.
We show that the norm of word vectors is a good proxy for word importance, and their angle is a good proxy for word similarity.
We propose a method that first decouples word vectors into their norm and direction, and then computes alignment-based similarity.
arXiv Detail & Related papers (2020-04-30T17:48:42Z)
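A simplified sketch of the decomposition above; the paper computes the alignment with optimal transport, so the greedy matching here is only a stand-in:

```python
import numpy as np

def wrd_like_similarity(vecs_a, vecs_b):
    """Decouple word vectors into norm (importance) and direction (meaning),
    then score a greedy alignment between the two sentences."""
    norms_a = np.linalg.norm(vecs_a, axis=1)
    norms_b = np.linalg.norm(vecs_b, axis=1)
    dirs_a = vecs_a / norms_a[:, None]
    dirs_b = vecs_b / norms_b[:, None]
    weights = norms_a / norms_a.sum()        # norm as word importance
    cos = dirs_a @ dirs_b.T                  # angle (cosine) as word similarity
    best = cos.max(axis=1)                   # greedy best match for each word in A
    return float(weights @ best)             # importance-weighted aligned similarity
```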
- CompLex: A New Corpus for Lexical Complexity Prediction from Likert Scale Data [13.224233182417636]
This paper presents the first English dataset for continuous lexical complexity prediction.
We use a 5-point Likert scale scheme to annotate complex words in texts from three sources/domains: the Bible, Europarl, and biomedical texts.
arXiv Detail & Related papers (2020-03-16T03:54:22Z)