Testing the assumptions about the geometry of sentence embedding spaces: the cosine measure need not apply
- URL: http://arxiv.org/abs/2509.01606v1
- Date: Mon, 01 Sep 2025 16:37:03 GMT
- Title: Testing the assumptions about the geometry of sentence embedding spaces: the cosine measure need not apply
- Authors: Vivi Nastase, Paola Merlo,
- Abstract summary: Transformer models learn to encode and decode an input text, and produce contextual token embeddings as a side-effect.<n>The mapping from language into the embedding space maps words expressing similar concepts onto points that are close in the space.<n>In practice, the reverse implication is also assumed: words corresponding to close points in this space are similar or related, those that are further are not.
- Score: 1.1544794958059696
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Transformer models learn to encode and decode an input text, and produce contextual token embeddings as a side-effect. The mapping from language into the embedding space maps words expressing similar concepts onto points that are close in the space. In practice, the reverse implication is also assumed: words corresponding to close points in this space are similar or related, those that are further are not. Does closeness in the embedding space extend to shared properties for sentence embeddings? We present an investigation of sentence embeddings and show that the geometry of their embedding space is not predictive of their relative performances on a variety of tasks. We compute sentence embeddings in three ways: as averaged token embeddings, as the embedding of the special [CLS] token, and as the embedding of a random token from the sentence. We explore whether there is a correlation between the distance between sentence embedding variations and their performance on linguistic tasks, and whether despite their distances, they do encode the same information in the same manner. The results show that the cosine similarity -- which treats dimensions shallowly -- captures (shallow) commonalities or differences between sentence embeddings, which are not predictive of their performance on specific tasks. Linguistic information is rather encoded in weighted combinations of different dimensions, which are not reflected in the geometry of the sentence embedding space.
Related papers
- Spoken Word2Vec: Learning Skipgram Embeddings from Speech [0.8901073744693314]
We show how shallow skipgram-like algorithms fail to encode distributional semantics when the input units are acoustically correlated.
We illustrate the potential of an alternative deep end-to-end variant of the model and examine the effects on the resulting embeddings.
arXiv Detail & Related papers (2023-11-15T19:25:29Z) - Sub-Sentence Encoder: Contrastive Learning of Propositional Semantic
Representations [102.05351905494277]
Sub-sentence encoder is a contrastively-learned contextual embedding model for fine-grained semantic representation of text.
We show that sub-sentence encoders keep the same level of inference cost and space complexity compared to sentence encoders.
arXiv Detail & Related papers (2023-11-07T20:38:30Z) - Bridging Continuous and Discrete Spaces: Interpretable Sentence
Representation Learning via Compositional Operations [80.45474362071236]
It is unclear whether the compositional semantics of sentences can be directly reflected as compositional operations in the embedding space.
We propose InterSent, an end-to-end framework for learning interpretable sentence embeddings.
arXiv Detail & Related papers (2023-05-24T00:44:49Z) - Relational Sentence Embedding for Flexible Semantic Matching [86.21393054423355]
We present Sentence Embedding (RSE), a new paradigm to discover further the potential of sentence embeddings.
RSE is effective and flexible in modeling sentence relations and outperforms a series of state-of-the-art embedding methods.
arXiv Detail & Related papers (2022-12-17T05:25:17Z) - Towards Robust and Semantically Organised Latent Representations for
Unsupervised Text Style Transfer [6.467090475885798]
We introduce EPAAEs (versading Perturbed Adrial AutoEncoders) which completes this perturbation model.
We empirically show that this (a) produces a better organised latent space that clusters stylistically similar sentences together.
We also extend the text style transfer tasks to NLI datasets and show that these more complex definitions of style are learned best by EPAAE.
arXiv Detail & Related papers (2022-05-04T20:04:24Z) - A Sentence is Worth 128 Pseudo Tokens: A Semantic-Aware Contrastive
Learning Framework for Sentence Embeddings [28.046786376565123]
We propose a semantics-aware contrastive learning framework for sentence embeddings, termed Pseudo-Token BERT (PT-BERT)
We exploit the pseudo-token space (i.e., latent semantic space) representation of a sentence while eliminating the impact of superficial features such as sentence length and syntax.
Our model outperforms the state-of-the-art baselines on six standard semantic textual similarity (STS) tasks.
arXiv Detail & Related papers (2022-03-11T12:29:22Z) - Contextualized Semantic Distance between Highly Overlapped Texts [85.1541170468617]
Overlapping frequently occurs in paired texts in natural language processing tasks like text editing and semantic similarity evaluation.
This paper aims to address the issue with a mask-and-predict strategy.
We take the words in the longest common sequence as neighboring words and use masked language modeling (MLM) to predict the distributions on their positions.
Experiments on Semantic Textual Similarity show NDD to be more sensitive to various semantic differences, especially on highly overlapped paired texts.
arXiv Detail & Related papers (2021-10-04T03:59:15Z) - A Comparative Study on Structural and Semantic Properties of Sentence
Embeddings [77.34726150561087]
We propose a set of experiments using a widely-used large-scale data set for relation extraction.
We show that different embedding spaces have different degrees of strength for the structural and semantic properties.
These results provide useful information for developing embedding-based relation extraction methods.
arXiv Detail & Related papers (2020-09-23T15:45:32Z) - Rethinking Positional Encoding in Language Pre-training [111.2320727291926]
We show that in absolute positional encoding, the addition operation applied on positional embeddings and word embeddings brings mixed correlations.
We propose a new positional encoding method called textbfTransformer with textbfUntied textPositional textbfEncoding (T)
arXiv Detail & Related papers (2020-06-28T13:11:02Z) - Discovering linguistic (ir)regularities in word embeddings through
max-margin separating hyperplanes [0.0]
We show new methods for learning how related words are positioned relative to each other in word embedding spaces.
Our model, SVMCos, is robust to a range of experimental choices when training word embeddings.
arXiv Detail & Related papers (2020-03-07T20:21:50Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.