A Sentence is Worth 128 Pseudo Tokens: A Semantic-Aware Contrastive
Learning Framework for Sentence Embeddings
- URL: http://arxiv.org/abs/2203.05877v1
- Date: Fri, 11 Mar 2022 12:29:22 GMT
- Title: A Sentence is Worth 128 Pseudo Tokens: A Semantic-Aware Contrastive
Learning Framework for Sentence Embeddings
- Authors: Haochen Tan, Wei Shao, Han Wu, Ke Yang, Linqi Song
- Abstract summary: We propose a semantics-aware contrastive learning framework for sentence embeddings, termed Pseudo-Token BERT (PT-BERT)
We exploit the pseudo-token space (i.e., latent semantic space) representation of a sentence while eliminating the impact of superficial features such as sentence length and syntax.
Our model outperforms the state-of-the-art baselines on six standard semantic textual similarity (STS) tasks.
- Score: 28.046786376565123
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Contrastive learning has shown great potential in unsupervised sentence
embedding tasks, e.g., SimCSE. However, we find that these existing solutions
are heavily affected by superficial features like the length of sentences or
syntactic structures. In this paper, we propose a semantics-aware contrastive
learning framework for sentence embeddings, termed Pseudo-Token BERT (PT-BERT),
which is able to exploit the pseudo-token space (i.e., latent semantic space)
representation of a sentence while eliminating the impact of superficial
features such as sentence length and syntax. Specifically, we introduce an
additional pseudo token embedding layer independent of the BERT encoder to map
each sentence into a sequence of pseudo tokens of a fixed length. Leveraging
these pseudo sequences, we are able to construct same-length positive and
negative pairs based on the attention mechanism to perform contrastive
learning. In addition, we utilize both the gradient-updating and
momentum-updating encoders to encode instances while dynamically maintaining an
additional queue that stores sentence-embedding representations, enhancing
the encoder's learning performance for negative examples. Experiments show that
our model outperforms the state-of-the-art baselines on six standard semantic
textual similarity (STS) tasks. Furthermore, experiments on alignment and
uniformity losses, as well as hard examples with different sentence lengths and
syntax, consistently verify the effectiveness of our method.
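To make the moving parts of the abstract concrete, the following is a minimal PyTorch sketch of how a fixed number of pseudo tokens, a gradient-updated encoder, a momentum-updated encoder, and a negative queue could fit together. It is not the authors' implementation: the bert-base-uncased checkpoint, queue size, momentum, temperature, the dropout-based positive view, and the mean pooling over pseudo tokens are illustrative assumptions, and the paper's attention-based construction of same-length positive and negative pairs is simplified here.

```python
# Minimal sketch of the ideas described in the abstract (not the released code).
import copy
import torch
import torch.nn as nn
import torch.nn.functional as F
from transformers import AutoModel, AutoTokenizer

PSEUDO_LEN = 128     # fixed number of pseudo tokens per sentence (from the title)
QUEUE_SIZE = 4096    # size of the negative queue (assumption)
MOMENTUM = 0.999     # momentum-encoder update rate (assumption)
TEMPERATURE = 0.05   # InfoNCE temperature (assumption)

class PseudoTokenEncoder(nn.Module):
    """BERT followed by a pseudo-token embedding layer, independent of the
    encoder, that maps a sentence of any length to a fixed-length sequence."""
    def __init__(self, model_name="bert-base-uncased"):
        super().__init__()
        self.bert = AutoModel.from_pretrained(model_name)
        dim = self.bert.config.hidden_size
        self.pseudo_tokens = nn.Parameter(torch.randn(PSEUDO_LEN, dim) * 0.02)
        self.attn = nn.MultiheadAttention(dim, num_heads=8, batch_first=True)

    def forward(self, input_ids, attention_mask):
        h = self.bert(input_ids=input_ids,
                      attention_mask=attention_mask).last_hidden_state
        q = self.pseudo_tokens.unsqueeze(0).expand(h.size(0), -1, -1)
        # pseudo tokens attend to the real tokens, so every sentence ends up
        # with the same length regardless of its surface form
        pseudo, _ = self.attn(q, h, h, key_padding_mask=attention_mask == 0)
        return F.normalize(pseudo.mean(dim=1), dim=-1)

class PTBERTSketch(nn.Module):
    """Gradient-updated query encoder + momentum-updated key encoder with a
    queue storing past sentence embeddings as extra negatives."""
    def __init__(self):
        super().__init__()
        self.encoder_q = PseudoTokenEncoder()
        self.encoder_k = copy.deepcopy(self.encoder_q)
        for p in self.encoder_k.parameters():
            p.requires_grad = False
        dim = self.encoder_q.bert.config.hidden_size
        self.register_buffer("queue",
                             F.normalize(torch.randn(QUEUE_SIZE, dim), dim=-1))

    @torch.no_grad()
    def _momentum_update(self):
        for pq, pk in zip(self.encoder_q.parameters(),
                          self.encoder_k.parameters()):
            pk.data = MOMENTUM * pk.data + (1.0 - MOMENTUM) * pq.data

    def forward(self, input_ids, attention_mask):
        q = self.encoder_q(input_ids, attention_mask)
        with torch.no_grad():
            self._momentum_update()
            k = self.encoder_k(input_ids, attention_mask)  # second view (dropout)
        l_pos = (q * k).sum(-1, keepdim=True)              # positives
        l_neg = q @ self.queue.t()                          # queued negatives
        logits = torch.cat([l_pos, l_neg], dim=1) / TEMPERATURE
        labels = torch.zeros(q.size(0), dtype=torch.long, device=q.device)
        loss = F.cross_entropy(logits, labels)
        # enqueue the newest keys, drop the oldest
        self.queue = torch.cat([k.detach(), self.queue], dim=0)[:QUEUE_SIZE]
        return loss

# Alignment / uniformity metrics (standard definitions) used for analysis.
def alignment(x, y):                 # x, y: normalized positive-pair embeddings
    return (x - y).pow(2).sum(dim=1).mean()

def uniformity(x, t=2):              # x: normalized embeddings of a batch
    return torch.pdist(x, p=2).pow(2).mul(-t).exp().mean().log()

tok = AutoTokenizer.from_pretrained("bert-base-uncased")
model = PTBERTSketch()
batch = tok(["A cat sat on the mat.", "It is raining heavily today."],
            padding=True, return_tensors="pt")
loss = model(batch["input_ids"], batch["attention_mask"])
```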
Related papers
- BEST-STD: Bidirectional Mamba-Enhanced Speech Tokenization for Spoken Term Detection [8.303512060791736]
Spoken term detection is often hindered by reliance on frame-level features and the computationally intensive DTW-based template matching.
We propose a novel approach that encodes speech into discrete, speaker-agnostic semantic tokens.
This facilitates fast retrieval using text-based search algorithms and effectively handles out-of-vocabulary terms.
arXiv Detail & Related papers (2024-11-21T13:05:18Z)
- DenoSent: A Denoising Objective for Self-Supervised Sentence Representation Learning [59.4644086610381]
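A small illustration of the retrieval idea in the BEST-STD entry above: once utterances are encoded as discrete, speaker-agnostic token sequences, spoken term detection reduces to text-style search. The bigram inverted index below is an assumed, simplified stand-in for the paper's actual search method.

```python
# Toy token-level inverted index over discretized utterances (assumption).
from collections import defaultdict

def bigrams(tokens):
    return list(zip(tokens, tokens[1:]))

# token sequences produced by some speech tokenizer (toy data)
utterances = {
    "utt1": [12, 7, 98, 3, 45, 7, 98, 60],
    "utt2": [5, 5, 33, 12, 7, 91],
}
index = defaultdict(set)                       # bigram -> utterance ids
for utt_id, toks in utterances.items():
    for bg in bigrams(toks):
        index[bg].add(utt_id)

def detect(query_tokens):
    """Return utterances sharing every query bigram (candidate matches)."""
    candidate_sets = [index.get(bg, set()) for bg in bigrams(query_tokens)]
    return set.intersection(*candidate_sets) if candidate_sets else set()

print(detect([12, 7, 98]))                     # -> {'utt1'}
```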
We propose a novel denoising objective that works from a different perspective, i.e., the intra-sentence perspective.
By introducing both discrete and continuous noise, we generate noisy sentences and then train our model to restore them to their original form.
Our empirical evaluations demonstrate that this approach delivers competitive results on both semantic textual similarity (STS) and a wide range of transfer tasks.
arXiv Detail & Related papers (2024-01-24T17:48:45Z)
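As a toy illustration of the discrete-plus-continuous noising described in the DenoSent entry above, the sketch below corrupts a sentence at the token level and perturbs stand-in embeddings; the noise rates and helper names are assumptions rather than the paper's configuration.

```python
# Discrete and continuous corruption for a denoising objective (illustrative).
import random
import torch

def discrete_noise(tokens, delete_prob=0.1, shuffle_window=3):
    """Randomly drop tokens, then shuffle the survivors inside small windows."""
    kept = [t for t in tokens if random.random() > delete_prob]
    for i in range(0, len(kept), shuffle_window):
        chunk = kept[i:i + shuffle_window]
        random.shuffle(chunk)
        kept[i:i + shuffle_window] = chunk
    return kept

def continuous_noise(embeddings, sigma=0.01):
    """Add small Gaussian noise to token embeddings."""
    return embeddings + sigma * torch.randn_like(embeddings)

tokens = "the quick brown fox jumps over the lazy dog".split()
print(discrete_noise(tokens))
token_embeddings = torch.randn(len(tokens), 768)   # stand-in for real embeddings
noisy_embeddings = continuous_noise(token_embeddings)
# A denoising model would then be trained to reconstruct the original sentence
# from the corrupted inputs, e.g. with a sequence-to-sequence objective.
```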
- Sub-Sentence Encoder: Contrastive Learning of Propositional Semantic Representations [102.05351905494277]
Sub-sentence encoder is a contrastively-learned contextual embedding model for fine-grained semantic representation of text.
We show that sub-sentence encoders keep the same inference cost and space complexity as sentence encoders.
arXiv Detail & Related papers (2023-11-07T20:38:30Z)
- Bridging Continuous and Discrete Spaces: Interpretable Sentence Representation Learning via Compositional Operations [80.45474362071236]
It is unclear whether the compositional semantics of sentences can be directly reflected as compositional operations in the embedding space.
We propose InterSent, an end-to-end framework for learning interpretable sentence embeddings.
arXiv Detail & Related papers (2023-05-24T00:44:49Z)
- SDA: Simple Discrete Augmentation for Contrastive Sentence Representation Learning [14.028140579482688]
As reported, the simple dropout-based augmentation in SimCSE surprisingly dominates discrete augmentations such as cropping, word deletion, and synonym replacement.
We develop three simple yet effective discrete sentence augmentation schemes: punctuation insertion, modal verbs, and double negation.
Results support the superiority of the proposed methods consistently.
arXiv Detail & Related papers (2022-10-08T08:07:47Z)
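The sketch below illustrates what the three augmentation schemes named in the SDA entry above could look like in code; the insertion positions and templates are assumptions, not the paper's exact rules.

```python
# Illustrative discrete sentence augmentations (assumed templates).
import random

def punctuation_insertion(sentence):
    """Insert a neutral punctuation mark at a random word boundary."""
    words = sentence.split()
    pos = random.randint(1, max(1, len(words) - 1))
    return " ".join(words[:pos] + [","] + words[pos:])

def modal_verb(sentence):
    """Add a meaning-preserving modal construction (assumed wording)."""
    return "It can be said that " + sentence[0].lower() + sentence[1:]

def double_negation(sentence):
    """Stack two negations so the original polarity is preserved (assumed wording)."""
    return ("It is not true that it is not the case that "
            + sentence[0].lower() + sentence[1:])

sentence = "The movie was entertaining."
for augment in (punctuation_insertion, modal_verb, double_negation):
    print(augment(sentence))   # each variant should preserve the meaning
```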
- Contextualized Semantic Distance between Highly Overlapped Texts [85.1541170468617]
Overlap frequently occurs in paired texts in natural language processing tasks like text editing and semantic similarity evaluation.
This paper aims to address the issue with a mask-and-predict strategy.
We take the words in the longest common sequence as neighboring words and use masked language modeling (MLM) to predict the distributions at their positions.
Experiments on Semantic Textual Similarity show the resulting neighboring distribution divergence (NDD) to be more sensitive to various semantic differences, especially on highly overlapped paired texts.
arXiv Detail & Related papers (2021-10-04T03:59:15Z)
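The following is a hedged sketch of the mask-and-predict strategy described in the entry above: shared words are masked one at a time and the MLM distributions predicted at their positions are compared. Averaging a KL divergence over those positions is an assumption; the paper's exact NDD formulation may differ.

```python
# Mask-and-predict comparison of two highly overlapped texts (sketch).
import difflib
import torch
import torch.nn.functional as F
from transformers import AutoTokenizer, AutoModelForMaskedLM

tok = AutoTokenizer.from_pretrained("bert-base-uncased")
mlm = AutoModelForMaskedLM.from_pretrained("bert-base-uncased").eval()

def masked_distribution(words, idx):
    """MLM distribution at the position of words[idx] when it is masked."""
    masked = words[:idx] + [tok.mask_token] + words[idx + 1:]
    enc = tok(" ".join(masked), return_tensors="pt")
    mask_pos = (enc["input_ids"][0] == tok.mask_token_id).nonzero()[0, 0]
    with torch.no_grad():
        logits = mlm(**enc).logits[0, mask_pos]
    return F.softmax(logits, dim=-1)

def neighboring_divergence(text_a, text_b):
    a, b = text_a.split(), text_b.split()
    matcher = difflib.SequenceMatcher(a=a, b=b)
    divs = []
    for block in matcher.get_matching_blocks():
        for k in range(block.size):          # shared ("neighboring") words
            p = masked_distribution(a, block.a + k)
            q = masked_distribution(b, block.b + k)
            divs.append(F.kl_div(q.log(), p, reduction="sum").item())
    return sum(divs) / max(len(divs), 1)

print(neighboring_divergence("the cat sat on the mat",
                             "the dog sat on the mat"))
```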
- Self-Guided Contrastive Learning for BERT Sentence Representations [19.205754738851546]
We propose a contrastive learning method that utilizes self-guidance for improving the quality of BERT sentence representations.
Our method fine-tunes BERT in a self-supervised fashion, does not rely on data augmentation, and enables the usual [CLS] token embeddings to function as sentence vectors.
arXiv Detail & Related papers (2021-06-03T05:52:43Z)
- On the Sentence Embeddings from Pre-trained Language Models [78.45172445684126]
In this paper, we argue that the semantic information in the BERT embeddings is not fully exploited.
We find that BERT always induces a non-smooth, anisotropic semantic space of sentences, which harms its performance on semantic similarity tasks.
We propose to transform the anisotropic sentence embedding distribution to a smooth and isotropic Gaussian distribution through normalizing flows that are learned with an unsupervised objective.
arXiv Detail & Related papers (2020-11-02T13:14:57Z)
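The sketch below illustrates the flow-calibration recipe summarized in the entry above: learn an invertible map from the anisotropic embedding space to an isotropic standard Gaussian by maximizing the flow log-likelihood, then compare the transformed vectors. The tiny affine-coupling flow and the synthetic data are illustrative stand-ins, not the paper's implementation.

```python
# Minimal normalizing-flow calibration of embeddings (illustrative).
import math
import torch
import torch.nn as nn

class AffineCoupling(nn.Module):
    """One coupling layer: half of the dimensions are rescaled and shifted
    using parameters predicted from the other half (triangular Jacobian)."""
    def __init__(self, dim):
        super().__init__()
        self.half = dim // 2
        self.net = nn.Sequential(nn.Linear(self.half, 256), nn.ReLU(),
                                 nn.Linear(256, 2 * (dim - self.half)))

    def forward(self, x):
        x1, x2 = x[:, :self.half], x[:, self.half:]
        log_s, t = self.net(x1).chunk(2, dim=-1)
        log_s = torch.tanh(log_s)                      # keep scales well-behaved
        z = torch.cat([x1, x2 * log_s.exp() + t], dim=-1)
        return z, log_s.sum(dim=-1)                    # log|det Jacobian|

def negative_log_likelihood(flows, x):
    """NLL of embeddings under a standard Gaussian prior after the flow."""
    z, log_det = x, x.new_zeros(x.size(0))
    for f in flows:
        z, ld = f(z)
        log_det = log_det + ld
        z = z.flip(dims=[-1])        # permutation so both halves get transformed
    log_prior = -0.5 * (z.pow(2).sum(dim=-1)
                        + z.size(1) * math.log(2 * math.pi))
    return -(log_prior + log_det).mean()

dim = 768                                              # e.g. BERT sentence embeddings
flows = nn.ModuleList([AffineCoupling(dim) for _ in range(4)])
optimizer = torch.optim.Adam(flows.parameters(), lr=1e-3)
embeddings = torch.randn(512, dim) * torch.linspace(0.1, 3.0, dim)  # anisotropic toy data
for step in range(200):
    optimizer.zero_grad()
    loss = negative_log_likelihood(flows, embeddings)
    loss.backward()
    optimizer.step()
# After training, sentences are compared by cosine similarity between their
# flow-transformed (now roughly isotropic, Gaussian-like) embeddings.
```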
- A Comparative Study on Structural and Semantic Properties of Sentence Embeddings [77.34726150561087]
We propose a set of experiments using a widely used, large-scale data set for relation extraction.
We show that different embedding spaces have different degrees of strength for the structural and semantic properties.
These results provide useful information for developing embedding-based relation extraction methods.
arXiv Detail & Related papers (2020-09-23T15:45:32Z)
This list is automatically generated from the titles and abstracts of the papers in this site.