Revealing the Blind Spot of Sentence Encoder Evaluation by HEROS
- URL: http://arxiv.org/abs/2306.05083v2
- Date: Tue, 13 Jun 2023 04:56:06 GMT
- Title: Revealing the Blind Spot of Sentence Encoder Evaluation by HEROS
- Authors: Cheng-Han Chiang, Yung-Sung Chuang, James Glass, Hung-yi Lee
- Abstract summary: It is unclear what kind of sentence pairs a sentence encoder (SE) would consider similar.
HEROS is constructed by transforming an original sentence into a new sentence based on certain rules to form a minimal pair.
By systematically comparing the performance of over 60 supervised and unsupervised SEs on HEROS, we reveal that most unsupervised sentence encoders are insensitive to negation.
- Score: 68.34155010428941
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Existing sentence textual similarity benchmark datasets only use a single
number to summarize how similar the sentence encoder's decision is to humans'.
However, it is unclear what kind of sentence pairs a sentence encoder (SE)
would consider similar. Moreover, existing SE benchmarks mainly consider
sentence pairs with low lexical overlap, so it is unclear how the SEs behave
when two sentences have high lexical overlap. We introduce a high-quality SE
diagnostic dataset, HEROS. HEROS is constructed by transforming an original
sentence into a new sentence based on certain rules to form a \textit{minimal
pair}, and the two sentences of a minimal pair have high lexical overlap. The rules include
replacing a word with a synonym, an antonym, a typo, a random word, and
converting the original sentence into its negation. Different rules yield
different subsets of HEROS. By systematically comparing the performance of over
60 supervised and unsupervised SEs on HEROS, we reveal that most unsupervised
sentence encoders are insensitive to negation. We find the datasets used to
train the SE are the main determinants of what kind of sentence pairs an SE
considers similar. We also show that even if two SEs have similar performance
on STS benchmarks, they can have very different behavior on HEROS. Our result
reveals the blind spot of traditional STS benchmarks when evaluating SEs.
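The rule-based construction above can be made concrete with a short sketch. The lookup tables, the typo routine, and the negation template below are hypothetical stand-ins for the curated resources behind HEROS, not the authors' pipeline.

```python
import random

# Toy lookup tables; hypothetical stand-ins for the curated
# resources behind the actual HEROS rules.
SYNONYMS = {"happy": "glad", "big": "large"}
ANTONYMS = {"happy": "sad", "big": "small"}
RANDOM_WORDS = ["apple", "orbit", "velvet"]

def swap_adjacent(word: str) -> str:
    """Simulate a typo by swapping two adjacent characters."""
    if len(word) < 2:
        return word
    i = random.randrange(len(word) - 1)
    return word[:i] + word[i + 1] + word[i] + word[i + 2:]

def minimal_pair(sentence: str, rule: str) -> str:
    """Apply one transformation rule so the new sentence differs from
    the original by at most one token, keeping lexical overlap high."""
    if rule == "negation":
        return "it is not the case that " + sentence
    words = sentence.split()
    for i, w in enumerate(words):
        if rule == "synonym" and w in SYNONYMS:
            words[i] = SYNONYMS[w]
            break
        if rule == "antonym" and w in ANTONYMS:
            words[i] = ANTONYMS[w]
            break
        if rule == "typo" and len(w) > 3:
            words[i] = swap_adjacent(w)
            break
        if rule == "random" and len(w) > 3:
            words[i] = random.choice(RANDOM_WORDS)
            break
    return " ".join(words)

original = "the big dog looks happy"
for rule in ["synonym", "antonym", "typo", "random", "negation"]:
    print(f"{rule:8s} -> {minimal_pair(original, rule)}")
```

Each rule yields a different subset of the benchmark, so an encoder's similarity scores on each subset isolate one kind of semantic (or surface) perturbation.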
Related papers
- Sub-Sentence Encoder: Contrastive Learning of Propositional Semantic
Representations [102.05351905494277]
The sub-sentence encoder is a contrastively learned contextual embedding model for fine-grained semantic representation of text.
We show that sub-sentence encoders keep the same level of inference cost and space complexity as sentence encoders.
arXiv Detail & Related papers (2023-11-07T20:38:30Z)
- The Daunting Dilemma with Sentence Encoders: Success on Standard Benchmarks, Failure in Capturing Basic Semantic Properties [6.747934699209743]
We evaluate five existing popular sentence encoders, i.e., Sentence-BERT, Universal Sentence Encoder (USE), LASER, InferSent, and Doc2vec.
We propose four semantic evaluation criteria, i.e., Paraphrasing, Synonym Replacement, Antonym Replacement, and Sentence Jumbling.
We find that the Sentence-BERT and USE models pass the paraphrasing criterion, with Sentence-BERT being the better of the two. A toy version of these checks is sketched after this entry.
arXiv Detail & Related papers (2023-09-07T14:42:35Z)
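The four criteria from the entry above can be probed with a few lines of sentence-transformers code. The model name and the toy pairs below are illustrative assumptions, not the paper's protocol; the point is only that cosine similarity should drop for antonym replacement and jumbling but stay high for paraphrases.

```python
from sentence_transformers import SentenceTransformer, util

# Illustrative encoder choice; the paper compares five encoders.
model = SentenceTransformer("all-MiniLM-L6-v2")

# Toy pairs, one per criterion; not the paper's test data.
pairs = {
    "paraphrasing":        ("the cat sat on the mat", "a cat was sitting on the mat"),
    "synonym replacement": ("the movie was great", "the movie was excellent"),
    "antonym replacement": ("the movie was great", "the movie was terrible"),
    "sentence jumbling":   ("the dog chased the cat", "the cat chased the dog"),
}

for criterion, (a, b) in pairs.items():
    emb = model.encode([a, b], convert_to_tensor=True)
    score = util.cos_sim(emb[0], emb[1]).item()
    print(f"{criterion:20s} cosine = {score:.3f}")
```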
- RankCSE: Unsupervised Sentence Representations Learning via Learning to Rank [54.854714257687334]
We propose a novel approach, RankCSE, for unsupervised sentence representation learning.
It incorporates ranking consistency and ranking distillation with contrastive learning into a unified framework.
An extensive set of experiments is conducted on both semantic textual similarity (STS) and transfer (TR) tasks. A generic sketch of the ranking-consistency idea follows this entry.
arXiv Detail & Related papers (2023-05-26T08:27:07Z)
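The ranking-consistency component of RankCSE can be illustrated with a listwise agreement loss between two encoder views. The sketch below, a symmetric KL over temperature-scaled similarity lists, is a generic rendering of the idea and not RankCSE's published objective; the temperature and tensor shapes are assumptions.

```python
import torch
import torch.nn.functional as F

def ranking_consistency_loss(sim_a: torch.Tensor, sim_b: torch.Tensor,
                             tau: float = 0.05) -> torch.Tensor:
    """Encourage two views to rank candidates the same way: softmax each
    similarity list into a distribution and take a symmetric KL."""
    log_p = F.log_softmax(sim_a / tau, dim=-1)
    log_q = F.log_softmax(sim_b / tau, dim=-1)
    kl_pq = F.kl_div(log_q, log_p.exp(), reduction="batchmean")  # KL(p || q)
    kl_qp = F.kl_div(log_p, log_q.exp(), reduction="batchmean")  # KL(q || p)
    return 0.5 * (kl_pq + kl_qp)

# Similarities of 4 queries against 16 candidates from two encoder views
# (e.g., different dropout masks); shapes are made up for the example.
sim_a = torch.randn(4, 16)
sim_b = sim_a + 0.1 * torch.randn(4, 16)
print(ranking_consistency_loss(sim_a, sim_b))
```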
- SDA: Simple Discrete Augmentation for Contrastive Sentence Representation Learning [14.028140579482688]
As reported, SimCSE's simple dropout-based augmentation surprisingly dominates discrete augmentations such as cropping, word deletion, and synonym replacement.
We develop three simple yet effective discrete sentence augmentation schemes: punctuation insertion, modal verbs, and double negation.
Results consistently support the superiority of the proposed methods. Toy versions of the three schemes are sketched after this entry.
arXiv Detail & Related papers (2022-10-08T08:07:47Z)
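A minimal sketch of the three schemes follows, assuming toy templates and a random insertion point; the paper's actual augmentation rules may differ in detail.

```python
import random

def punctuation_insertion(sentence: str) -> str:
    """Insert a comma at a random word boundary."""
    words = sentence.split()
    i = random.randrange(1, len(words))
    return " ".join(words[:i]) + ", " + " ".join(words[i:])

def modal_verb(sentence: str) -> str:
    """Soften the sentence with a modal construction (toy template)."""
    return "it may be that " + sentence

def double_negation(sentence: str) -> str:
    """Wrap the sentence in a meaning-preserving double negation."""
    return "it is not true that it is not true that " + sentence

s = "the results support the proposed method"
for aug in (punctuation_insertion, modal_verb, double_negation):
    print(f"{aug.__name__}: {aug(s)}")
```

Each scheme perturbs the surface form while leaving the core semantics intact, which is what makes the augmented pair usable as a positive example in contrastive learning.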
- Rethink about the Word-level Quality Estimation for Machine Translation from Human Judgement [57.72846454929923]
We create a benchmark dataset, HJQE, where expert translators directly annotate poorly translated words.
We propose two tag correcting strategies, namely a tag refinement strategy and a tree-based annotation strategy, to make the TER-based artificial QE corpus closer to HJQE.
The results show our proposed dataset is more consistent with human judgement and also confirm the effectiveness of the proposed tag correcting strategies.
arXiv Detail & Related papers (2022-09-13T02:37:12Z)
- Ranking-Enhanced Unsupervised Sentence Representation Learning [32.89057204258891]
We show that the semantic meaning of a sentence is determined by nearest-neighbor sentences that are similar to the input sentence.
We propose a novel unsupervised sentence encoder, RankEncoder. A rough sketch of the neighbor-based view follows this entry.
arXiv Detail & Related papers (2022-09-09T14:45:16Z)
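The neighbor-based view can be sketched by representing a sentence through its similarities to a set of anchor sentences, so that nearest neighbors shape the representation. The random embeddings and dimensions below are placeholders, not RankEncoder's actual procedure.

```python
import numpy as np

def neighbor_rank_representation(query: np.ndarray,
                                 corpus: np.ndarray) -> np.ndarray:
    """Represent a sentence by its cosine similarities to anchor
    sentences; neighbors with high similarity dominate the vector."""
    q = query / np.linalg.norm(query)
    c = corpus / np.linalg.norm(corpus, axis=1, keepdims=True)
    return c @ q  # one similarity score per anchor sentence

rng = np.random.default_rng(0)
corpus = rng.normal(size=(100, 384))  # 100 placeholder anchor embeddings
query = rng.normal(size=384)          # placeholder sentence embedding
print(neighbor_rank_representation(query, corpus).shape)  # (100,)
```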
- A Sentence is Worth 128 Pseudo Tokens: A Semantic-Aware Contrastive Learning Framework for Sentence Embeddings [28.046786376565123]
We propose a semantics-aware contrastive learning framework for sentence embeddings, termed Pseudo-Token BERT (PT-BERT).
We exploit the pseudo-token space (i.e., latent semantic space) representation of a sentence while eliminating the impact of superficial features such as sentence length and syntax.
Our model outperforms the state-of-the-art baselines on six standard semantic textual similarity (STS) tasks.
arXiv Detail & Related papers (2022-03-11T12:29:22Z)
- Contextualized Semantic Distance between Highly Overlapped Texts [85.1541170468617]
Overlap frequently occurs in paired texts in natural language processing tasks like text editing and semantic similarity evaluation.
This paper aims to address the issue with a mask-and-predict strategy.
We take the words in the longest common sequence as neighboring words and use masked language modeling (MLM) to predict the distributions at their positions.
Experiments on Semantic Textual Similarity show the proposed neighboring distribution divergence (NDD) to be more sensitive to various semantic differences, especially on highly overlapped paired texts. A simplified sketch of the mask-and-predict step follows this entry.
arXiv Detail & Related papers (2021-10-04T03:59:15Z)
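The mask-and-predict idea can be sketched with a masked language model from Hugging Face transformers. The model choice, the single shared word, and the plain KL comparison below are simplifying assumptions; the paper's NDD computation aggregates over the words of the longest common sequence.

```python
import torch
from transformers import AutoTokenizer, AutoModelForMaskedLM

tok = AutoTokenizer.from_pretrained("bert-base-uncased")
mlm = AutoModelForMaskedLM.from_pretrained("bert-base-uncased")
mlm.eval()

def masked_distribution(sentence: str, target: str) -> torch.Tensor:
    """Mask `target` in `sentence` and return the MLM's predicted
    distribution at the masked position."""
    masked = sentence.replace(target, tok.mask_token, 1)
    inputs = tok(masked, return_tensors="pt")
    pos = (inputs.input_ids == tok.mask_token_id).nonzero()[0, 1]
    with torch.no_grad():
        logits = mlm(**inputs).logits[0, pos]
    return torch.softmax(logits, dim=-1)

# Predicted distributions for a shared word in two overlapping sentences;
# their divergence reflects how much the surrounding contexts differ.
p = masked_distribution("the movie was great fun", "movie")
q = masked_distribution("the movie was a disaster", "movie")
kl = torch.sum(p * (torch.log(p) - torch.log(q)))
print(f"KL at the shared position: {kl.item():.3f}")
```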
- Three Sentences Are All You Need: Local Path Enhanced Document Relation Extraction [54.95848026576076]
We present an embarrassingly simple but effective method to select evidence sentences for document-level RE.
We have released our code at https://github.com/AndrewZhe/Three-Sentences-Are-All-You-Need.
arXiv Detail & Related papers (2021-06-03T12:29:40Z)
This list is automatically generated from the titles and abstracts of the papers on this site.