The Daunting Dilemma with Sentence Encoders: Success on Standard
Benchmarks, Failure in Capturing Basic Semantic Properties
- URL: http://arxiv.org/abs/2309.03747v1
- Date: Thu, 7 Sep 2023 14:42:35 GMT
- Title: The Daunting Dilemma with Sentence Encoders: Success on Standard
Benchmarks, Failure in Capturing Basic Semantic Properties
- Authors: Yash Mahajan, Naman Bansal, Shubhra Kanti Karmaker ("Santu")
- Abstract summary: We evaluate five existing popular sentence encoders, i.e., Sentence-BERT, Universal Sentence Encoder (USE), LASER, InferSent, and Doc2vec.
We propose four semantic evaluation criteria, i.e., Paraphrasing, Synonym Replacement, Antonym Replacement, and Sentence Jumbling.
We find that the Sentence-BERT and USE models pass the paraphrasing criterion, with SBERT being the stronger of the two.
- Score: 6.747934699209743
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: In this paper, we adopted a retrospective approach to examine and compare
five existing popular sentence encoders, i.e., Sentence-BERT, Universal
Sentence Encoder (USE), LASER, InferSent, and Doc2vec, in terms of their
performance on downstream tasks versus their capability to capture basic
semantic properties. Initially, we evaluated all five sentence encoders on the
popular SentEval benchmark and found that multiple sentence encoders perform
quite well on a variety of popular downstream tasks. However, being unable to
find a single winner in all cases, we designed further experiments to gain a
deeper understanding of their behavior. Specifically, we proposed four semantic
evaluation criteria, i.e., Paraphrasing, Synonym Replacement, Antonym
Replacement, and Sentence Jumbling, and evaluated the same five sentence
encoders using these criteria. We found that the Sentence-BERT and USE models
pass the paraphrasing criterion, with SBERT being the stronger of the two.
LASER dominates in the case of the synonym replacement criterion.
Interestingly, all the sentence encoders failed the antonym replacement and
jumbling criteria. These results suggest that although these popular sentence
encoders perform quite well on the SentEval benchmark, they still struggle to
capture some basic semantic properties, thus posing a daunting dilemma in NLP
research.
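To make the four criteria concrete, here is a minimal probing sketch, assuming the sentence-transformers library and an off-the-shelf SBERT-style checkpoint; the model name, example sentences, and similarity readout are illustrative, not the paper's exact protocol.

```python
# Hypothetical probe of a sentence encoder against the paper's four
# criteria. Assumes the `sentence-transformers` package; the checkpoint
# and sentences are illustrative stand-ins.
from sentence_transformers import SentenceTransformer, util

model = SentenceTransformer("all-MiniLM-L6-v2")  # stand-in SBERT-style model

base = "The quick brown fox jumps over the lazy dog."
probes = {
    "paraphrasing":        "A fast brown fox leaps over a lazy dog.",       # should stay close
    "synonym_replacement": "The quick brown fox jumps over the idle dog.",  # should stay close
    "antonym_replacement": "The slow brown fox jumps over the lazy dog.",   # should move away
    "sentence_jumbling":   "lazy the over jumps fox brown quick dog The.",  # should move away
}

base_emb = model.encode(base, convert_to_tensor=True)
for criterion, variant in probes.items():
    sim = util.cos_sim(base_emb, model.encode(variant, convert_to_tensor=True)).item()
    print(f"{criterion:20s} cosine similarity = {sim:.3f}")
```

In these terms, the paper's finding is that paraphrases and synonym replacements stay close for the better encoders, while none of the five encoders pushes antonym-replaced or jumbled sentences away.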
Related papers
- Sub-Sentence Encoder: Contrastive Learning of Propositional Semantic
Representations [102.05351905494277]
Sub-sentence encoder is a contrastively-learned contextual embedding model for fine-grained semantic representation of text.
We show that sub-sentence encoders have the same level of inference cost and space complexity as sentence encoders.
arXiv Detail & Related papers (2023-11-07T20:38:30Z)
- BOOST: Harnessing Black-Box Control to Boost Commonsense in LMs' Generation [60.77990074569754]
We present a computation-efficient framework that steers a frozen Pre-Trained Language Model towards more commonsensical generation.
Specifically, we first construct a reference-free evaluator that assigns a commonsense score to a sentence.
We then use the scorer as the oracle for commonsense knowledge, and extend the controllable generation method called NADO to train an auxiliary head.
arXiv Detail & Related papers (2023-10-25T23:32:12Z)
- SemStamp: A Semantic Watermark with Paraphrastic Robustness for Text Generation [72.10931780019297]
Existing watermarking algorithms are vulnerable to paraphrase attacks because of their token-level design.
We propose SemStamp, a robust sentence-level semantic watermarking algorithm based on locality-sensitive hashing (LSH).
Experimental results show that our semantic watermark algorithm is not only more robust than the previous state-of-the-art method against both common and bigram paraphrase attacks, but also better at preserving generation quality.
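As a rough illustration of the LSH idea underlying a sentence-level watermark, the following sketch uses sign-based random-hyperplane hashing; the dimensions and the perturbation are assumptions for illustration, and this is a toy, not SemStamp's actual algorithm.

```python
# Toy sketch of sign-based locality-sensitive hashing (LSH): random
# hyperplanes partition the embedding space, so nearby sentence
# embeddings tend to share a signature. Not SemStamp's algorithm.
import numpy as np

rng = np.random.default_rng(0)
DIM, N_BITS = 384, 8                          # embedding size / signature bits (illustrative)
planes = rng.standard_normal((N_BITS, DIM))

def lsh_signature(embedding: np.ndarray) -> tuple:
    # One bit per hyperplane: which side of the plane the vector lies on.
    return tuple((planes @ embedding > 0).astype(int).tolist())

emb = rng.standard_normal(DIM)
near = emb + 0.05 * rng.standard_normal(DIM)      # stand-in for a paraphrase embedding
print(lsh_signature(emb) == lsh_signature(near))  # usually True: same LSH region
```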
arXiv Detail & Related papers (2023-10-06T03:33:42Z)
- Revealing the Blind Spot of Sentence Encoder Evaluation by HEROS [68.34155010428941]
It is unclear what kind of sentence pairs a sentence encoder (SE) would consider similar.
HEROS is constructed by transforming an original sentence into a new sentence based on certain rules to form a minimal pair.
By systematically comparing the performance of over 60 supervised and unsupervised SEs on HEROS, we reveal that most unsupervised sentence encoders are insensitive to negation.
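For intuition, a toy sketch of the minimal-pair idea is given below; the negation rule and sentence are invented for illustration and are not HEROS's actual transformation rules.

```python
# Toy minimal-pair construction in the spirit of HEROS: a deterministic
# rule maps an original sentence to a variant differing in one controlled
# property (here, negation). The rule and sentence are invented examples.
def negate(sentence: str) -> str:
    # Insert "not" after the first "is" to flip the sentence's polarity.
    return sentence.replace(" is ", " is not ", 1)

original = "The hotel is close to the station."
pair = (original, negate(original))
print(pair)
# A negation-sensitive encoder should score this pair as dissimilar;
# the paper reports most unsupervised encoders do not.
```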
arXiv Detail & Related papers (2023-06-08T10:24:02Z)
- Rethink about the Word-level Quality Estimation for Machine Translation from Human Judgement [57.72846454929923]
We create a benchmark dataset, HJQE, in which expert translators directly annotate poorly translated words.
We propose two tag-correcting strategies, namely a tag refinement strategy and a tree-based annotation strategy, to make the TER-based artificial QE corpus closer to HJQE.
The results show our proposed dataset is more consistent with human judgement and also confirm the effectiveness of the proposed tag correcting strategies.
arXiv Detail & Related papers (2022-09-13T02:37:12Z)
- Ranking-Enhanced Unsupervised Sentence Representation Learning [32.89057204258891]
We show that the semantic meaning of a sentence is determined by nearest-neighbor sentences that are similar to the input sentence.
We propose a novel unsupervised sentence encoder, RankEncoder.
arXiv Detail & Related papers (2022-09-09T14:45:16Z)
- Semantic-Preserving Adversarial Text Attacks [85.32186121859321]
We propose a Bigram and Unigram based adaptive Semantic Preservation Optimization (BU-SPO) method to examine the vulnerability of deep models.
Our method achieves the highest attack success and semantics-preserving rates while changing the smallest number of words, compared with existing methods.
arXiv Detail & Related papers (2021-08-23T09:05:18Z)
- Discrete Cosine Transform as Universal Sentence Encoder [10.355894890759377]
We use the Discrete Cosine Transform (DCT) to generate universal sentence representations for different languages.
The experimental results clearly show the superior effectiveness of DCT encoding.
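A minimal sketch of the DCT encoding idea, assuming word vectors are already available; the coefficient count, padding, and word-vector source are illustrative assumptions, not the paper's configuration.

```python
# Sketch of DCT-based sentence encoding: take the Discrete Cosine
# Transform along the word axis of a (num_words x dim) embedding matrix
# and keep the first K coefficient rows, giving a fixed-size sentence
# vector. K, the padding, and the word vectors are assumptions.
import numpy as np
from scipy.fft import dct

def dct_sentence_embedding(word_embeddings: np.ndarray, k: int = 4) -> np.ndarray:
    # DCT over word positions; each row is one frequency component.
    coeffs = dct(word_embeddings, axis=0, norm="ortho")
    kept = coeffs[:k]
    if kept.shape[0] < k:  # zero-pad very short sentences to a fixed size
        pad = np.zeros((k - kept.shape[0], kept.shape[1]))
        kept = np.vstack([kept, pad])
    return kept.ravel()   # shape: (k * dim,)

words = np.random.randn(12, 300)            # e.g., 12 pretrained word vectors
print(dct_sentence_embedding(words).shape)  # (1200,)
```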
arXiv Detail & Related papers (2021-06-02T04:43:54Z)
- Rewriting Meaningful Sentences via Conditional BERT Sampling and an application on fooling text classifiers [11.49508308643065]
Adversarial attack methods designed to deceive a text classifier change the classifier's prediction by modifying a few words or characters.
Few attempt to attack classifiers by rewriting a whole sentence, due to the difficulties inherent in sentence-level rephrasing and the problem of setting criteria for legitimate rewriting.
In this paper, we explore the problem of creating adversarial examples with sentence-level rewriting.
We propose a new criterion for modification, called the sentence-level threat model. This criterion allows both word- and sentence-level changes, and can be adjusted independently along two dimensions: semantic similarity and ...
arXiv Detail & Related papers (2020-10-22T17:03:13Z)
- Syntactically Look-Ahead Attention Network for Sentence Compression [36.6256383447417]
Sentence compression is the task of compressing a long sentence into a short one by deleting redundant words.
In sequence-to-sequence (Seq2Seq) based models, the decoder unidirectionally decides to retain or delete words.
We propose a novel Seq2Seq model, syntactically look-ahead attention network (SLAHAN), that can generate informative summaries.
arXiv Detail & Related papers (2020-02-04T06:26:37Z)
This list is automatically generated from the titles and abstracts of the papers on this site.