Ranking-Enhanced Unsupervised Sentence Representation Learning
- URL: http://arxiv.org/abs/2209.04333v3
- Date: Thu, 18 May 2023 08:50:22 GMT
- Title: Ranking-Enhanced Unsupervised Sentence Representation Learning
- Authors: Yeon Seonwoo, Guoyin Wang, Changmin Seo, Sajal Choudhary, Jiwei Li,
Xiang Li, Puyang Xu, Sunghyun Park, Alice Oh
- Abstract summary: We show that the semantic meaning of a sentence is also determined by nearest-neighbor sentences that are similar to the input sentence.
We propose a novel unsupervised sentence encoder, RankEncoder.
- Score: 32.89057204258891
- License: http://creativecommons.org/licenses/by-nc-sa/4.0/
- Abstract: Unsupervised sentence representation learning has progressed through
contrastive learning and data augmentation methods such as dropout masking.
Despite this progress, sentence encoders are still limited to using only an
input sentence when predicting its semantic vector. In this work, we show that
the semantic meaning of a sentence is also determined by nearest-neighbor
sentences that are similar to the input sentence. Based on this finding, we
propose a novel unsupervised sentence encoder, RankEncoder. RankEncoder
predicts the semantic vector of an input sentence by leveraging its
relationship with other sentences in an external corpus, as well as the input
sentence itself. We evaluate RankEncoder on semantic textual benchmark
datasets. From the experimental results, we verify that 1) RankEncoder achieves
80.07% Spearman's correlation, a 1.1% absolute improvement compared to the
previous state-of-the-art performance, 2) RankEncoder is universally applicable
to existing unsupervised sentence embedding methods, and 3) RankEncoder is
specifically effective for predicting the similarity scores of similar sentence
pairs.
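As a rough illustration of the idea described in the abstract (not the authors' implementation), the sketch below augments the plain cosine similarity between two base-encoder embeddings with the similarity of their "rank vectors" computed against an external corpus; the base embeddings, the corpus matrix, and the interpolation weight alpha are assumptions made purely for exposition.

# Minimal sketch, assuming base-encoder embeddings are already available as
# NumPy arrays (e.g., from SimCSE or any unsupervised sentence encoder) and
# corpus_vecs holds embeddings of sentences from an external corpus.
import numpy as np

def rank_vector(query_vec: np.ndarray, corpus_vecs: np.ndarray) -> np.ndarray:
    """Return a normalized vector of the query's similarity ranks over the corpus."""
    sims = (corpus_vecs @ query_vec) / (
        np.linalg.norm(corpus_vecs, axis=1) * np.linalg.norm(query_vec) + 1e-12
    )
    # Convert similarities to ranks (higher similarity -> higher rank score),
    # then center and normalize so rank vectors are comparable across queries.
    ranks = sims.argsort().argsort().astype(np.float32)
    ranks -= ranks.mean()
    return ranks / (np.linalg.norm(ranks) + 1e-12)

def rank_enhanced_similarity(v1, v2, corpus_vecs, alpha=0.5):
    """Blend base cosine similarity with the similarity of the two rank vectors.
    alpha is an illustrative interpolation weight, not a value from the paper."""
    cos = (v1 @ v2) / (np.linalg.norm(v1) * np.linalg.norm(v2) + 1e-12)
    r1, r2 = rank_vector(v1, corpus_vecs), rank_vector(v2, corpus_vecs)
    return alpha * cos + (1 - alpha) * float(r1 @ r2)

In this sketch, two sentences that occupy similar "positions" relative to the corpus (i.e., rank the corpus sentences similarly) receive a boosted similarity even when their raw embeddings differ slightly, which is the intuition the abstract attributes to nearest-neighbor sentences.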
Related papers
- SenTest: Evaluating Robustness of Sentence Encoders [0.4194295877935868]
This work focuses on evaluating the robustness of sentence encoders.
We employ several adversarial attacks to evaluate their robustness.
The experimental results strongly call the robustness of sentence encoders into question.
arXiv Detail & Related papers (2023-11-29T15:21:35Z)
- Sub-Sentence Encoder: Contrastive Learning of Propositional Semantic Representations [102.05351905494277]
Sub-sentence encoder is a contrastively-learned contextual embedding model for fine-grained semantic representation of text.
We show that sub-sentence encoders have the same level of inference cost and space complexity as sentence encoders.
arXiv Detail & Related papers (2023-11-07T20:38:30Z)
- Bipartite Graph Pre-training for Unsupervised Extractive Summarization with Graph Convolutional Auto-Encoders [24.13261636386226]
We argue that utilizing pre-trained embeddings derived from a process specifically designed to optimize cohesive and distinctive sentence representations helps rank significant sentences.
We propose a novel graph pre-training auto-encoder to obtain sentence embeddings by explicitly modelling intra-sentential distinctive features and inter-sentential cohesive features.
arXiv Detail & Related papers (2023-10-29T12:27:18Z)
- Non-Autoregressive Sentence Ordering [22.45972496989434]
We propose a novel Non-Autoregressive Ordering Network, dubbed NAON, which explores the bilateral dependencies between sentences and predicts the sentence for each position in parallel.
We conduct extensive experiments on several commonly used datasets, and the experimental results show that our method outperforms all the autoregressive approaches.
arXiv Detail & Related papers (2023-10-19T10:57:51Z)
- Revealing the Blind Spot of Sentence Encoder Evaluation by HEROS [68.34155010428941]
It is unclear what kind of sentence pairs a sentence encoder (SE) would consider similar.
HEROS is constructed by transforming an original sentence into a new sentence based on certain rules to form a minimal pair.
By systematically comparing the performance of over 60 supervised and unsupervised SEs on HEROS, we reveal that most unsupervised sentence encoders are insensitive to negation.
arXiv Detail & Related papers (2023-06-08T10:24:02Z)
- RankCSE: Unsupervised Sentence Representations Learning via Learning to Rank [54.854714257687334]
We propose a novel approach, RankCSE, for unsupervised sentence representation learning.
It incorporates ranking consistency and ranking distillation with contrastive learning into a unified framework.
Extensive experiments are conducted on both semantic textual similarity (STS) and transfer (TR) tasks (a rough sketch of combining a contrastive objective with ranking distillation appears after this list).
arXiv Detail & Related papers (2023-05-26T08:27:07Z)
- Generate, Discriminate and Contrast: A Semi-Supervised Sentence Representation Learning Framework [68.04940365847543]
We propose a semi-supervised sentence embedding framework, GenSE, that effectively leverages large-scale unlabeled data.
Our method includes three parts: 1) Generate: A generator/discriminator model is jointly trained to synthesize sentence pairs from an open-domain unlabeled corpus; 2) Discriminate: Noisy sentence pairs are filtered out by the discriminator to acquire high-quality positive and negative sentence pairs; 3) Contrast: A prompt-based contrastive approach is presented for sentence representation learning with both annotated and synthesized data.
arXiv Detail & Related papers (2022-10-30T10:15:21Z)
- FairFil: Contrastive Neural Debiasing Method for Pretrained Text Encoders [68.8687509471322]
We propose the first neural debiasing method for a pretrained sentence encoder, which transforms the pretrained encoder outputs into debiased representations via a fair filter network.
On real-world datasets, FairFil effectively reduces the bias of pretrained text encoders while maintaining desirable performance on downstream tasks.
arXiv Detail & Related papers (2021-03-11T02:01:14Z)
- Revisiting Paraphrase Question Generator using Pairwise Discriminator [25.449902612898594]
We propose a novel method for obtaining sentence-level embeddings.
The proposed method yields semantically meaningful embeddings and outperforms the state-of-the-art on paraphrase generation and sentiment analysis tasks.
arXiv Detail & Related papers (2019-12-31T02:46:29Z)
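As a loose illustration referenced from the RankCSE entry above, the sketch below shows one common way a ranking signal can be folded into a contrastive sentence-embedding objective: an InfoNCE-style term plus a listwise distillation term over in-batch similarities. This is an assumption-laden sketch, not RankCSE's actual formulation; the teacher similarity matrix, temperature tau, and weight lam are placeholders.

# Minimal sketch of a contrastive loss plus listwise ranking distillation.
# student_sims / teacher_sims: (batch, batch) cosine-similarity matrices between
# two augmented views of the same batch of sentences; the teacher is assumed
# to be some stronger or ensembled encoder.
import torch
import torch.nn.functional as F

def contrastive_plus_rank_distill(student_sims: torch.Tensor,
                                  teacher_sims: torch.Tensor,
                                  tau: float = 0.05,
                                  lam: float = 1.0) -> torch.Tensor:
    labels = torch.arange(student_sims.size(0), device=student_sims.device)
    # Contrastive term: each sentence should be closest to its own other view.
    l_contrast = F.cross_entropy(student_sims / tau, labels)
    # Ranking-distillation term: match the student's listwise similarity
    # distribution over in-batch candidates to the teacher's soft ranking.
    l_rank = F.kl_div(F.log_softmax(student_sims / tau, dim=-1),
                      F.softmax(teacher_sims / tau, dim=-1),
                      reduction="batchmean")
    return l_contrast + lam * l_rank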
This list is automatically generated from the titles and abstracts of the papers on this site.