An Unsupervised Sentence Embedding Method by Mutual Information
Maximization
- URL: http://arxiv.org/abs/2009.12061v2
- Date: Fri, 5 Feb 2021 03:15:25 GMT
- Title: An Unsupervised Sentence Embedding Method by Mutual Information
Maximization
- Authors: Yan Zhang, Ruidan He, Zuozhu Liu, Kwan Hui Lim, Lidong Bing
- Abstract summary: BERT is inefficient for sentence-pair tasks such as clustering or semantic search, and Sentence BERT (SBERT) requires high-quality labeled sentence pairs for training.
We propose a lightweight extension on top of BERT and a novel self-supervised learning objective based on mutual information maximization.
Our method is not restricted by the availability of labeled data, so it can be applied to different domain-specific corpora.
- Score: 34.947950543830686
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: BERT is inefficient for sentence-pair tasks such as clustering or semantic
search, as it needs to evaluate combinatorially many sentence pairs, which is
very time-consuming. Sentence BERT (SBERT) attempted to solve this challenge by
learning semantically meaningful representations of single sentences, such that
similarity comparisons can be computed easily. However, SBERT is trained on a
corpus with high-quality labeled sentence pairs, which limits its application
to tasks where labeled data is extremely scarce. In this paper, we propose a
lightweight extension on top of BERT and a novel self-supervised learning
objective based on mutual information maximization strategies to derive
meaningful sentence embeddings in an unsupervised manner. Unlike SBERT, our
method is not restricted by the availability of labeled data, so it can be
applied to different domain-specific corpora. Experimental results show that
the proposed method significantly outperforms other unsupervised sentence
embedding baselines on common semantic textual similarity (STS) tasks and
downstream supervised tasks. It also outperforms SBERT in a setting where
in-domain labeled data is not available, and achieves performance competitive
with supervised methods on various tasks.
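As a rough illustration of the objective described above, the following is a minimal, hypothetical PyTorch sketch of a Jensen-Shannon-style mutual information estimator between each sentence's pooled (global) representation and its token-level (local) BERT features, with the other sentences in the batch supplying negative pairs. The linear head, the pooling choice, and all names are illustrative assumptions, not the authors' released implementation.

```python
# Hypothetical sketch only: a Jensen-Shannon-style MI objective between a
# sentence's pooled (global) feature and its token-level (local) features,
# using other sentences in the batch as negatives. The paper's actual head,
# pooling, and estimator details may differ.
import torch
import torch.nn.functional as F
from transformers import AutoModel, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
bert = AutoModel.from_pretrained("bert-base-uncased")
head = torch.nn.Linear(768, 768)  # stand-in for the lightweight extension on top of BERT

def encode(sentences):
    batch = tokenizer(sentences, padding=True, truncation=True, return_tensors="pt")
    local = head(bert(**batch).last_hidden_state)           # (B, T, H) token-level features
    mask = batch["attention_mask"].unsqueeze(-1).float()    # ignore padding when pooling
    global_ = (local * mask).sum(dim=1) / mask.sum(dim=1)   # (B, H) sentence-level features
    return local, global_, batch["attention_mask"].bool()

def jsd_mi_loss(local, global_, token_mask):
    B = global_.size(0)
    # Score every sentence-level feature against every token-level feature: (B, B, T).
    scores = torch.einsum("bh,cth->bct", global_, local)
    valid = token_mask.unsqueeze(0).expand(B, -1, -1)                  # drop padding tokens
    same = torch.eye(B, dtype=torch.bool).unsqueeze(-1).expand_as(scores)
    pos = scores[same & valid]    # a sentence paired with its own tokens
    neg = scores[~same & valid]   # a sentence paired with other sentences' tokens
    # Negative Jensen-Shannon MI lower bound (Deep InfoMax form): minimizing this
    # pushes positive scores up and negative scores down.
    return F.softplus(-pos).mean() + F.softplus(neg).mean()

# One illustrative training step over a batch of raw, unlabeled sentences.
optimizer = torch.optim.AdamW(list(bert.parameters()) + list(head.parameters()), lr=2e-5)
local, global_, token_mask = encode(["a sentence", "another unlabeled sentence"])
loss = jsd_mi_loss(local, global_, token_mask)
loss.backward()
optimizer.step()
```

Because the objective needs only raw sentences, the same loop can be run directly on a domain-specific corpus without any labeled pairs.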
Related papers
- Revisiting Sparse Retrieval for Few-shot Entity Linking [33.15662306409253]
We propose an ELECTRA-based keyword extractor to denoise the mention context and construct a better query expression.
For training the extractor, we propose a distant supervision method to automatically generate training data based on overlapping tokens between mention contexts and entity descriptions.
Experimental results on the ZESHEL dataset demonstrate that the proposed method outperforms state-of-the-art models by a significant margin across all test domains.
arXiv Detail & Related papers (2023-10-19T03:51:10Z)
- M-Tuning: Prompt Tuning with Mitigated Label Bias in Open-Set Scenarios [103.6153593636399]
We propose a vision-language prompt tuning method with mitigated label bias (M-Tuning).
It introduces open words from WordNet to extend the prompt texts beyond the closed-set label words, so that prompts are tuned in a simulated open-set scenario.
Our method achieves the best performance on datasets with various scales, and extensive ablation studies also validate its effectiveness.
arXiv Detail & Related papers (2023-03-09T09:05:47Z)
- Relational Sentence Embedding for Flexible Semantic Matching [86.21393054423355]
We present Relational Sentence Embedding (RSE), a new paradigm for further discovering the potential of sentence embeddings.
RSE is effective and flexible in modeling sentence relations and outperforms a series of state-of-the-art embedding methods.
arXiv Detail & Related papers (2022-12-17T05:25:17Z)
- PromptBERT: Improving BERT Sentence Embeddings with Prompts [95.45347849834765]
We propose a prompt-based sentence embedding method that reduces token embedding biases and makes the original BERT layers more effective.
We also propose a novel unsupervised training objective based on template denoising, which substantially narrows the performance gap between the supervised and unsupervised settings.
Our fine-tuned method outperforms the state-of-the-art method SimCSE in both unsupervised and supervised settings.
arXiv Detail & Related papers (2022-01-12T06:54:21Z)
- MDERank: A Masked Document Embedding Rank Approach for Unsupervised Keyphrase Extraction [41.941098507759015]
Keyphrases are phrases in a document that provide a concise summary of its core content, helping readers quickly grasp what the article is about.
We propose a novel unsupervised keyphrase extraction method that leverages a BERT-based model to select and rank candidate keyphrases with a MASK strategy; a toy sketch of this masking-and-ranking idea appears after the list below.
arXiv Detail & Related papers (2021-10-13T11:29:17Z)
- ConSERT: A Contrastive Framework for Self-Supervised Sentence Representation Transfer [19.643512923368743]
We present ConSERT, a Contrastive Framework for Self-Supervised Sentence Representation Transfer.
By making use of unlabeled texts, ConSERT solves the collapse issue of BERT-derived sentence representations.
Experiments on STS datasets demonstrate that ConSERT achieves an 8% relative improvement over the previous state-of-the-art.
arXiv Detail & Related papers (2021-05-25T08:15:01Z)
- WSSOD: A New Pipeline for Weakly- and Semi-Supervised Object Detection [75.80075054706079]
We propose a weakly- and semi-supervised object detection framework (WSSOD).
An agent detector is first trained on a joint dataset and then used to predict pseudo bounding boxes on weakly-annotated images.
The proposed framework demonstrates remarkable performance on the PASCAL-VOC and MSCOCO benchmarks, achieving results comparable to those obtained in fully-supervised settings.
arXiv Detail & Related papers (2021-05-21T11:58:50Z)
- TSDAE: Using Transformer-based Sequential Denoising Auto-Encoder for Unsupervised Sentence Embedding Learning [53.32740707197856]
We present a new state-of-the-art unsupervised method based on pre-trained Transformers and a Sequential Denoising Auto-Encoder (TSDAE).
It can achieve up to 93.1% of the performance of in-domain supervised approaches.
arXiv Detail & Related papers (2021-04-14T17:02:18Z)
- On the Sentence Embeddings from Pre-trained Language Models [78.45172445684126]
In this paper, we argue that the semantic information in the BERT embeddings is not fully exploited.
We find that BERT always induces a non-smooth anisotropic semantic space of sentences, which harms its performance on semantic similarity tasks.
We propose to transform the anisotropic sentence embedding distribution to a smooth and isotropic Gaussian distribution through normalizing flows that are learned with an unsupervised objective.
arXiv Detail & Related papers (2020-11-02T13:14:57Z)
- Exploring Cross-sentence Contexts for Named Entity Recognition with BERT [1.4998865865537996]
We present a study exploring the use of cross-sentence information for NER using BERT models in five languages.
We find that adding context in the form of additional sentences to BERT input increases NER performance on all of the tested languages and models.
We propose a straightforward method, Contextual Majority Voting (CMV), to combine the different predictions for a sentence, and demonstrate that it further increases NER performance with BERT.
arXiv Detail & Related papers (2020-06-02T12:34:52Z)
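For the MDERank entry above, here is a small, hypothetical sketch of the masking-and-ranking idea it describes: mask a candidate phrase, re-embed the document with BERT, and rank candidates by how far the masked document's embedding drifts from the original. Candidate generation, pooling, and the exact scoring used in the paper are assumptions here.

```python
# Hypothetical sketch only: rank candidate keyphrases by how much masking them
# changes the document embedding (a larger drop in similarity suggests a more
# important phrase). Candidate selection and pooling details are assumed.
import torch
from transformers import AutoModel, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
bert = AutoModel.from_pretrained("bert-base-uncased").eval()

@torch.no_grad()
def embed(text):
    batch = tokenizer(text, truncation=True, return_tensors="pt")
    hidden = bert(**batch).last_hidden_state      # (1, T, H)
    return hidden.mean(dim=1).squeeze(0)          # mean-pooled document embedding

def rank_candidates(document, candidates):
    doc_emb = embed(document)
    scored = []
    for phrase in candidates:
        # Replace the candidate phrase with a [MASK] token and re-embed.
        masked = document.replace(phrase, tokenizer.mask_token)
        sim = torch.cosine_similarity(doc_emb, embed(masked), dim=0).item()
        scored.append((phrase, sim))
    # Lower similarity after masking means the phrase carried more of the
    # document's meaning, so the best keyphrases come first when sorted ascending.
    return sorted(scored, key=lambda pair: pair[1])

# Example usage: candidates would normally come from noun-phrase extraction.
print(rank_candidates(
    "BERT learns contextual representations that transfer well to many NLP tasks.",
    ["contextual representations", "NLP tasks", "BERT"],
))
```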
This list is automatically generated from the titles and abstracts of the papers in this site.