SemGloVe: Semantic Co-occurrences for GloVe from BERT
- URL: http://arxiv.org/abs/2012.15197v1
- Date: Wed, 30 Dec 2020 15:38:26 GMT
- Title: SemGloVe: Semantic Co-occurrences for GloVe from BERT
- Authors: Leilei Gan, Zhiyang Teng, Yue Zhang, Linchao Zhu, Fei Wu, Yi Yang
- Abstract summary: GloVe learns word embeddings by leveraging statistical information from word co-occurrence matrices.
We propose SemGloVe, which distills semantic co-occurrences from BERT into static GloVe word embeddings.
- Score: 55.420035541274444
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: GloVe learns word embeddings by leveraging statistical information from word
co-occurrence matrices. However, word pairs in the matrices are extracted from
a predefined local context window, which can lead to a limited number of word
pairs and potentially semantically irrelevant ones. In this paper, we propose SemGloVe,
which distills semantic co-occurrences from BERT into static GloVe word
embeddings. In particular, we propose two models that extract co-occurrence
statistics based on either the masked language model or the multi-head
attention weights of BERT. Our methods can extract word pairs without being
limited by the local window assumption and can define co-occurrence weights by
directly considering the semantic distance between word pairs. Experiments on
several word similarity datasets and four external tasks show that SemGloVe
outperforms GloVe.
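The attention-based variant described in the abstract can be illustrated with a minimal sketch: aggregate BERT's multi-head attention weights into word-to-word co-occurrence counts, in place of window-based counting. The sketch below uses random toy attention weights rather than a real BERT forward pass, and the function name and aggregation (averaging over heads) are illustrative assumptions, not the paper's exact method.

```python
import numpy as np

def attention_cooccurrence(attn, tokens, vocab):
    """Aggregate multi-head attention weights into a word-level
    co-occurrence matrix. Simplified sketch: average the heads,
    then accumulate token-to-token weights into vocab cells."""
    C = np.zeros((len(vocab), len(vocab)))
    avg = attn.mean(axis=0)  # (seq_len, seq_len), one matrix over all heads
    for i, wi in enumerate(tokens):
        for j, wj in enumerate(tokens):
            if i != j:  # skip self-attention on the diagonal
                C[vocab[wi], vocab[wj]] += avg[i, j]
    return C

# Toy example: 2 attention heads over a 3-token sentence.
rng = np.random.default_rng(0)
raw = rng.random((2, 3, 3))
attn = raw / raw.sum(axis=-1, keepdims=True)  # rows sum to 1, like softmax
tokens = ["the", "cat", "sat"]
vocab = {w: i for i, w in enumerate(tokens)}
C = attention_cooccurrence(attn, tokens, vocab)
print(C.shape)  # (3, 3)
```

Unlike a fixed window, such weights can connect any token pair in the sentence, which is the property the abstract highlights.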
Related papers
- Tomato, Tomahto, Tomate: Measuring the Role of Shared Semantics among Subwords in Multilingual Language Models [88.07940818022468]
We take an initial step toward measuring the role of shared semantics among subwords in encoder-only multilingual language models (mLMs).
We form "semantic tokens" by merging semantically similar subwords and their embeddings.
Inspections of the grouped subwords show that they exhibit a wide range of semantic similarities.
arXiv Detail & Related papers (2024-11-07T08:38:32Z)
- Leveraging multilingual transfer for unsupervised semantic acoustic word embeddings [23.822788597966646]
Acoustic word embeddings (AWEs) are fixed-dimensional vector representations of speech segments that encode phonetic content.
In this paper we explore semantic AWE modelling.
We show -- for the first time -- that AWEs can be used for downstream semantic query-by-example search.
arXiv Detail & Related papers (2023-07-05T07:46:54Z)
- Relational Sentence Embedding for Flexible Semantic Matching [86.21393054423355]
We present Relational Sentence Embedding (RSE), a new paradigm for further exploring the potential of sentence embeddings.
RSE is effective and flexible in modeling sentence relations and outperforms a series of state-of-the-art embedding methods.
arXiv Detail & Related papers (2022-12-17T05:25:17Z)
- Improving Contextual Representation with Gloss Regularized Pre-training [9.589252392388758]
We propose an auxiliary gloss regularizer module for BERT pre-training (GR-BERT) to enhance word semantic similarity.
By predicting masked words and aligning contextual embeddings to corresponding glosses simultaneously, the word similarity can be explicitly modeled.
Experimental results show that the gloss regularizer benefits BERT in word-level and sentence-level semantic representation.
arXiv Detail & Related papers (2022-05-13T12:50:32Z)
- More Than Words: Collocation Tokenization for Latent Dirichlet Allocation Models [71.42030830910227]
We propose a new metric for measuring the clustering quality in settings where the models differ.
We show that topics trained with merged tokens result in topic keys that are clearer, more coherent, and more effective at distinguishing topics than those of unmerged models.
arXiv Detail & Related papers (2021-08-24T14:08:19Z)
- Extending Multi-Sense Word Embedding to Phrases and Sentences for Unsupervised Semantic Applications [34.71597411512625]
We propose a novel embedding method for a text sequence (a phrase or a sentence) where each sequence is represented by a distinct set of codebook embeddings.
Our experiments show that the per-sentence codebook embeddings significantly improve performance on unsupervised sentence similarity and extractive summarization benchmarks.
arXiv Detail & Related papers (2021-03-29T04:54:28Z)
- R$^2$-Net: Relation of Relation Learning Network for Sentence Semantic Matching [58.72111690643359]
We propose a Relation of Relation Learning Network (R2-Net) for sentence semantic matching.
We first employ BERT to encode the input sentences from a global perspective.
Then a CNN-based encoder is designed to capture keywords and phrase information from a local perspective.
To fully leverage labels for better relation information extraction, we introduce a self-supervised relation of relation classification task.
arXiv Detail & Related papers (2020-12-16T13:11:30Z)
- PBoS: Probabilistic Bag-of-Subwords for Generalizing Word Embedding [16.531103175919924]
We look into the task of generalizing word embeddings: given a set of pre-trained word vectors over a finite vocabulary, the goal is to predict embedding vectors for out-of-vocabulary words.
We propose a model, along with an efficient algorithm, that simultaneously models subword segmentation and computes subword-based compositional word embedding.
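The bag-of-subwords idea behind this entry can be sketched in a few lines: compose an embedding for an unseen word from the vectors of its known substrings. The subword table and vector values below are purely illustrative, and this unweighted sum is a simplification; PBoS itself additionally models segmentation probabilities.

```python
import numpy as np

# Hypothetical subword embedding table (names and values illustrative only).
subword_vecs = {
    "un": np.array([1.0, 0.0]),
    "break": np.array([0.0, 1.0]),
    "able": np.array([0.5, 0.5]),
}

def bag_of_subwords(word, max_len=5):
    """Compose an embedding for an out-of-vocabulary word by summing
    the vectors of all its known substrings up to max_len characters.
    (Simplified sketch; PBoS weights segments by segmentation probability.)"""
    vec = np.zeros(2)
    for i in range(len(word)):
        for j in range(i + 1, min(len(word), i + max_len) + 1):
            piece = word[i:j]
            if piece in subword_vecs:
                vec += subword_vecs[piece]
    return vec

# "unbreakable" picks up the vectors for "un", "break", and "able".
print(bag_of_subwords("unbreakable"))
```

The appeal of this family of methods is that any string gets a vector, so pre-trained embeddings extend to words never seen during training.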
arXiv Detail & Related papers (2020-10-21T08:11:08Z)
- SST-BERT at SemEval-2020 Task 1: Semantic Shift Tracing by Clustering in BERT-based Embedding Spaces [63.17308641484404]
We propose to identify clusters among different occurrences of each target word, considering these as representatives of different word meanings.
Disagreements between the obtained clusters naturally allow us to quantify the level of semantic shift for each target word in four target languages.
Our approach performs well both when measured separately (per language) and overall, surpassing all provided SemEval baselines.
arXiv Detail & Related papers (2020-10-02T08:38:40Z)
- Supervised Understanding of Word Embeddings [1.160208922584163]
We obtain supervised projections in the form of linear keyword-level classifiers on word embeddings.
We show that the method creates interpretable projections of the original embedding dimensions.
arXiv Detail & Related papers (2020-06-23T20:13:42Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of its content (including all information) and is not responsible for any consequences of its use.