Supervised Understanding of Word Embeddings
- URL: http://arxiv.org/abs/2006.13299v1
- Date: Tue, 23 Jun 2020 20:13:42 GMT
- Title: Supervised Understanding of Word Embeddings
- Authors: Halid Ziya Yerebakan, Parmeet Bhatia, Yoshihisa Shinagawa
- Abstract summary: We have obtained supervised projections in the form of linear keyword-level classifiers on word embeddings.
We have shown that the method creates interpretable projections of original embedding dimensions.
- Score: 1.160208922584163
- License: http://creativecommons.org/licenses/by-nc-sa/4.0/
- Abstract: Pre-trained word embeddings are widely used for transfer learning in natural
language processing. The embeddings are continuous and distributed
representations of the words that preserve their similarities in compact
Euclidean spaces. However, the dimensions of these spaces do not provide any
clear interpretation. In this study, we have obtained supervised projections in
the form of linear keyword-level classifiers on word embeddings. We have
shown that the method creates interpretable projections of original embedding
dimensions. Activations of the trained classifier nodes correspond to a subset
of the words in the vocabulary. Thus, they behave similarly to the dictionary
features while having the merit of continuous value output. Additionally, such
dictionaries can be grown iteratively with multiple rounds by adding expert
labels on top-scoring words to an initial collection of keywords. Also, the
same classifiers can be applied to aligned word embeddings in other languages
to obtain corresponding dictionaries. In our experiments, we have shown that
initializing higher-order networks with these classifier weights gives more
accurate models for downstream NLP tasks. We further demonstrate the usefulness
of supervised dimensions in revealing the polysemous nature of a keyword of
interest by projecting its embedding using learned classifiers in different
sub-spaces.
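The approach lends itself to a short sketch. The snippet below is not the authors' code; the function names, the use of scikit-learn, and the random-negative sampling are assumptions. It trains a linear keyword-level classifier on pre-trained embeddings and then ranks the vocabulary by its activation, which is the signal used to grow the dictionary with further rounds of expert labels.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

def train_keyword_classifier(embeddings, vocab, seed_keywords, n_negative=500, seed=0):
    """Fit a linear keyword-level classifier on pre-trained word embeddings.

    embeddings:    (V, d) array of word vectors
    vocab:         list of V words aligned with the rows of `embeddings`
    seed_keywords: initial expert-provided keyword list (positive class)
    """
    rng = np.random.default_rng(seed)
    word2idx = {w: i for i, w in enumerate(vocab)}
    pos_idx = [word2idx[w] for w in seed_keywords if w in word2idx]
    # Randomly sampled non-keyword words serve as the negative class.
    neg_pool = np.array([i for i in range(len(vocab)) if i not in set(pos_idx)])
    neg_idx = rng.choice(neg_pool, size=n_negative, replace=False)

    X = np.vstack([embeddings[pos_idx], embeddings[neg_idx]])
    y = np.concatenate([np.ones(len(pos_idx)), np.zeros(len(neg_idx))])
    return LogisticRegression(max_iter=1000).fit(X, y)

def top_scoring_words(clf, embeddings, vocab, k=20):
    """Rank the vocabulary by classifier activation; the top-scoring words are
    candidates for the next round of expert labeling (dictionary growth)."""
    scores = clf.decision_function(embeddings)
    order = np.argsort(-scores)[:k]
    return [(vocab[i], float(scores[i])) for i in order]
```

Because the result is a single linear projection, the same weight vector can be applied to word embeddings aligned into the same space for another language to obtain a corresponding dictionary, and it can initialize higher-order networks for downstream tasks, as the abstract describes.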
Related papers
- Tomato, Tomahto, Tomate: Measuring the Role of Shared Semantics among Subwords in Multilingual Language Models [88.07940818022468]
We take an initial step toward measuring the role of shared semantics among subwords in encoder-only multilingual language models (mLMs).
We form "semantic tokens" by merging the semantically similar subwords and their embeddings.
Inspections of the grouped subwords show that they exhibit a wide range of semantic similarities.
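As a rough illustration of the merging step, here is a minimal sketch; the greedy grouping, cosine threshold, and mean-pooling are assumptions, not the paper's exact procedure.

```python
import numpy as np

def merge_semantic_tokens(subwords, emb, threshold=0.8):
    """Greedily group subwords whose embeddings are highly similar and
    represent each group ("semantic token") by the mean embedding.

    subwords: list of N subword strings
    emb:      (N, d) array of subword embeddings
    """
    emb = emb / np.linalg.norm(emb, axis=1, keepdims=True)  # unit norm -> dot = cosine
    unassigned = list(range(len(subwords)))
    groups, group_vecs = [], []
    while unassigned:
        i = unassigned.pop(0)
        sims = emb[unassigned] @ emb[i]
        members = [i] + [unassigned[j] for j in np.where(sims >= threshold)[0]]
        unassigned = [u for u in unassigned if u not in members]
        groups.append([subwords[m] for m in members])
        group_vecs.append(emb[members].mean(axis=0))
    return groups, np.vstack(group_vecs)
```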
arXiv Detail & Related papers (2024-11-07T08:38:32Z)
- Multi hash embeddings in spaCy [1.6790532021482656]
spaCy is a machine learning library for natural language processing that generates multi-embedding representations of words.
The default embedding layer in spaCy is a hash embeddings layer.
In this technical report we lay out a bit of history and introduce the embedding methods in spaCy in detail.
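The general hash-embedding idea can be sketched as follows; this is a generic toy version, not spaCy's actual MultiHashEmbed layer, and the table size, seed count, and CRC32 hashing are illustrative choices.

```python
import zlib
import numpy as np

class HashEmbedding:
    """Toy hash-embedding table: a word maps to several rows via different
    hash seeds, and its vector is the sum of the selected rows."""

    def __init__(self, n_rows=5000, dim=96, n_seeds=4, rng=None):
        if rng is None:
            rng = np.random.default_rng(0)
        self.table = rng.normal(scale=0.1, size=(n_rows, dim))
        self.n_rows = n_rows
        self.n_seeds = n_seeds

    def _rows(self, word):
        # One row index per seed; hash collisions are tolerated by design.
        return [zlib.crc32(f"{seed}:{word}".encode()) % self.n_rows
                for seed in range(self.n_seeds)]

    def __call__(self, word):
        return self.table[self._rows(word)].sum(axis=0)

emb = HashEmbedding()
vec = emb("embedding")  # works for any string; no fixed vocabulary is needed
```

Because any string maps to some rows of the fixed table, the layer needs no vocabulary file and its parameter count is independent of how many distinct words appear in the data.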
arXiv Detail & Related papers (2022-12-19T06:03:04Z)
- CCPrefix: Counterfactual Contrastive Prefix-Tuning for Many-Class Classification [57.62886091828512]
We propose a brand-new prefix-tuning method, Counterfactual Contrastive Prefix-tuning (CCPrefix) for many-class classification.
Basically, an instance-dependent soft prefix, derived from fact-counterfactual pairs in the label space, is leveraged to complement the language verbalizers in many-class classification.
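Prefix-tuning in general prepends trainable soft vectors to the frozen model's input. The sketch below shows only that generic mechanism with an instance-dependent prefix; the way the prefix is derived here (a fixed projection of a label-pair vector) is an assumption, not CCPrefix's counterfactual construction.

```python
import numpy as np

def build_instance_prefix(pair_vec, proj, prefix_len=8):
    """Map an instance-dependent label-pair vector to `prefix_len` soft
    prefix embeddings via a projection (trainable in a real setup)."""
    d_model = proj.shape[1] // prefix_len
    return (pair_vec @ proj).reshape(prefix_len, d_model)

def prepend_prefix(token_embs, prefix):
    """Concatenate the soft prefix in front of the token embeddings fed to
    the (frozen) language model."""
    return np.vstack([prefix, token_embs])

rng = np.random.default_rng(0)
d_model, d_pair, prefix_len = 64, 32, 8
proj = rng.normal(size=(d_pair, prefix_len * d_model))   # learned parameters
pair_vec = rng.normal(size=d_pair)                       # from fact/counterfactual labels
tokens = rng.normal(size=(12, d_model))                  # embeddings of 12 input tokens
inputs = prepend_prefix(tokens, build_instance_prefix(pair_vec, proj, prefix_len))
print(inputs.shape)  # (20, 64): 8 prefix vectors + 12 token embeddings
```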
arXiv Detail & Related papers (2022-11-11T03:45:59Z)
- More Than Words: Collocation Tokenization for Latent Dirichlet Allocation Models [71.42030830910227]
We propose a new metric for measuring the clustering quality in settings where the models differ.
We show that topics trained with merged tokens result in topic keys that are clearer, more coherent, and more effective at distinguishing topics than those of unmerged models.
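The collocation-merging step can be sketched as a simple PMI-based pre-tokenizer; the threshold and scoring below are illustrative assumptions rather than the paper's tokenizer.

```python
import math
from collections import Counter

def merge_collocations(docs, min_count=5, pmi_threshold=3.0):
    """Replace frequent, strongly associated adjacent word pairs (by PMI)
    with a single merged token, e.g. 'latent_dirichlet', before fitting a
    topic model such as LDA."""
    unigrams, bigrams = Counter(), Counter()
    total = 0
    for doc in docs:
        unigrams.update(doc)
        bigrams.update(zip(doc, doc[1:]))
        total += len(doc)

    def pmi(a, b):
        p_ab = bigrams[(a, b)] / total
        p_a, p_b = unigrams[a] / total, unigrams[b] / total
        return math.log(p_ab / (p_a * p_b))

    keep = {bg for bg, c in bigrams.items()
            if c >= min_count and pmi(*bg) >= pmi_threshold}

    merged_docs = []
    for doc in docs:
        out, i = [], 0
        while i < len(doc):
            if i + 1 < len(doc) and (doc[i], doc[i + 1]) in keep:
                out.append(doc[i] + "_" + doc[i + 1])
                i += 2
            else:
                out.append(doc[i])
                i += 1
        merged_docs.append(out)
    return merged_docs
```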
arXiv Detail & Related papers (2021-08-24T14:08:19Z)
- Extending Multi-Sense Word Embedding to Phrases and Sentences for Unsupervised Semantic Applications [34.71597411512625]
We propose a novel embedding method for a text sequence (a phrase or a sentence) where each sequence is represented by a distinct set of codebook embeddings.
Our experiments show that the per-sentence codebook embeddings significantly improve the performances in unsupervised sentence similarity and extractive summarization benchmarks.
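A rough sketch of the codebook idea follows, under the assumption that each token embedding is simply assigned to its nearest codebook entry and the distinct entries form the sentence's set; the paper's model learns this assignment, so treat the snippet as illustrative only.

```python
import numpy as np

def sentence_codebook_set(token_embs, codebook):
    """Represent a sentence by the distinct codebook vectors nearest to its
    token embeddings (one multi-sense 'facet' per selected code)."""
    t = token_embs / np.linalg.norm(token_embs, axis=1, keepdims=True)
    c = codebook / np.linalg.norm(codebook, axis=1, keepdims=True)
    nearest = (t @ c.T).argmax(axis=1)        # nearest code per token (cosine)
    codes = sorted(set(nearest.tolist()))
    return codes, codebook[codes]

def set_similarity(codes_a, codes_b):
    """Jaccard overlap of selected codes as a simple sentence-similarity proxy."""
    a, b = set(codes_a), set(codes_b)
    return len(a & b) / len(a | b) if a | b else 0.0
```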
arXiv Detail & Related papers (2021-03-29T04:54:28Z)
- SemGloVe: Semantic Co-occurrences for GloVe from BERT [55.420035541274444]
GloVe learns word embeddings by leveraging statistical information from word co-occurrence matrices.
We propose SemGloVe, which distills semantic co-occurrences from BERT into static GloVe word embeddings.
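For context, GloVe fits vectors so that their dot products match log co-occurrence values; SemGloVe's change, roughly, is where the co-occurrence matrix comes from (BERT-derived statistics instead of corpus window counts). The sketch below implements the standard GloVe objective over whatever matrix X is supplied; it is not the SemGloVe code, and the hyperparameters are the usual defaults.

```python
import numpy as np

def glove_fit(X, dim=50, epochs=50, lr=0.05, x_max=100.0, alpha=0.75, seed=0):
    """Fit GloVe-style vectors to a co-occurrence matrix X (V x V) by SGD on
    f(X_ij) * (w_i . c_j + b_i + b_j - log X_ij)^2 over the nonzero entries."""
    rng = np.random.default_rng(seed)
    V = X.shape[0]
    W = rng.normal(scale=0.1, size=(V, dim))
    C = rng.normal(scale=0.1, size=(V, dim))
    bw, bc = np.zeros(V), np.zeros(V)
    rows, cols = np.nonzero(X)
    for _ in range(epochs):
        for i, j in zip(rows, cols):
            x = X[i, j]
            f = min((x / x_max) ** alpha, 1.0)        # GloVe weighting function
            diff = W[i] @ C[j] + bw[i] + bc[j] - np.log(x)
            wi, cj = W[i].copy(), C[j].copy()
            W[i] -= lr * f * diff * cj
            C[j] -= lr * f * diff * wi
            bw[i] -= lr * f * diff
            bc[j] -= lr * f * diff
    return W + C  # common practice: sum the word and context vectors
```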
arXiv Detail & Related papers (2020-12-30T15:38:26Z)
- PBoS: Probabilistic Bag-of-Subwords for Generalizing Word Embedding [16.531103175919924]
We look into the task of generalizing word embeddings.
Given a set of pre-trained word vectors over a finite vocabulary, the goal is to predict embedding vectors for out-of-vocabulary words.
We propose a model, along with an efficient algorithm, that simultaneously models subword segmentation and computes subword-based compositional word embedding.
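A simplified sketch of the bag-of-subwords composition, assuming a subword vocabulary with embeddings and unigram probabilities is already given; the brute-force segmentation enumeration below stands in for the paper's jointly learned segmentation model.

```python
import numpy as np

def segmentations(word, subword_probs):
    """All ways to split `word` into known subwords, each with a probability
    (product of unigram subword probabilities)."""
    if not word:
        return [([], 1.0)]
    results = []
    for end in range(1, len(word) + 1):
        piece = word[:end]
        if piece in subword_probs:
            for rest, p in segmentations(word[end:], subword_probs):
                results.append(([piece] + rest, subword_probs[piece] * p))
    return results

def oov_embedding(word, subword_probs, subword_embs):
    """Probability-weighted average of bag-of-subwords embeddings over all
    segmentations of an out-of-vocabulary word."""
    segs = segmentations(word, subword_probs)
    if not segs:
        return None
    total = sum(p for _, p in segs)
    dim = next(iter(subword_embs.values())).shape[0]
    vec = np.zeros(dim)
    for pieces, p in segs:
        vec += (p / total) * np.sum([subword_embs[s] for s in pieces], axis=0)
    return vec
```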
arXiv Detail & Related papers (2020-10-21T08:11:08Z)
- A Comparative Study on Structural and Semantic Properties of Sentence Embeddings [77.34726150561087]
We propose a set of experiments using a widely-used large-scale data set for relation extraction.
We show that different embedding spaces have different degrees of strength for the structural and semantic properties.
These results provide useful information for developing embedding-based relation extraction methods.
arXiv Detail & Related papers (2020-09-23T15:45:32Z)
- Using Holographically Compressed Embeddings in Question Answering [0.0]
This research employs holographic compression of pre-trained embeddings to represent a token, its part-of-speech, and named entity type.
The implementation, in a modified question answering recurrent deep learning network, shows that semantic relationships are preserved, and yields strong performance.
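Holographic reduced representations bind vectors with circular convolution, which keeps the composite the same size as its parts. A minimal sketch of binding a token embedding with its part-of-speech and entity-type embeddings might look like this; the role vectors and dimensionality are assumptions.

```python
import numpy as np

def circular_convolution(a, b):
    """Bind two equal-length vectors via circular convolution (FFT trick),
    producing a composite vector of the same dimensionality."""
    return np.real(np.fft.ifft(np.fft.fft(a) * np.fft.fft(b)))

def compress_token(token_vec, pos_vec, ner_vec, roles):
    """Superpose role-filler bindings for one token: the token embedding plus
    its POS and entity-type embeddings, each bound to a fixed role vector."""
    return (circular_convolution(roles["token"], token_vec)
            + circular_convolution(roles["pos"], pos_vec)
            + circular_convolution(roles["ner"], ner_vec))

dim = 300
rng = np.random.default_rng(0)
roles = {name: rng.normal(scale=1 / np.sqrt(dim), size=dim)
         for name in ("token", "pos", "ner")}
composite = compress_token(rng.normal(size=dim), rng.normal(size=dim),
                           rng.normal(size=dim), roles)
print(composite.shape)  # (300,) -- same size as a single embedding
```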
arXiv Detail & Related papers (2020-07-14T18:29:49Z)
- Comparative Analysis of Word Embeddings for Capturing Word Similarities [0.0]
Distributed language representation has become the most widely used technique for language representation in various natural language processing tasks.
Most of the natural language processing models that are based on deep learning techniques use already pre-trained distributed word representations, commonly called word embeddings.
Selecting the appropriate word embeddings is a perplexing task, since the projected embedding space is not intuitive to humans.
arXiv Detail & Related papers (2020-05-08T01:16:03Z)
- Anchor & Transform: Learning Sparse Embeddings for Large Vocabularies [60.285091454321055]
We design a simple and efficient embedding algorithm that learns a small set of anchor embeddings and a sparse transformation matrix.
On text classification, language modeling, and movie recommendation benchmarks, we show that ANT is particularly suitable for large vocabulary sizes.
arXiv Detail & Related papers (2020-03-18T13:07:51Z)
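The parameter saving can be illustrated with a small sketch: instead of a dense V x d embedding table, store a few anchor vectors plus a sparse coefficient matrix and reconstruct embedding rows on demand. The shapes, density, and use of scipy.sparse below are illustrative assumptions, not the paper's training procedure.

```python
import numpy as np
from scipy import sparse

V, d, n_anchors = 100_000, 300, 500
rng = np.random.default_rng(0)

anchors = rng.normal(size=(n_anchors, d))                  # small dense anchor set A
# Sparse transformation T: each word mixes only a handful of anchors.
T = sparse.random(V, n_anchors, density=0.01, random_state=0, format="csr")

def embed(word_ids):
    """Reconstruct embeddings for a batch of word ids as rows of T @ A."""
    return T[word_ids].dot(anchors)

vecs = embed([3, 17, 42])
dense_params = V * d
ant_params = n_anchors * d + T.nnz                         # anchors + nonzero coefficients
print(vecs.shape, f"params: {ant_params:,} vs dense {dense_params:,}")
```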
This list is automatically generated from the titles and abstracts of the papers on this site.
The site does not guarantee the quality of this content (including all information) and is not responsible for any consequences of its use.