Discovering linguistic (ir)regularities in word embeddings through
max-margin separating hyperplanes
- URL: http://arxiv.org/abs/2003.03654v1
- Date: Sat, 7 Mar 2020 20:21:50 GMT
- Title: Discovering linguistic (ir)regularities in word embeddings through
max-margin separating hyperplanes
- Authors: Noel Kennedy, Imogen Schofield, Dave C. Brodbelt, David B. Church, Dan
G. O'Neill
- Abstract summary: We show new methods for learning how related words are positioned relative to each other in word embedding spaces.
Our model, SVMCos, is robust to a range of experimental choices when training word embeddings.
- Score: 0.0
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: We experiment with new methods for learning how related words are positioned
relative to each other in word embedding spaces. Previous approaches learned
constant vector offsets: vectors that point from source tokens to target tokens
with the assumption that these offsets were parallel to each other. We show that
the offsets between related tokens are closer to orthogonal than parallel, and
that they have low pairwise cosine similarities. We proceed from a different
assumption: that target tokens are linearly separable from source and unlabeled
tokens. We show that a max-margin hyperplane can separate target tokens and
that vectors orthogonal to this hyperplane represent the relationship between
source and targets. We find that this representation of the relationship
obtains the best results in discovering linguistic regularities. We experiment
with vector space models trained by a variety of algorithms (Word2vec:
CBOW/skip-gram, fastText, or GloVe), and various word context choices such as
linear word order and syntactic dependency grammars, both with and without knowledge
of word position. These experiments show that our model, SVMCos, is robust to a
range of experimental choices when training word embeddings.
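The two central claims lend themselves to a compact illustration. Below is a minimal sketch, assuming NumPy, scikit-learn, and a toy random embedding table; the vocabulary, the word pairs, and the final ranking rule are illustrative assumptions, not the authors' code or data. A linear SVM separates target tokens from source and unlabeled tokens, and the unit normal of its max-margin hyperplane is taken as the relationship direction.

```python
import numpy as np
from sklearn.svm import LinearSVC

rng = np.random.default_rng(0)

# Toy embedding table: unit 300-d vectors for a handful of tokens (placeholders).
vocab = ["france", "paris", "italy", "rome", "japan", "tokyo", "cat", "blue"]
E = {w: rng.normal(size=300) for w in vocab}
E = {w: v / np.linalg.norm(v) for w, v in E.items()}

pairs = [("france", "paris"), ("italy", "rome"), ("japan", "tokyo")]

# Claim 1: offsets between related tokens are far from parallel, i.e. their
# pairwise cosine similarities are low.
offsets = np.stack([E[t] - E[s] for s, t in pairs])
offsets /= np.linalg.norm(offsets, axis=1, keepdims=True)
cos = offsets @ offsets.T
print("mean pairwise offset cosine:", cos[np.triu_indices(len(pairs), k=1)].mean())

# Claim 2: learn a max-margin hyperplane separating target tokens (label 1)
# from source and unlabeled tokens (label 0); the vector orthogonal to this
# hyperplane (the SVM normal) represents the relationship.
X = np.stack([E[w] for w in vocab])
y = np.array([1 if w in {t for _, t in pairs} else 0 for w in vocab])
svm = LinearSVC(C=1.0).fit(X, y)
w_rel = svm.coef_[0] / np.linalg.norm(svm.coef_[0])

# Rank candidate tokens by cosine similarity to the relation direction: one
# plausible "SVMCos"-style scoring rule, assumed here for illustration.
scores = {w: float(E[w] @ w_rel) for w in vocab}
print(sorted(scores, key=scores.get, reverse=True)[:3])
```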
Related papers
- Tsetlin Machine Embedding: Representing Words Using Logical Expressions [10.825099126920028]
We introduce a Tsetlin Machine-based autoencoder that learns logical clauses in a self-supervised manner.
The clauses consist of contextual words like "black," "cup," and "hot" that together define other words like "coffee."
We evaluate our embedding approach on several intrinsic and extrinsic benchmarks, outperforming GloVe on six classification tasks.
arXiv Detail & Related papers (2023-01-02T15:02:45Z)
- Relational Sentence Embedding for Flexible Semantic Matching [86.21393054423355]
We present Relational Sentence Embedding (RSE), a new paradigm for further exploring the potential of sentence embeddings.
RSE is effective and flexible in modeling sentence relations and outperforms a series of state-of-the-art embedding methods.
arXiv Detail & Related papers (2022-12-17T05:25:17Z)
- More Than Words: Collocation Tokenization for Latent Dirichlet Allocation Models [71.42030830910227]
We propose a new metric for measuring the clustering quality in settings where the models differ.
We show that topics trained with merged tokens result in topic keys that are clearer, more coherent, and more effective at distinguishing topics than those of unmerged models.
arXiv Detail & Related papers (2021-08-24T14:08:19Z)
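The collocation-merging step above can be approximated with a short sketch, assuming the gensim library. The corpus, `min_count`, and `threshold` values are placeholders, and this shows only the tokenization side, not the paper's clustering-quality metric.

```python
from gensim.corpora import Dictionary
from gensim.models import LdaModel
from gensim.models.phrases import Phrases, Phraser

# Placeholder corpus; real experiments would use a much larger collection.
docs = [
    ["new", "york", "city", "traffic", "report"],
    ["machine", "learning", "models", "for", "new", "york", "weather"],
    ["machine", "learning", "in", "city", "planning"],
] * 50

# Detect frequent collocations (e.g., "new_york") and merge them into tokens.
bigrams = Phraser(Phrases(docs, min_count=5, threshold=0.1))
merged_docs = [bigrams[d] for d in docs]

# Train LDA on the merged-token corpus.
dictionary = Dictionary(merged_docs)
corpus = [dictionary.doc2bow(d) for d in merged_docs]
lda = LdaModel(corpus, num_topics=2, id2word=dictionary, random_state=0)
for topic_id, words in lda.show_topics(num_topics=2, num_words=5, formatted=False):
    print(topic_id, [w for w, _ in words])
```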
- Improving Pretrained Cross-Lingual Language Models via Self-Labeled Word Alignment [49.45399359826453]
Cross-lingual language models are typically pretrained with language modeling on multilingual text or parallel sentences.
We introduce denoising word alignment as a new cross-lingual pre-training task.
Experimental results show that our method improves cross-lingual transferability on various datasets.
arXiv Detail & Related papers (2021-06-11T13:36:01Z)
- Learning to Remove: Towards Isotropic Pre-trained BERT Embedding [7.765987411382461]
Research in word representation shows that isotropic embeddings can significantly improve performance on downstream tasks.
We measure and analyze the geometry of pre-trained BERT embeddings and find that they are far from isotropic.
We propose a simple yet effective method to fix this problem: remove several dominant directions of the BERT embedding space with a set of learnable weights.
arXiv Detail & Related papers (2021-04-12T08:13:59Z)
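The direction-removal idea above can be approximated with simple post-processing: center the embeddings and project out their top principal components. This is a hard-removal sketch, whereas the paper softly downweights directions with learnable weights; `k` and the random stand-in matrix are assumptions.

```python
import numpy as np

def remove_dominant_directions(emb: np.ndarray, k: int = 3) -> np.ndarray:
    """Center embeddings and project out the top-k principal components.

    A simplified stand-in for the paper's learnable-weight scheme, which
    downweights dominant directions instead of deleting them outright.
    """
    centered = emb - emb.mean(axis=0, keepdims=True)
    # Principal directions via SVD of the centered matrix.
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    top = vt[:k]                       # (k, dim) dominant directions
    return centered - centered @ top.T @ top

emb = np.random.default_rng(0).normal(size=(1000, 768))  # stand-in for BERT vectors
iso = remove_dominant_directions(emb, k=3)
print(np.linalg.norm(iso.mean(axis=0)))  # near zero: mean component removed
```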
- SemGloVe: Semantic Co-occurrences for GloVe from BERT [55.420035541274444]
GloVe learns word embeddings by leveraging statistical information from word co-occurrence matrices.
We propose SemGloVe, which distills semantic co-occurrences from BERT into static GloVe word embeddings.
arXiv Detail & Related papers (2020-12-30T15:38:26Z)
- Rethinking Positional Encoding in Language Pre-training [111.2320727291926]
We show that in absolute positional encoding, the addition operation applied to positional embeddings and word embeddings mixes correlations between the two heterogeneous sources of information.
We propose a new positional encoding method called Transformer with Untied Positional Encoding (TUPE).
arXiv Detail & Related papers (2020-06-28T13:11:02Z)
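The core change can be sketched in a few lines: instead of adding positional embeddings to word embeddings before a shared query/key projection, the content and position streams get separate ("untied") projections and their attention logits are summed. The shapes and the 1/sqrt(2d) scaling below follow the TUPE formulation as I understand it; the random projection matrices are placeholders.

```python
import numpy as np

rng = np.random.default_rng(0)
n, d = 16, 64                          # sequence length, model dimension

x = rng.normal(size=(n, d))            # word (content) embeddings
p = rng.normal(size=(n, d))            # absolute positional embeddings

# Separate ("untied") projections for content and position.
Wq, Wk = rng.normal(size=(d, d)), rng.normal(size=(d, d))
Uq, Uk = rng.normal(size=(d, d)), rng.normal(size=(d, d))

# Untied attention logits: content-to-content plus position-to-position,
# scaled by 1/sqrt(2d) to account for summing two terms.
logits = ((x @ Wq) @ (x @ Wk).T + (p @ Uq) @ (p @ Uk).T) / np.sqrt(2 * d)

# Row-wise softmax over the sequence.
attn = np.exp(logits - logits.max(axis=-1, keepdims=True))
attn /= attn.sum(axis=-1, keepdims=True)
print(attn.shape)  # (n, n) attention weights
```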
- Supervised Understanding of Word Embeddings [1.160208922584163]
We obtain supervised projections in the form of linear keyword-level classifiers on word embeddings.
We show that the method creates interpretable projections of the original embedding dimensions.
arXiv Detail & Related papers (2020-06-23T20:13:42Z)
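A minimal version of such a projection, assuming scikit-learn: fit a linear classifier that detects some keyword-defined property, then treat its weight vector as an interpretable direction onto which all embeddings are projected. The labels and toy vectors are placeholders, not the paper's data.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
emb = rng.normal(size=(200, 300))              # toy word embeddings
keyword_label = rng.integers(0, 2, size=200)   # placeholder keyword-level labels

# A linear keyword-level classifier on the embeddings...
clf = LogisticRegression(max_iter=1000).fit(emb, keyword_label)

# ...whose weight vector defines an interpretable projection: each word's
# score along the learned "keyword" direction.
direction = clf.coef_[0] / np.linalg.norm(clf.coef_[0])
projection = emb @ direction
print(projection[:5])
```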
- Rationalizing Text Matching: Learning Sparse Alignments via Optimal Transport [14.86310501896212]
In this work, we extend the selective rationalization approach to text matching.
The goal is to jointly select and align text pieces, such as tokens or sentences, as a justification for the downstream prediction.
Our approach employs optimal transport (OT) to find a minimal cost alignment between the inputs.
arXiv Detail & Related papers (2020-05-27T01:20:49Z)
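A sketch of the alignment step, assuming the POT library (`pip install pot`): the transport plan returned by exact OT between two token sets is typically sparse, and its nonzero entries serve as the selected alignments. The embeddings and cosine cost below are illustrative assumptions.

```python
import numpy as np
import ot  # POT: Python Optimal Transport

rng = np.random.default_rng(0)
A = rng.normal(size=(5, 300))   # token embeddings of text 1 (placeholder)
B = rng.normal(size=(7, 300))   # token embeddings of text 2 (placeholder)

# Cost matrix: one minus cosine similarity between token pairs.
An = A / np.linalg.norm(A, axis=1, keepdims=True)
Bn = B / np.linalg.norm(B, axis=1, keepdims=True)
M = 1.0 - An @ Bn.T

# Uniform mass on each side; exact OT yields a sparse transport plan.
a = np.full(len(A), 1.0 / len(A))
b = np.full(len(B), 1.0 / len(B))
plan = ot.emd(a, b, M)

# Nonzero entries of the plan are the aligned token pairs (the "rationale").
print(np.argwhere(plan > 1e-9))
```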
- Word Rotator's Distance [50.67809662270474]
A key principle in assessing textual similarity is measuring the degree of semantic overlap between two texts while considering word alignment.
We show that the norm of word vectors is a good proxy for word importance, and their angle is a good proxy for word similarity.
We propose a method that first decouples word vectors into their norm and direction, and then computes alignment-based similarity.
arXiv Detail & Related papers (2020-04-30T17:48:42Z)
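The decoupling is easy to sketch with the same POT library as above: normalized token norms supply the transport masses (word importance), unit directions supply the cosine cost (word similarity), and the resulting earth mover's distance gives a Word Rotator's Distance-style score. The toy embeddings are placeholders.

```python
import numpy as np
import ot  # POT: Python Optimal Transport

def wrd(X: np.ndarray, Y: np.ndarray) -> float:
    """Word Rotator's Distance-style score between two texts.

    Decouple each word vector into norm (word importance -> transport mass)
    and direction (word meaning -> cosine transport cost), then solve OT.
    """
    nx, ny = np.linalg.norm(X, axis=1), np.linalg.norm(Y, axis=1)
    a, b = nx / nx.sum(), ny / ny.sum()           # masses from norms
    Xd, Yd = X / nx[:, None], Y / ny[:, None]     # unit directions
    M = 1.0 - Xd @ Yd.T                           # cosine distance cost
    return float(ot.emd2(a, b, M))                # earth mover's distance

rng = np.random.default_rng(0)
print(wrd(rng.normal(size=(4, 300)), rng.normal(size=(6, 300))))
```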