Discovering linguistic (ir)regularities in word embeddings through
max-margin separating hyperplanes
- URL: http://arxiv.org/abs/2003.03654v1
- Date: Sat, 7 Mar 2020 20:21:50 GMT
- Title: Discovering linguistic (ir)regularities in word embeddings through
max-margin separating hyperplanes
- Authors: Noel Kennedy, Imogen Schofield, Dave C. Brodbelt, David B. Church, Dan
G. O'Neill
- Abstract summary: We show new methods for learning how related words are positioned relative to each other in word embedding spaces.
Our model, SVMCos, is robust to a range of experimental choices when training word embeddings.
- Score: 0.0
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: We experiment with new methods for learning how related words are positioned
relative to each other in word embedding spaces. Previous approaches learned
constant vector offsets: vectors that point from source tokens to target tokens
with the assumption that these offsets were parallel to each other. We show that
the offsets between related tokens are closer to orthogonal than parallel, and
that they have low pairwise cosine similarities. We proceed from a different
assumption: that target tokens are linearly separable from source and unlabeled
tokens. We show that a max-margin hyperplane can separate target tokens and
that vectors orthogonal to this hyperplane represent the relationship between
source and targets. We find that this representation of the relationship
obtains the best results in discovering linguistic regularities. We experiment
with vector space models trained by a variety of algorithms (Word2vec:
CBOW/skip-gram, fastText, or GloVe), and various word context choices such as
linear word order and syntactic dependency grammars, both with and without knowledge
of word position. These experiments show that our model, SVMCos, is robust to a
range of experimental choices when training word embeddings.
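The two central claims lend themselves to a compact illustration. Below is a minimal sketch, assuming NumPy, scikit-learn, and a toy random embedding table; the vocabulary, the word pairs, and the final ranking rule are illustrative assumptions, not the authors' code or data. A linear SVM separates target tokens from source and unlabeled tokens, and the unit normal of its max-margin hyperplane is taken as the relationship direction.

```python
import numpy as np
from sklearn.svm import LinearSVC

rng = np.random.default_rng(0)

# Toy embedding table: unit 300-d vectors for a handful of tokens (placeholders).
vocab = ["france", "paris", "italy", "rome", "japan", "tokyo", "cat", "blue"]
E = {w: rng.normal(size=300) for w in vocab}
E = {w: v / np.linalg.norm(v) for w, v in E.items()}

pairs = [("france", "paris"), ("italy", "rome"), ("japan", "tokyo")]

# Claim 1: offsets between related tokens are far from parallel, i.e. their
# pairwise cosine similarities are low.
offsets = np.stack([E[t] - E[s] for s, t in pairs])
offsets /= np.linalg.norm(offsets, axis=1, keepdims=True)
cos = offsets @ offsets.T
print("mean pairwise offset cosine:", cos[np.triu_indices(len(pairs), k=1)].mean())

# Claim 2: learn a max-margin hyperplane separating target tokens (label 1)
# from source and unlabeled tokens (label 0); the vector orthogonal to this
# hyperplane (the SVM normal) represents the relationship.
X = np.stack([E[w] for w in vocab])
y = np.array([1 if w in {t for _, t in pairs} else 0 for w in vocab])
svm = LinearSVC(C=1.0).fit(X, y)
w_rel = svm.coef_[0] / np.linalg.norm(svm.coef_[0])

# Rank candidate tokens by cosine similarity to the relation direction: one
# plausible "SVMCos"-style scoring rule, assumed here for illustration.
scores = {w: float(E[w] @ w_rel) for w in vocab}
print(sorted(scores, key=scores.get, reverse=True)[:3])
```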
Related papers
- Tsetlin Machine Embedding: Representing Words Using Logical Expressions [10.825099126920028]
We introduce a Tsetlin Machine-based autoencoder that learns logical clauses in a self-supervised manner.
The clauses consist of contextual words like "black," "cup," and "hot" that together define other words like "coffee."
We evaluate our embedding approach on several intrinsic and extrinsic benchmarks, outperforming GloVe on six classification tasks.
arXiv Detail & Related papers (2023-01-02T15:02:45Z)
- Relational Sentence Embedding for Flexible Semantic Matching [86.21393054423355]
We present Relational Sentence Embedding (RSE), a new paradigm for further exploring the potential of sentence embeddings.
RSE is effective and flexible in modeling sentence relations and outperforms a series of state-of-the-art embedding methods.
arXiv Detail & Related papers (2022-12-17T05:25:17Z)
- More Than Words: Collocation Tokenization for Latent Dirichlet Allocation Models [71.42030830910227]
We propose a new metric for measuring the clustering quality in settings where the models differ.
We show that topics trained with merged tokens result in topic keys that are clearer, more coherent, and more effective at distinguishing topics than those of unmerged models.
arXiv Detail & Related papers (2021-08-24T14:08:19Z)
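The collocation-merging step above can be approximated with a short sketch, assuming the gensim library. The corpus, `min_count`, and `threshold` values are placeholders, and this shows only the tokenization side, not the paper's clustering-quality metric.

```python
from gensim.corpora import Dictionary
from gensim.models import LdaModel
from gensim.models.phrases import Phrases, Phraser

# Placeholder corpus; real experiments would use a much larger collection.
docs = [
    ["new", "york", "city", "traffic", "report"],
    ["machine", "learning", "models", "for", "new", "york", "weather"],
    ["machine", "learning", "in", "city", "planning"],
] * 50

# Detect frequent collocations (e.g., "new_york") and merge them into tokens.
bigrams = Phraser(Phrases(docs, min_count=5, threshold=0.1))
merged_docs = [bigrams[d] for d in docs]

# Train LDA on the merged-token corpus.
dictionary = Dictionary(merged_docs)
corpus = [dictionary.doc2bow(d) for d in merged_docs]
lda = LdaModel(corpus, num_topics=2, id2word=dictionary, random_state=0)
for topic_id, words in lda.show_topics(num_topics=2, num_words=5, formatted=False):
    print(topic_id, [w for w, _ in words])
```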
- Improving Pretrained Cross-Lingual Language Models via Self-Labeled Word Alignment [49.45399359826453]
Cross-lingual language models are typically pretrained with language modeling on multilingual text or parallel sentences.
We introduce denoising word alignment as a new cross-lingual pre-training task.
Experimental results show that our method improves cross-lingual transferability on various datasets.
arXiv Detail & Related papers (2021-06-11T13:36:01Z)
- Learning to Remove: Towards Isotropic Pre-trained BERT Embedding [7.765987411382461]
Research in word representation shows that isotropic embeddings can significantly improve performance on downstream tasks.
We measure and analyze the geometry of pre-trained BERT embeddings and find that they are far from isotropic.
We propose a simple yet effective method to fix this problem: remove several dominant directions of the BERT embedding space with a set of learnable weights.
arXiv Detail & Related papers (2021-04-12T08:13:59Z)
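The direction-removal idea above can be approximated with simple post-processing: center the embeddings and project out their top principal components. This is a hard-removal sketch, whereas the paper softly downweights directions with learnable weights; `k` and the random stand-in matrix are assumptions.

```python
import numpy as np

def remove_dominant_directions(emb: np.ndarray, k: int = 3) -> np.ndarray:
    """Center embeddings and project out the top-k principal components.

    A simplified stand-in for the paper's learnable-weight scheme, which
    downweights dominant directions instead of deleting them outright.
    """
    centered = emb - emb.mean(axis=0, keepdims=True)
    # Principal directions via SVD of the centered matrix.
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    top = vt[:k]                       # (k, dim) dominant directions
    return centered - centered @ top.T @ top

emb = np.random.default_rng(0).normal(size=(1000, 768))  # stand-in for BERT vectors
iso = remove_dominant_directions(emb, k=3)
print(np.linalg.norm(iso.mean(axis=0)))  # near zero: mean component removed
```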
- SemGloVe: Semantic Co-occurrences for GloVe from BERT [55.420035541274444]
GloVe learns word embeddings by leveraging statistical information from word co-occurrence matrices.
We propose SemGloVe, which distills semantic co-occurrences from BERT into static GloVe word embeddings.
arXiv Detail & Related papers (2020-12-30T15:38:26Z)
- Rethinking Positional Encoding in Language Pre-training [111.2320727291926]
We show that in absolute positional encoding, the addition operation applied to positional embeddings and word embeddings mixes correlations between the two heterogeneous sources of information.
We propose a new positional encoding method called Transformer with Untied Positional Encoding (TUPE).
arXiv Detail & Related papers (2020-06-28T13:11:02Z)
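The core change can be sketched in a few lines: instead of adding positional embeddings to word embeddings before a shared query/key projection, the content and position streams get separate ("untied") projections and their attention logits are summed. The shapes and the 1/sqrt(2d) scaling below follow the TUPE formulation as I understand it; the random projection matrices are placeholders.

```python
import numpy as np

rng = np.random.default_rng(0)
n, d = 16, 64                          # sequence length, model dimension

x = rng.normal(size=(n, d))            # word (content) embeddings
p = rng.normal(size=(n, d))            # absolute positional embeddings

# Separate ("untied") projections for content and position.
Wq, Wk = rng.normal(size=(d, d)), rng.normal(size=(d, d))
Uq, Uk = rng.normal(size=(d, d)), rng.normal(size=(d, d))

# Untied attention logits: content-to-content plus position-to-position,
# scaled by 1/sqrt(2d) to account for summing two terms.
logits = ((x @ Wq) @ (x @ Wk).T + (p @ Uq) @ (p @ Uk).T) / np.sqrt(2 * d)

# Row-wise softmax over the sequence.
attn = np.exp(logits - logits.max(axis=-1, keepdims=True))
attn /= attn.sum(axis=-1, keepdims=True)
print(attn.shape)  # (n, n) attention weights
```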
- Supervised Understanding of Word Embeddings [1.160208922584163]
We obtain supervised projections in the form of linear keyword-level classifiers on word embeddings.
We show that the method creates interpretable projections of the original embedding dimensions.
arXiv Detail & Related papers (2020-06-23T20:13:42Z)
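A minimal version of such a projection, assuming scikit-learn: fit a linear classifier that detects some keyword-defined property, then treat its weight vector as an interpretable direction onto which all embeddings are projected. The labels and toy vectors are placeholders, not the paper's data.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
emb = rng.normal(size=(200, 300))              # toy word embeddings
keyword_label = rng.integers(0, 2, size=200)   # placeholder keyword-level labels

# A linear keyword-level classifier on the embeddings...
clf = LogisticRegression(max_iter=1000).fit(emb, keyword_label)

# ...whose weight vector defines an interpretable projection: each word's
# score along the learned "keyword" direction.
direction = clf.coef_[0] / np.linalg.norm(clf.coef_[0])
projection = emb @ direction
print(projection[:5])
```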
- Rationalizing Text Matching: Learning Sparse Alignments via Optimal Transport [14.86310501896212]
In this work, we extend the selective rationalization approach to text matching.
The goal is to jointly select and align text pieces, such as tokens or sentences, as a justification for the downstream prediction.
Our approach employs optimal transport (OT) to find a minimal cost alignment between the inputs.
arXiv Detail & Related papers (2020-05-27T01:20:49Z)
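A sketch of the alignment step, assuming the POT library (`pip install pot`): the transport plan returned by exact OT between two token sets is typically sparse, and its nonzero entries serve as the selected alignments. The embeddings and cosine cost below are illustrative assumptions.

```python
import numpy as np
import ot  # POT: Python Optimal Transport

rng = np.random.default_rng(0)
A = rng.normal(size=(5, 300))   # token embeddings of text 1 (placeholder)
B = rng.normal(size=(7, 300))   # token embeddings of text 2 (placeholder)

# Cost matrix: one minus cosine similarity between token pairs.
An = A / np.linalg.norm(A, axis=1, keepdims=True)
Bn = B / np.linalg.norm(B, axis=1, keepdims=True)
M = 1.0 - An @ Bn.T

# Uniform mass on each side; exact OT yields a sparse transport plan.
a = np.full(len(A), 1.0 / len(A))
b = np.full(len(B), 1.0 / len(B))
plan = ot.emd(a, b, M)

# Nonzero entries of the plan are the aligned token pairs (the "rationale").
print(np.argwhere(plan > 1e-9))
```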
- Word Rotator's Distance [50.67809662270474]
A key principle in assessing textual similarity is measuring the degree of semantic overlap between two texts while considering word alignment.
We show that the norm of word vectors is a good proxy for word importance, and their angle is a good proxy for word similarity.
We propose a method that first decouples word vectors into their norm and direction, and then computes alignment-based similarity.
arXiv Detail & Related papers (2020-04-30T17:48:42Z)
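The decoupling is easy to sketch with the same POT library as above: normalized token norms supply the transport masses (word importance), unit directions supply the cosine cost (word similarity), and the resulting earth mover's distance gives a Word Rotator's Distance-style score. The toy embeddings are placeholders.

```python
import numpy as np
import ot  # POT: Python Optimal Transport

def wrd(X: np.ndarray, Y: np.ndarray) -> float:
    """Word Rotator's Distance-style score between two texts.

    Decouple each word vector into norm (word importance -> transport mass)
    and direction (word meaning -> cosine transport cost), then solve OT.
    """
    nx, ny = np.linalg.norm(X, axis=1), np.linalg.norm(Y, axis=1)
    a, b = nx / nx.sum(), ny / ny.sum()           # masses from norms
    Xd, Yd = X / nx[:, None], Y / ny[:, None]     # unit directions
    M = 1.0 - Xd @ Yd.T                           # cosine distance cost
    return float(ot.emd2(a, b, M))                # earth mover's distance

rng = np.random.default_rng(0)
print(wrd(rng.normal(size=(4, 300)), rng.normal(size=(6, 300))))
```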