Word Rotator's Distance
- URL: http://arxiv.org/abs/2004.15003v3
- Date: Mon, 16 Nov 2020 17:57:08 GMT
- Title: Word Rotator's Distance
- Authors: Sho Yokoi, Ryo Takahashi, Reina Akama, Jun Suzuki, Kentaro Inui
- Abstract summary: A key principle in assessing textual similarity is measuring the degree of semantic overlap between two texts by considering word alignment.
We show that the norm of word vectors is a good proxy for word importance, and their angle is a good proxy for word similarity.
We propose a method that first decouples word vectors into their norm and direction, and then computes alignment-based similarity.
- Score: 50.67809662270474
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: A key principle in assessing textual similarity is measuring the degree of
semantic overlap between two texts by considering the word alignment. Such
alignment-based approaches are intuitive and interpretable; however, they are
empirically inferior to the simple cosine similarity between general-purpose
sentence vectors. To address this issue, we focus on and demonstrate the fact
that the norm of word vectors is a good proxy for word importance, and their
angle is a good proxy for word similarity. Alignment-based approaches do not
distinguish them, whereas sentence-vector approaches automatically use the norm
as the word importance. Accordingly, we propose a method that first decouples
word vectors into their norm and direction, and then computes alignment-based
similarity using earth mover's distance (i.e., optimal transport cost), which
we refer to as word rotator's distance. In addition, we show how to grow the norm
and direction of word vectors (vector converter), a new systematic
approach derived from sentence-vector estimation methods. On several textual
similarity datasets, the combination of these simple proposed methods
outperformed not only alignment-based approaches but also strong baselines. The
source code is available at https://github.com/eumesy/wrd
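
To make the decoupled computation concrete, below is a minimal sketch of word rotator's distance under the decomposition described in the abstract: word-vector norms become the transport mass (word importance), unit directions define a cosine-distance ground cost (word similarity), and the earth mover's distance is solved as a small linear program. The function names, the generic LP-based transport solver, and the random stand-in vectors are illustrative assumptions, not the authors' implementation; see the repository linked above for the official code.

```python
# Illustrative sketch of word rotator's distance (WRD); not the authors' code.
import numpy as np
from scipy.optimize import linprog


def earth_movers_distance(a, b, cost):
    """Optimal transport cost between histograms a (n,) and b (m,) under cost (n, m)."""
    n, m = cost.shape
    # Equality constraints: row i of the transport plan sums to a[i],
    # column j sums to b[j] (one constraint is redundant but harmless).
    A_eq = np.zeros((n + m, n * m))
    for i in range(n):
        A_eq[i, i * m:(i + 1) * m] = 1.0
    for j in range(m):
        A_eq[n + j, j::m] = 1.0
    b_eq = np.concatenate([a, b])
    res = linprog(cost.ravel(), A_eq=A_eq, b_eq=b_eq, bounds=(0, None), method="highs")
    return res.fun


def word_rotators_distance(X, Y):
    """WRD between two sentences given word-vector matrices X (n, d) and Y (m, d)."""
    norm_x = np.linalg.norm(X, axis=1)
    norm_y = np.linalg.norm(Y, axis=1)
    mass_x = norm_x / norm_x.sum()      # norm -> word importance (transport mass)
    mass_y = norm_y / norm_y.sum()
    dir_x = X / norm_x[:, None]         # direction -> word meaning
    dir_y = Y / norm_y[:, None]
    cost = 1.0 - dir_x @ dir_y.T        # cosine distance between directions
    return earth_movers_distance(mass_x, mass_y, cost)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    sent_a = rng.normal(size=(5, 300))  # stand-in for 5 word vectors
    sent_b = rng.normal(size=(7, 300))  # stand-in for 7 word vectors
    print("WRD:", word_rotators_distance(sent_a, sent_b))  # smaller = more similar
```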
Related papers
- Contextualized Word Vector-based Methods for Discovering Semantic Differences with No Training nor Word Alignment [17.229611956178818]
We propose methods for discovering semantic differences in words appearing in two corpora.
The key idea is that the coverage of a word's meanings is reflected in the norm of its mean word vector.
We show these advantages for native and non-native English corpora and also for historical corpora.
arXiv Detail & Related papers (2023-05-19T08:27:17Z)
- Tsetlin Machine Embedding: Representing Words Using Logical Expressions [10.825099126920028]
We introduce a Tsetlin Machine-based autoencoder that learns logical clauses in a self-supervised manner.
The clauses consist of contextual words like "black," "cup," and "hot" that define other words like "coffee."
We evaluate our embedding approach on several intrinsic and extrinsic benchmarks, outperforming GloVe on six classification tasks.
arXiv Detail & Related papers (2023-01-02T15:02:45Z)
- Improving word mover's distance by leveraging self-attention matrix [7.934452214142754]
The proposed method is based on the Fused Gromov-Wasserstein distance, which simultaneously considers the similarity of the word embeddings and the self-attention matrices (SAMs) when calculating the optimal transport between two sentences.
Experiments demonstrate that the proposed method enhances word mover's distance (WMD) and its variants in paraphrase identification, with near-equivalent performance in semantic textual similarity.
arXiv Detail & Related papers (2022-11-11T14:25:08Z)
- Describing Sets of Images with Textual-PCA [89.46499914148993]
We seek to semantically describe a set of images, capturing both the attributes of single images and the variations within the set.
Our procedure is analogous to Principal Component Analysis, in which the role of projection vectors is replaced with generated phrases.
arXiv Detail & Related papers (2022-10-21T17:10:49Z)
- Optimizing Bi-Encoder for Named Entity Recognition via Contrastive Learning [80.36076044023581]
We present an efficient bi-encoder framework for named entity recognition (NER).
We frame NER as a metric learning problem that maximizes the similarity between the vector representations of an entity mention and its type.
A major challenge to this bi-encoder formulation for NER lies in separating non-entity spans from entity mentions.
arXiv Detail & Related papers (2022-08-30T23:19:04Z)
- Simple, Interpretable and Stable Method for Detecting Words with Usage Change across Corpora [54.757845511368814]
The problem of comparing two bodies of text and searching for words that differ in their usage arises often in digital humanities and computational social science.
This is commonly approached by training word embeddings on each corpus, aligning the vector spaces, and looking for words whose cosine distance in the aligned space is large.
We propose an alternative approach that does not use vector space alignment, and instead considers the neighbors of each word (a minimal neighbor-overlap sketch appears after this list).
arXiv Detail & Related papers (2021-12-28T23:46:00Z)
- Fake it Till You Make it: Self-Supervised Semantic Shifts for Monolingual Word Embedding Tasks [58.87961226278285]
We propose a self-supervised approach to model lexical semantic change.
We show that our method can be used for the detection of semantic change with any alignment method.
We illustrate the utility of our techniques using experimental results on three different datasets.
arXiv Detail & Related papers (2021-01-30T18:59:43Z)
- Wasserstein Distance Regularized Sequence Representation for Text Matching in Asymmetrical Domains [51.91456788949489]
We propose a novel matching method tailored for text matching in asymmetrical domains, called WD-Match.
In WD-Match, a Wasserstein distance-based regularizer is defined to regularize the feature vectors projected from different domains.
The training process of WD-Match amounts to a game that minimizes the matching loss regularized by the Wasserstein distance.
arXiv Detail & Related papers (2020-10-15T12:52:09Z)
- Principal Word Vectors [5.64434321651888]
We generalize principal component analysis for embedding words into a vector space.
We show that the spread and the discriminability of the principal word vectors are higher than those of other word embedding methods.
arXiv Detail & Related papers (2020-07-09T08:29:57Z)
- Discovering linguistic (ir)regularities in word embeddings through max-margin separating hyperplanes [0.0]
We show new methods for learning how related words are positioned relative to each other in word embedding spaces.
Our model, SVMCos, is robust to a range of experimental choices when training word embeddings.
arXiv Detail & Related papers (2020-03-07T20:21:50Z)
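
For the neighbor-based detection of usage change referenced in the list above (Simple, Interpretable and Stable Method for Detecting Words with Usage Change across Corpora), the following is a minimal sketch under stated assumptions: two embedding matrices trained independently on the two corpora share a vocabulary, and each word is scored by how little its nearest-neighbor set overlaps between them. The function names, the choice of k, and the toy data are hypothetical, not taken from that paper.

```python
# Hypothetical sketch: rank words by how much their nearest neighbors differ
# between two corpora, with no vector-space alignment.
import numpy as np


def top_k_neighbors(emb, k=10):
    """Indices of the k nearest neighbors (by cosine) for every word in an embedding matrix."""
    unit = emb / np.linalg.norm(emb, axis=1, keepdims=True)
    sim = unit @ unit.T
    np.fill_diagonal(sim, -np.inf)   # exclude the word itself
    return np.argsort(-sim, axis=1)[:, :k]


def usage_change_scores(emb_a, emb_b, k=10):
    """Score = k minus the overlap of the two neighbor sets; higher = stronger usage change."""
    nn_a = top_k_neighbors(emb_a, k)
    nn_b = top_k_neighbors(emb_b, k)
    return np.array([k - len(set(nn_a[w]) & set(nn_b[w])) for w in range(emb_a.shape[0])])


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    vocab = ["virus", "cloud", "mouse", "tablet", "stream"]  # toy shared vocabulary
    emb_corpus_a = rng.normal(size=(5, 50))                  # embeddings trained on corpus A
    emb_corpus_b = rng.normal(size=(5, 50))                  # embeddings trained on corpus B
    for word, s in sorted(zip(vocab, usage_change_scores(emb_corpus_a, emb_corpus_b, k=3)),
                          key=lambda t: -t[1]):
        print(word, s)
```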