Word Rotator's Distance
- URL: http://arxiv.org/abs/2004.15003v3
- Date: Mon, 16 Nov 2020 17:57:08 GMT
- Title: Word Rotator's Distance
- Authors: Sho Yokoi, Ryo Takahashi, Reina Akama, Jun Suzuki, Kentaro Inui
- Abstract summary: A key principle in assessing textual similarity is measuring the degree of semantic overlap between two texts by considering the word alignment.
We show that the norm of word vectors is a good proxy for word importance, and their angle is a good proxy for word similarity.
We propose a method that first decouples word vectors into their norm and direction, and then computes alignment-based similarity.
- Score: 50.67809662270474
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: A key principle in assessing textual similarity is measuring the degree of
semantic overlap between two texts by considering the word alignment. Such
alignment-based approaches are intuitive and interpretable; however, they are
empirically inferior to the simple cosine similarity between general-purpose
sentence vectors. To address this issue, we focus on and demonstrate the fact
that the norm of word vectors is a good proxy for word importance, and their
angle is a good proxy for word similarity. Alignment-based approaches do not
distinguish between these two properties, whereas sentence-vector approaches
automatically use the norm as the word importance. Accordingly, we propose a
method that first decouples
word vectors into their norm and direction, and then computes alignment-based
similarity using earth mover's distance (i.e., optimal transport cost), which
we refer to as word rotator's distance. In addition, we show how to grow the
norm and direction of word vectors (vector converter), a new systematic
approach derived from sentence-vector estimation methods. On several textual
similarity datasets, the combination of these simple methods outperformed not
only alignment-based approaches but also strong baselines. The
source code is available at https://github.com/eumesy/wrd
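The abstract's claim that sentence-vector approaches "automatically use the norm as the word importance" follows from a one-line identity. Assuming the sentence vector is the plain average of word vectors w_1, ..., w_n (an assumption; many estimators add extra weights), each unit direction enters the average scaled by its norm. Decoupling the two factors then suggests the alignment-based formulation below, written as we read the abstract (consult the paper for the exact definition): the normalized norm serves as the mass, the cosine distance serves as the cost, and the earth mover's distance (optimal transport cost) aligns the two sentences. Primed symbols (w'_j, m'_j, n') denote the second sentence.

```latex
\bar{\mathbf{w}} = \frac{1}{n}\sum_{i=1}^{n} \mathbf{w}_i
  = \frac{1}{n}\sum_{i=1}^{n}
    \underbrace{\lVert \mathbf{w}_i \rVert}_{\text{importance}}\,
    \underbrace{\frac{\mathbf{w}_i}{\lVert \mathbf{w}_i \rVert}}_{\text{direction}}

m_i = \frac{\lVert \mathbf{w}_i \rVert}{\sum_{k=1}^{n} \lVert \mathbf{w}_k \rVert},
\qquad
m'_j = \frac{\lVert \mathbf{w}'_j \rVert}{\sum_{k=1}^{n'} \lVert \mathbf{w}'_k \rVert},
\qquad
c_{ij} = 1 - \cos(\mathbf{w}_i, \mathbf{w}'_j)

\mathrm{WRD} = \min_{T \ge 0} \sum_{i=1}^{n}\sum_{j=1}^{n'} T_{ij}\, c_{ij}
\quad \text{s.t.} \quad
\sum_{j=1}^{n'} T_{ij} = m_i, \qquad \sum_{i=1}^{n} T_{ij} = m'_j
```

Because the cost depends only on angles, rescaling a word vector changes how much mass it carries but not how similar it is to other words.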
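A minimal pure-Python sketch of the norm/direction decoupling and the alignment step, assuming each sentence is given as a list of non-zero word vectors (lists of floats), mass = word-vector norm normalized within the sentence, and cost = 1 - cosine similarity. For brevity, the exact earth mover's distance is swapped for its entropic-regularized approximation computed with Sinkhorn iterations. The names `decouple`, `sinkhorn_cost`, and `wrd` and the parameters `reg` and `iters` are hypothetical, not the API of the authors' repository.

```python
import math
from typing import List

Vector = List[float]


def decouple(vectors: List[Vector]):
    """Split each (non-zero) word vector into its norm and its unit direction."""
    norms = [math.sqrt(sum(x * x for x in v)) for v in vectors]
    directions = [[x / n for x in v] for v, n in zip(vectors, norms)]
    return norms, directions


def sinkhorn_cost(a: List[float], b: List[float], cost: List[List[float]],
                  reg: float = 0.05, iters: int = 500) -> float:
    """Entropic-regularized OT cost: an approximation of the exact earth mover's distance."""
    kernel = [[math.exp(-c / reg) for c in row] for row in cost]
    u = [1.0] * len(a)
    v = [1.0] * len(b)
    for _ in range(iters):
        # Alternately rescale rows and columns so the plan's marginals match a and b.
        u = [a[i] / sum(kernel[i][j] * v[j] for j in range(len(b))) for i in range(len(a))]
        v = [b[j] / sum(kernel[i][j] * u[i] for i in range(len(a))) for j in range(len(b))]
    # Transport plan T_ij = u_i * K_ij * v_j; return sum_ij T_ij * C_ij.
    return sum(u[i] * kernel[i][j] * v[j] * cost[i][j]
               for i in range(len(a)) for j in range(len(b)))


def wrd(sentence_a: List[Vector], sentence_b: List[Vector], reg: float = 0.05) -> float:
    """Hypothetical helper: alignment cost with norm-based mass and cosine-distance cost."""
    norms_a, dirs_a = decouple(sentence_a)
    norms_b, dirs_b = decouple(sentence_b)
    # Mass: each word's norm, normalized to sum to 1 within its sentence.
    mass_a = [n / sum(norms_a) for n in norms_a]
    mass_b = [n / sum(norms_b) for n in norms_b]
    # Cost: 1 - cosine similarity; for unit vectors the dot product is the cosine.
    cost = [[1.0 - sum(x * y for x, y in zip(da, db)) for db in dirs_b] for da in dirs_a]
    return sinkhorn_cost(mass_a, mass_b, cost, reg)


# Usage with toy 2-d "word vectors" (made-up numbers, not real embeddings).
s1 = [[3.0, 0.0], [0.0, 1.0]]
s2 = [[2.0, 0.2], [0.1, 1.0]]
print(wrd(s1, s2))
```

A smaller `reg` brings the result closer to the exact transport cost, but with this plain (non-log-domain) kernel, `exp(-c / reg)` can underflow to zero and trigger a division by zero; an exact linear-programming or network-flow solver would avoid the approximation altogether.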
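The abstract does not spell out what the vector converter does. As one hedged illustration of how a sentence-vector estimation technique can act on norm and direction separately, the sketch below uses SIF-style weighting, which scales each word vector by a / (a + p(w)) with p(w) the word's unigram probability and so changes only the norm, and removal of a shared "common" direction, which changes the direction. Whether these match the converters used in the paper is an assumption; `sif_weight`, `remove_common_direction`, `word_probs`, `a`, and `common` are hypothetical names.

```python
import math
from typing import List

Vector = List[float]


def sif_weight(vectors: List[Vector], word_probs: List[float], a: float = 1e-3) -> List[Vector]:
    """Scale each vector by a / (a + p(w)); this rescales the norm and keeps the direction."""
    return [[a / (a + p) * x for x in v] for v, p in zip(vectors, word_probs)]


def remove_common_direction(vectors: List[Vector], common: Vector) -> List[Vector]:
    """Subtract each vector's projection onto `common` (normalized to unit length here)."""
    length = math.sqrt(sum(x * x for x in common))
    unit = [x / length for x in common]
    out = []
    for v in vectors:
        proj = sum(x * u for x, u in zip(v, unit))
        out.append([x - proj * u for x, u in zip(v, unit)])
    return out


# Usage: a frequent word (p = 0.01) is down-weighted far more than a rare one (p = 1e-5).
weighted = sif_weight([[3.0, 4.0], [3.0, 4.0]], [0.01, 1e-5])
print([round(math.hypot(*v), 3) for v in weighted])
```

Under the norm-as-importance reading, down-weighting frequent words shrinks their mass in an alignment-based distance without touching the angles that define word similarity.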