Re-evaluating Word Mover's Distance
- URL: http://arxiv.org/abs/2105.14403v1
- Date: Sun, 30 May 2021 01:35:03 GMT
- Title: Re-evaluating Word Mover's Distance
- Authors: Ryoma Sato, Makoto Yamada, Hisashi Kashima
- Abstract summary: The original study on word mover's distance (WMD) reported that WMD outperforms classical baselines.
We re-evaluate the performances of WMD and the classical baselines.
We find that WMD in high-dimensional spaces behaves more similarly to BOW than in low-dimensional spaces due to the curse of dimensionality.
- Score: 42.922307642413244
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: The word mover's distance (WMD) is a fundamental technique for measuring the
similarity of two documents. As the crux of WMD, it can take advantage of the
underlying geometry of the word space by employing an optimal transport
formulation. The original study on WMD reported that WMD outperforms classical
baselines such as bag-of-words (BOW) and TF-IDF by significant margins in
various datasets. In this paper, we point out that the evaluation in the
original study could be misleading. We re-evaluate the performances of WMD and
the classical baselines and find that the classical baselines are competitive
with WMD if we apply appropriate preprocessing, namely L1 normalization.
However, this result is not intuitive. WMD should be superior to BOW because
WMD can take the underlying geometry into account, whereas BOW cannot. Our
analysis shows that this is due to the high-dimensional nature of the
underlying metric. We find that WMD in high-dimensional spaces behaves more
similarly to BOW than in low-dimensional spaces due to the curse of
dimensionality.
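To make the comparison concrete, here is a minimal sketch (not the paper's code) that computes the exact WMD between two toy documents by solving the underlying optimal-transport linear program with SciPy, alongside the L1-normalized bag-of-words (nBOW) baseline distance the paper recommends. The three-word vocabulary and 2-D embeddings are invented for illustration.

```python
import numpy as np
from scipy.optimize import linprog

def wmd(a, b, X, Y):
    """Exact word mover's distance: optimal transport between the
    L1-normalized word frequencies a (over embeddings X) and b (over Y)."""
    M = np.linalg.norm(X[:, None, :] - Y[None, :, :], axis=2)  # pairwise costs
    n, m = len(a), len(b)
    # Flow variables T[i, j] >= 0, with row sums a and column sums b.
    A_eq = np.zeros((n + m, n * m))
    for i in range(n):
        A_eq[i, i * m:(i + 1) * m] = 1.0   # sum_j T[i, j] = a[i]
    for j in range(m):
        A_eq[n + j, j::m] = 1.0            # sum_i T[i, j] = b[j]
    res = linprog(M.ravel(), A_eq=A_eq, b_eq=np.concatenate([a, b]),
                  bounds=(0, None), method="highs")
    return res.fun

def l1_bow_distance(a, b):
    """Distance between L1-normalized bag-of-words vectors (same vocabulary)."""
    return np.abs(a - b).sum()

# Toy 2-D "embeddings" for a shared three-word vocabulary.
X = np.array([[0.0, 0.0], [1.0, 0.0], [0.0, 1.0]])
doc1 = np.array([0.5, 0.5, 0.0])  # L1-normalized word frequencies
doc2 = np.array([0.5, 0.0, 0.5])
print(wmd(doc1, doc2, X, X))        # 0.5 mass moves a distance of sqrt(2)
print(l1_bow_distance(doc1, doc2))
```

Here the optimal plan moves 0.5 unit of mass from the embedding (1, 0) to (0, 1), so the WMD is 0.5 * sqrt(2), roughly 0.707, while the nBOW distance is 1.0 regardless of the geometry; that geometric sensitivity is exactly the advantage the abstract attributes to WMD.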
Related papers
- Detecting Machine-Generated Texts by Multi-Population Aware Optimization for Maximum Mean Discrepancy [47.382793714455445]
Machine-generated texts (MGTs) may carry critical risks, such as plagiarism, misleading information, or hallucination issues.
It is challenging to distinguish MGTs from human-written texts because the distributional discrepancy between them is often very subtle.
We propose a novel multi-population aware optimization method for MMD, called MMD-MP.
arXiv Detail & Related papers (2024-02-25T09:44:56Z)
- Measuring the Robustness of NLP Models to Domain Shifts [50.89876374569385]
Existing research on Domain Robustness (DR) suffers from disparate setups, limited task variety, and scarce research on recent capabilities such as in-context learning.
Current research focuses on challenge sets and relies solely on the Source Drop (SD): Using the source in-domain performance as a reference point for degradation.
We argue that the Target Drop (TD), which measures degradation from the target in-domain performance, should be used as a complementary point of view.
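The two reference points can be formalized in one line each; the accuracy numbers below are hypothetical, and the drop definitions are paraphrased from the abstract rather than taken from the paper's code.

```python
def source_drop(source_in_domain: float, target_out_of_domain: float) -> float:
    """Degradation measured against the source in-domain performance."""
    return source_in_domain - target_out_of_domain

def target_drop(target_in_domain: float, target_out_of_domain: float) -> float:
    """Degradation measured against the target in-domain performance."""
    return target_in_domain - target_out_of_domain

# Hypothetical accuracies: 0.90 in-domain on the source, 0.80 in-domain
# on the target, and 0.75 when the model is transferred to the target.
print(round(source_drop(0.90, 0.75), 2))  # 0.15: looks like a large gap
print(round(target_drop(0.80, 0.75), 2))  # 0.05: much of the "drop" is task hardness
```

The gap between the two numbers illustrates the argument: part of the apparent degradation under SD is simply the target task being harder, which TD factors out.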
arXiv Detail & Related papers (2023-05-31T20:25:08Z)
- Improving word mover's distance by leveraging self-attention matrix [7.934452214142754]
The proposed method is based on the Fused Gromov-Wasserstein distance, which simultaneously considers the similarity of the word embedding and the SAM for calculating the optimal transport between two sentences.
Experiments demonstrate that the proposed method enhances WMD and its variants in paraphrase identification, with near-equivalent performance in semantic textual similarity.
arXiv Detail & Related papers (2022-11-11T14:25:08Z)
- Disentangled Modeling of Domain and Relevance for Adaptable Dense Retrieval [54.349418995689284]
We propose a novel Dense Retrieval (DR) framework named Disentangled Dense Retrieval (DDR) to support effective domain adaptation for DR models.
By disentangling the Relevance Estimation Module (REM) from the Domain Adaption Modules (DAMs), DDR enables a flexible training paradigm in which the REM is trained with supervision once and the DAMs are trained with unsupervised data.
DDR significantly improves ranking performance compared to strong DR baselines and substantially outperforms traditional retrieval methods in most scenarios.
arXiv Detail & Related papers (2022-08-11T11:18:50Z)
- Moving Other Way: Exploring Word Mover Distance Extensions [7.195824023358536]
The word mover's distance (WMD) is a popular semantic similarity metric for two texts.
This paper studies several possible extensions of WMD.
arXiv Detail & Related papers (2022-02-07T12:56:32Z)
- Contextualized Semantic Distance between Highly Overlapped Texts [85.1541170468617]
Overlapping frequently occurs in paired texts in natural language processing tasks like text editing and semantic similarity evaluation.
This paper aims to address the issue with a mask-and-predict strategy.
We take the words in the longest common sequence as neighboring words and use masked language modeling (MLM) to predict the distributions on their positions.
Experiments on Semantic Textual Similarity show the proposed neighboring distribution divergence (NDD) to be more sensitive to various semantic differences, especially on highly overlapped paired texts.
arXiv Detail & Related papers (2021-10-04T03:59:15Z)
- Non-Parametric Few-Shot Learning for Word Sense Disambiguation [11.175893018731712]
MetricWSD is a non-parametric few-shot learning approach that mitigates the data imbalance between frequent and rare word senses.
By learning to compute distances among the senses of a given word through episodic training, MetricWSD transfers knowledge from high-frequency words to infrequent ones.
arXiv Detail & Related papers (2021-04-26T16:08:46Z)
- MMD-Regularized Unbalanced Optimal Transport [0.0]
We study the unbalanced optimal transport (UOT) problem, where the marginal constraints are enforced using Maximum Mean Discrepancy (MMD) regularization.
Our work is motivated by the observation that the literature on UOT is focused on regularization based on $\phi$-divergence.
Despite the popularity of MMD, its role as a regularizer in the context of UOT seems less understood.
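Several entries in this list rely on the maximum mean discrepancy. As background, here is a minimal unbiased estimator of squared MMD with a Gaussian kernel; this is the standard U-statistic form, not code from any paper listed here, and the bandwidth and sample sizes are arbitrary choices for illustration.

```python
import numpy as np

def mmd2_unbiased(X, Y, bandwidth=1.0):
    """Unbiased estimate of squared MMD between samples X and Y under a
    Gaussian kernel k(x, y) = exp(-||x - y||^2 / (2 * bandwidth**2))."""
    def k(A, B):
        d2 = ((A[:, None, :] - B[None, :, :]) ** 2).sum(axis=2)
        return np.exp(-d2 / (2.0 * bandwidth ** 2))
    n, m = len(X), len(Y)
    Kxx, Kyy, Kxy = k(X, X), k(Y, Y), k(X, Y)
    # U-statistic: drop the diagonal self-similarity terms.
    term_xx = (Kxx.sum() - np.trace(Kxx)) / (n * (n - 1))
    term_yy = (Kyy.sum() - np.trace(Kyy)) / (m * (m - 1))
    return term_xx + term_yy - 2.0 * Kxy.mean()

rng = np.random.default_rng(0)
same = mmd2_unbiased(rng.normal(size=(200, 2)), rng.normal(size=(200, 2)))
shifted = mmd2_unbiased(rng.normal(size=(200, 2)),
                        rng.normal(loc=2.0, size=(200, 2)))
print(same, shifted)  # the shifted pair gives a much larger estimate
```

For two samples from the same distribution the estimate is close to zero (it can even be slightly negative, since it is unbiased), while a mean shift produces a clearly positive value; MMD-regularized UOT penalizes exactly this quantity between a transport plan's marginals and the prescribed measures.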
arXiv Detail & Related papers (2020-11-10T09:32:50Z)
- Rethink Maximum Mean Discrepancy for Domain Adaptation [77.2560592127872]
This paper theoretically proves two essential facts: 1) minimizing the Maximum Mean Discrepancy is equivalent to maximizing the source and target intra-class distances respectively while jointly minimizing their variance with some implicit weights, so that feature discriminability degrades.
Experiments on several benchmark datasets not only prove the validity of the theoretical results but also demonstrate that our approach substantially outperforms comparable state-of-the-art methods.
arXiv Detail & Related papers (2020-07-01T18:25:10Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of its content (including all information) and is not responsible for any consequences.