Improving word mover's distance by leveraging self-attention matrix
- URL: http://arxiv.org/abs/2211.06229v2
- Date: Thu, 2 Nov 2023 15:58:47 GMT
- Title: Improving word mover's distance by leveraging self-attention matrix
- Authors: Hiroaki Yamagiwa, Sho Yokoi, Hidetoshi Shimodaira
- Abstract summary: The proposed method is based on the Fused Gromov-Wasserstein distance, which simultaneously considers the similarity of the word embeddings and the SAM when calculating the optimal transport between two sentences.
Experiments demonstrate that the proposed method enhances WMD and its variants in paraphrase identification, with near-equivalent performance in semantic textual similarity.
- Score: 7.934452214142754
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Measuring the semantic similarity between two sentences is still an important task. The word mover's distance (WMD) computes the similarity via the optimal alignment between the sets of word embeddings. However, WMD does not utilize word order, making it challenging to distinguish sentences with significant overlaps of similar words, even if they are semantically very different. Here, we attempt to improve WMD by incorporating the sentence structure represented by BERT's self-attention matrix (SAM). The proposed method is based on the Fused Gromov-Wasserstein distance, which simultaneously considers the similarity of the word embeddings and the SAM when calculating the optimal transport between two sentences. Experiments demonstrate that the proposed method enhances WMD and its variants in paraphrase identification, with near-equivalent performance in semantic textual similarity. Our code is available at https://github.com/ymgw55/WSMD.
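The following is a minimal sketch of the idea described in the abstract, not the authors' WSMD implementation: the model checkpoint, the layer/head selection, the uniform word weights, the inclusion of special tokens, and the symmetrization of the attention matrix are all simplifying assumptions here. It uses HuggingFace Transformers for BERT and the POT library's Fused Gromov-Wasserstein solver.

```python
# Sketch only: WMD-style optimal transport fused with BERT's self-attention
# matrix (SAM) via the Fused Gromov-Wasserstein distance.
# Requires: pip install torch transformers pot
import numpy as np
import ot  # POT: Python Optimal Transport
import torch
from transformers import AutoModel, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
model = AutoModel.from_pretrained("bert-base-uncased", output_attentions=True)

def embed_with_sam(sentence, layer=-1, head=0):
    """Token embeddings and one self-attention matrix for a sentence."""
    inputs = tokenizer(sentence, return_tensors="pt")
    with torch.no_grad():
        out = model(**inputs)
    emb = out.last_hidden_state[0].numpy()        # (n_tokens, hidden_dim)
    sam = out.attentions[layer][0, head].numpy()  # (n_tokens, n_tokens)
    return emb, 0.5 * (sam + sam.T)               # symmetrized for the solver

def fgw_distance(s1, s2, alpha=0.5):
    emb1, sam1 = embed_with_sam(s1)
    emb2, sam2 = embed_with_sam(s2)
    M = ot.dist(emb1, emb2, metric="euclidean")   # word-embedding cost, as in WMD
    p = np.full(len(emb1), 1.0 / len(emb1))       # uniform word weights
    q = np.full(len(emb2), 1.0 / len(emb2))
    # alpha=0 recovers a plain WMD-like OT problem; alpha=1 uses only the
    # structural (SAM) cost, i.e. pure Gromov-Wasserstein.
    return ot.gromov.fused_gromov_wasserstein2(M, sam1, sam2, p, q, alpha=alpha)

# Word-overlapping sentences with opposite meanings, the case WMD struggles with:
print(fgw_distance("A dog bit a man.", "A man bit a dog."))
```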
Related papers
- Span-Aggregatable, Contextualized Word Embeddings for Effective Phrase Mining [0.22499166814992438]
We show that when target phrases reside inside noisy context, representing the full sentence with a single dense vector is not sufficient for effective phrase retrieval.
We show that aggregating contextualized word embeddings over spans is much more effective for phrase mining, yet requires considerable compute to obtain useful span representations.
arXiv Detail & Related papers (2024-05-12T12:08:05Z)
- Bridging Continuous and Discrete Spaces: Interpretable Sentence Representation Learning via Compositional Operations [80.45474362071236]
It is unclear whether the compositional semantics of sentences can be directly reflected as compositional operations in the embedding space.
We propose InterSent, an end-to-end framework for learning interpretable sentence embeddings.
arXiv Detail & Related papers (2023-05-24T00:44:49Z)
- Relational Sentence Embedding for Flexible Semantic Matching [86.21393054423355]
We present Relational Sentence Embedding (RSE), a new paradigm that further explores the potential of sentence embeddings.
RSE is effective and flexible in modeling sentence relations and outperforms a series of state-of-the-art embedding methods.
arXiv Detail & Related papers (2022-12-17T05:25:17Z)
- Retrofitting Multilingual Sentence Embeddings with Abstract Meaning Representation [70.58243648754507]
We introduce a new method to improve existing multilingual sentence embeddings with Abstract Meaning Representation (AMR).
Compared with the original textual input, AMR is a structured semantic representation that presents the core concepts and relations in a sentence explicitly and unambiguously.
Experimental results show that retrofitting multilingual sentence embeddings with AMR leads to new state-of-the-art performance on both semantic similarity and transfer tasks.
arXiv Detail & Related papers (2022-10-18T11:37:36Z)
- SynWMD: Syntax-aware Word Mover's Distance for Sentence Similarity Evaluation [36.5590780726458]
Word Mover's Distance (WMD) computes the distance between words and models text similarity with the moving cost between words in two text sequences.
However, it neither weights words by importance nor exploits the structural information in a sentence.
An improved WMD method using the syntactic parse tree, called Syntax-aware Word Mover's Distance (SynWMD), is proposed in this work to address these two shortcomings.
arXiv Detail & Related papers (2022-06-20T22:30:07Z)
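As a hedged illustration of the SynWMD entry above, here is one generic way parse structure can enter WMD. This is not SynWMD's actual word-flow algorithm; the subtree-size weighting, the spaCy model, and the punctuation filtering are assumptions made for the sketch.

```python
# Generic syntax-weighted WMD sketch (not SynWMD's actual algorithm):
# words heading larger dependency subtrees receive more transport mass.
# Requires: pip install spacy pot
#           python -m spacy download en_core_web_md   (ships word vectors)
import numpy as np
import ot
import spacy

nlp = spacy.load("en_core_web_md")

def syntax_weighted_wmd(s1, s2):
    t1 = [t for t in nlp(s1) if not t.is_punct]
    t2 = [t for t in nlp(s2) if not t.is_punct]
    w1 = np.array([len(list(t.subtree)) for t in t1], dtype=float)
    w2 = np.array([len(list(t.subtree)) for t in t2], dtype=float)
    w1, w2 = w1 / w1.sum(), w2 / w2.sum()    # subtree sizes -> word mass
    X1 = np.stack([t.vector for t in t1])
    X2 = np.stack([t.vector for t in t2])
    M = ot.dist(X1, X2, metric="euclidean")  # embedding cost, as in plain WMD
    return ot.emd2(w1, w2, M)                # optimal transport cost

print(syntax_weighted_wmd("The cat sat on the mat.", "A dog lay on the rug."))
```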
- Moving Other Way: Exploring Word Mover Distance Extensions [7.195824023358536]
The word mover's distance (WMD) is a popular semantic similarity metric for two texts.
This paper studies several possible extensions of WMD.
arXiv Detail & Related papers (2022-02-07T12:56:32Z)
- Contextualized Semantic Distance between Highly Overlapped Texts [85.1541170468617]
Overlapping frequently occurs in paired texts in natural language processing tasks like text editing and semantic similarity evaluation.
This paper aims to address the issue with a mask-and-predict strategy.
We take the words in the longest common sequence as neighboring words and use masked language modeling (MLM) to predict the distributions at their positions.
Experiments on Semantic Textual Similarity show the proposed neighboring distribution divergence (NDD) to be more sensitive to various semantic differences, especially on highly overlapped paired texts.
arXiv Detail & Related papers (2021-10-04T03:59:15Z)
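A rough sketch of the mask-and-predict step in the entry above: mask a shared word in each text, read BERT's predictive distribution at that position, and compare the two distributions. The KL divergence, the example sentences, and the single-wordpiece assumption are illustrative choices here; NDD's precise alignment and divergence are defined in the paper.

```python
# Mask-and-predict sketch for comparing highly overlapped texts.
# Requires: pip install torch transformers
import torch
import torch.nn.functional as F
from transformers import AutoModelForMaskedLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
mlm = AutoModelForMaskedLM.from_pretrained("bert-base-uncased")

def masked_distribution(text, target):
    """MLM distribution at the (masked) position of `target` in `text`."""
    tokens = tokenizer.tokenize(text)
    idx = tokens.index(target)          # assumes target is a single wordpiece
    tokens[idx] = tokenizer.mask_token
    ids = tokenizer.convert_tokens_to_ids(
        [tokenizer.cls_token] + tokens + [tokenizer.sep_token])
    with torch.no_grad():
        logits = mlm(torch.tensor([ids])).logits
    return F.softmax(logits[0, idx + 1], dim=-1)  # +1 offsets the [CLS] token

# Two overlapping texts sharing the word "jumps":
p = masked_distribution("the quick brown fox jumps over it", "jumps")
q = masked_distribution("a lazy old dog jumps over it", "jumps")
print(torch.sum(p * (p / q).log()).item())  # KL divergence of the two views
```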
- Learning to Remove: Towards Isotropic Pre-trained BERT Embedding [7.765987411382461]
Research in word representation shows that isotropic embeddings can significantly improve performance on downstream tasks.
We measure and analyze the geometry of the pre-trained BERT embedding space and find that it is far from isotropic.
We propose a simple yet effective method to fix this problem: remove several dominant directions of the BERT embedding with a set of learnable weights.
arXiv Detail & Related papers (2021-04-12T08:13:59Z)
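A minimal fixed (non-learned) variant of the direction-removal idea above, in the spirit of the common "all-but-the-top" post-processing; the paper itself learns per-direction weights, which this sketch does not do.

```python
# Subtract the mean and project out the top principal components of an
# embedding matrix to make it more isotropic.
# Requires: pip install numpy scikit-learn
import numpy as np
from sklearn.decomposition import PCA

def remove_dominant_directions(X, n_components=2):
    """X: (n_words, dim) embeddings; returns a more isotropic copy."""
    Xc = X - X.mean(axis=0, keepdims=True)
    U = PCA(n_components=n_components).fit(Xc).components_  # (k, dim) directions
    return Xc - (Xc @ U.T) @ U    # remove projections onto dominant directions

X = np.random.randn(1000, 768)    # stand-in for BERT token embeddings
X_iso = remove_dominant_directions(X)
```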
- SChME at SemEval-2020 Task 1: A Model Ensemble for Detecting Lexical Semantic Change [58.87961226278285]
This paper describes SChME, a method used in SemEval-2020 Task 1 on unsupervised detection of lexical semantic change.
SChME uses a model ensemble combining signals from distributional models (word embeddings) and word frequency models, where each model casts a vote indicating the probability that a word suffered semantic change according to that feature.
arXiv Detail & Related papers (2020-12-02T23:56:34Z)
- Word Rotator's Distance [50.67809662270474]
A key principle in assessing textual similarity is to measure the degree of semantic overlap between two texts by considering word alignment.
We show that the norm of word vectors is a good proxy for word importance, and their angle is a good proxy for word similarity.
We propose a method that first decouples word vectors into their norm and direction, and then computes alignment-based similarity.
arXiv Detail & Related papers (2020-04-30T17:48:42Z)
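A minimal sketch following the principle stated in the Word Rotator's Distance entry above: vector norms supply the transport mass (word importance) and cosine distance between directions supplies the cost. The random stand-in vectors are placeholders for real word embeddings.

```python
# Word Rotator's Distance sketch: decouple embeddings into norm and
# direction, then solve optimal transport.
# Requires: pip install numpy pot
import numpy as np
import ot

def word_rotators_distance(X1, X2):
    """X1: (n, d) and X2: (m, d) word-embedding matrices for two sentences."""
    n1 = np.linalg.norm(X1, axis=1)
    n2 = np.linalg.norm(X2, axis=1)
    a, b = n1 / n1.sum(), n2 / n2.sum()          # norms -> word importance (mass)
    D1, D2 = X1 / n1[:, None], X2 / n2[:, None]  # unit-norm directions
    M = 1.0 - D1 @ D2.T                          # cosine distance between directions
    return ot.emd2(a, b, M)

X1 = np.random.randn(5, 300)   # stand-in word vectors, sentence 1
X2 = np.random.randn(7, 300)   # stand-in word vectors, sentence 2
print(word_rotators_distance(X1, X2))
```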
- Text classification with word embedding regularization and soft similarity measure [0.20999222360659603]
Two word embedding regularization techniques were shown to reduce storage and memory costs, and to improve training speed, document processing speed, and task performance.
We show 39% average $k$NN test error reduction with regularized word embeddings compared to non-regularized word embeddings.
We also show that the soft cosine measure (SCM) with regularized word embeddings significantly outperforms the WMD on text classification and is over 10,000 times faster.
arXiv Detail & Related papers (2020-03-10T22:07:34Z)
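For reference, the soft cosine measure (SCM) compared against WMD in the entry above can be written directly from its definition, softcos(x, y) = x^T S y / sqrt(x^T S x * y^T S y), where S holds pairwise term similarities. The toy similarity matrix below is a hand-written assumption standing in for embedding-derived similarities.

```python
# Soft cosine measure sketch: a cosine similarity that credits related
# (not just identical) terms through a term-similarity matrix S.
import numpy as np

def soft_cosine(x, y, S):
    """x, y: bag-of-words vectors; S: (vocab, vocab) term-similarity matrix."""
    den = np.sqrt(x @ S @ x) * np.sqrt(y @ S @ y)
    return (x @ S @ y) / den

# Toy vocabulary ["cat", "feline", "dog"]: cat and feline are near-synonyms.
S = np.array([[1.0, 0.9, 0.1],
              [0.9, 1.0, 0.1],
              [0.1, 0.1, 1.0]])
x = np.array([1.0, 0.0, 0.0])  # "cat"
y = np.array([0.0, 1.0, 0.0])  # "feline"
print(soft_cosine(x, y, S))    # 0.9, where plain cosine would give 0.0
```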
This list is automatically generated from the titles and abstracts of the papers on this site.