SynWMD: Syntax-aware Word Mover's Distance for Sentence Similarity Evaluation
- URL: http://arxiv.org/abs/2206.10029v1
- Date: Mon, 20 Jun 2022 22:30:07 GMT
- Title: SynWMD: Syntax-aware Word Mover's Distance for Sentence Similarity Evaluation
- Authors: Chengwei Wei, Bin Wang, C.-C. Jay Kuo
- Abstract summary: Word Mover's Distance (WMD) computes the distance between words and models text similarity with the moving cost between words in two text sequences.
An improved WMD method using the syntactic parse tree, called Syntax-aware Word Mover's Distance (SynWMD), is proposed in this work to address two shortcomings of WMD: its neglect of word importance and of the contextual and structural information in a sentence.
- Score: 36.5590780726458
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Word Mover's Distance (WMD) computes the distance between words and models
text similarity with the moving cost between words in two text sequences. Yet,
it does not offer good performance in sentence similarity evaluation since it
does not incorporate word importance and fails to take inherent contextual and
structural information in a sentence into account. An improved WMD method using
the syntactic parse tree, called Syntax-aware Word Mover's Distance (SynWMD),
is proposed to address these two shortcomings in this work. First, a weighted
graph is built upon the word co-occurrence statistics extracted from the
syntactic parse trees of sentences. The importance of each word is inferred
from graph connectivity. Second, the local syntactic parsing structure of
words is considered when computing the distance between words. To demonstrate the
effectiveness of the proposed SynWMD, we conduct experiments on 6 semantic
textual similarity (STS) datasets and 4 sentence classification datasets.
Experimental results show that SynWMD achieves state-of-the-art performance on
STS tasks. It also outperforms other WMD-based methods on sentence
classification tasks.
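The two ideas in the abstract can be illustrated with a small sketch: word importance is derived from the connectivity of a co-occurrence graph (here, plain weighted degree stands in for the paper's connectivity-based scoring), and those weights feed a relaxed WMD lower bound rather than the full optimal transport. All vectors and counts below are toy values, not the paper's actual pipeline.

```python
import numpy as np

def word_weights_from_cooccurrence(A):
    """Infer word importance from a co-occurrence graph's connectivity.

    A is a symmetric matrix of co-occurrence counts gathered from
    syntactic parse trees; normalized weighted degree is used here as a
    stand-in for the paper's connectivity-based importance measure.
    """
    deg = A.sum(axis=1)
    return deg / deg.sum()

def relaxed_wmd(X, Y, w_x, w_y):
    """Relaxed WMD: each word moves entirely to its nearest counterpart.

    This is the classic lower bound on the exact transport cost,
    here weighted by the inferred word importance.
    """
    D = np.linalg.norm(X[:, None, :] - Y[None, :, :], axis=2)
    # Move mass from X to Y and from Y to X; take the larger bound.
    cost_xy = (w_x * D.min(axis=1)).sum()
    cost_yx = (w_y * D.min(axis=0)).sum()
    return max(cost_xy, cost_yx)

rng = np.random.default_rng(0)
X = rng.normal(size=(4, 8))   # toy embeddings for the 4 words of sentence 1
Y = rng.normal(size=(3, 8))   # toy embeddings for the 3 words of sentence 2
A_x = np.array([[0, 2, 1, 0], [2, 0, 3, 1],
                [1, 3, 0, 1], [0, 1, 1, 0]], dtype=float)
A_y = np.array([[0, 1, 2], [1, 0, 1], [2, 1, 0]], dtype=float)
w_x = word_weights_from_cooccurrence(A_x)
w_y = word_weights_from_cooccurrence(A_y)
dist = relaxed_wmd(X, Y, w_x, w_y)
```

The relaxed bound avoids solving a linear program, which is why it is a common fast surrogate in WMD-style methods.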
Related papers
- Evaluating Semantic Variation in Text-to-Image Synthesis: A Causal Perspective [50.261681681643076]
We propose a novel metric called SemVarEffect and a benchmark named SemVarBench to evaluate the causality between semantic variations in inputs and outputs in text-to-image synthesis.
Our work establishes an effective evaluation framework that advances the T2I synthesis community's exploration of human instruction understanding.
arXiv Detail & Related papers (2024-10-14T08:45:35Z)
- Syntax and Semantics Meet in the "Middle": Probing the Syntax-Semantics Interface of LMs Through Agentivity [68.8204255655161]
We present the semantic notion of agentivity as a case study for probing such interactions.
This suggests LMs may potentially serve as more useful tools for linguistic annotation, theory testing, and discovery.
arXiv Detail & Related papers (2023-05-29T16:24:01Z)
- Relational Sentence Embedding for Flexible Semantic Matching [86.21393054423355]
We present Relational Sentence Embedding (RSE), a new paradigm for further discovering the potential of sentence embeddings.
RSE is effective and flexible in modeling sentence relations and outperforms a series of state-of-the-art embedding methods.
arXiv Detail & Related papers (2022-12-17T05:25:17Z)
- Improving word mover's distance by leveraging self-attention matrix [7.934452214142754]
The proposed method is based on the Fused Gromov-Wasserstein distance, which simultaneously considers the similarity of the word embedding and the SAM for calculating the optimal transport between two sentences.
Experiments demonstrate that the proposed method enhances WMD and its variants on paraphrase identification, with near-equivalent performance on semantic textual similarity.
arXiv Detail & Related papers (2022-11-11T14:25:08Z)
- Moving Other Way: Exploring Word Mover Distance Extensions [7.195824023358536]
The word mover's distance (WMD) is a popular semantic similarity metric for two texts.
This paper studies several possible extensions of WMD.
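For reference, the baseline WMD that such extensions build on can be written as a small linear program over a transport plan. The sketch below solves it with scipy's LP solver on toy embeddings and uniform word weights; this is only practical for very short texts, which is why real implementations use dedicated optimal-transport solvers.

```python
import numpy as np
from scipy.optimize import linprog

def wmd(X, Y, a, b):
    """Exact Word Mover's Distance for tiny inputs via a linear program.

    X, Y: word embedding matrices; a, b: word weight histograms
    (e.g. normalized term frequencies). Variables T[i, j] give how much
    of word i's mass moves to word j; costs are Euclidean distances.
    """
    n, m = len(a), len(b)
    C = np.linalg.norm(X[:, None, :] - Y[None, :, :], axis=2).ravel()
    # Row-sum constraints: sum_j T[i, j] = a[i]
    A_rows = np.kron(np.eye(n), np.ones((1, m)))
    # Column-sum constraints: sum_i T[i, j] = b[j]
    A_cols = np.kron(np.ones((1, n)), np.eye(m))
    res = linprog(C,
                  A_eq=np.vstack([A_rows, A_cols]),
                  b_eq=np.concatenate([a, b]),
                  bounds=(0, None), method="highs")
    return res.fun

rng = np.random.default_rng(1)
X = rng.normal(size=(3, 5))
Y = rng.normal(size=(4, 5))
a = np.full(3, 1 / 3)         # uniform weights over sentence-1 words
b = np.full(4, 1 / 4)
d = wmd(X, Y, a, b)
```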
arXiv Detail & Related papers (2022-02-07T12:56:32Z)
- WMDecompose: A Framework for Leveraging the Interpretable Properties of Word Mover's Distance in Sociocultural Analysis [0.0]
One popular model that balances legibility and interpretability is Word Mover's Distance (WMD).
We introduce WMDecompose: a model and Python library that decomposes document-level distances into their constituent word-level distances, and subsequently clusters words to induce thematic elements.
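A minimal sketch of the decomposition idea: each source word's share of the document distance is the mass it sends times the distance it travels. For simplicity, the product coupling outer(a, b) stands in here for the optimal transport plan (WMDecompose itself decomposes the true plan), so the total below is an upper bound on WMD rather than WMD itself.

```python
import numpy as np

def decompose_word_costs(X, Y, a, b, words_x):
    """Decompose a document-level distance into word-level contributions.

    In the spirit of WMDecompose, the contribution of each source word is
    the transported mass times the distance travelled. The independence
    coupling T = outer(a, b) is a simple stand-in for the optimal plan.
    """
    C = np.linalg.norm(X[:, None, :] - Y[None, :, :], axis=2)
    T = np.outer(a, b)               # product coupling: mass a_i * b_j
    contrib = (T * C).sum(axis=1)    # cost attributable to each source word
    return dict(zip(words_x, contrib)), contrib.sum()

rng = np.random.default_rng(2)
X = rng.normal(size=(3, 6))
Y = rng.normal(size=(2, 6))
a = np.array([0.5, 0.3, 0.2])       # source word weights
b = np.array([0.6, 0.4])            # target word weights
per_word, total = decompose_word_costs(X, Y, a, b, ["cat", "sat", "mat"])
```

By construction the per-word contributions sum exactly to the document-level total, which is what makes this kind of decomposition interpretable.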
arXiv Detail & Related papers (2021-10-14T13:04:38Z)
- Contextualized Semantic Distance between Highly Overlapped Texts [85.1541170468617]
Overlapping frequently occurs in paired texts in natural language processing tasks like text editing and semantic similarity evaluation.
This paper aims to address the issue with a mask-and-predict strategy.
We take the words in the longest common sequence as neighboring words and use masked language modeling (MLM) to predict the distributions on their positions.
Experiments on Semantic Textual Similarity show the proposed NDD metric to be more sensitive to various semantic differences, especially on highly overlapped paired texts.
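A hedged sketch of the scoring step, assuming per-position token distributions have already been obtained from some masked language model: the distributions predicted at the shared positions are compared with a summed KL divergence. The arrays below are random toy distributions, and the exact NDD formulation in the paper may weight or symmetrize differently.

```python
import numpy as np

def neighboring_distribution_divergence(P, Q, eps=1e-12):
    """Compare two texts via distributions an MLM predicts at shared positions.

    P, Q: arrays of shape (num_positions, vocab_size) holding predicted
    token distributions at the positions of the words both texts share
    (the common-sequence "neighboring words"). The score sums the
    per-position KL divergences.
    """
    P = P / P.sum(axis=1, keepdims=True)   # renormalize defensively
    Q = Q / Q.sum(axis=1, keepdims=True)
    kl = (P * np.log((P + eps) / (Q + eps))).sum(axis=1)
    return kl.sum()

rng = np.random.default_rng(3)
P = rng.random((4, 10))  # toy MLM outputs: 4 shared positions, vocab of 10
Q = rng.random((4, 10))
score = neighboring_distribution_divergence(P, Q)
```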
arXiv Detail & Related papers (2021-10-04T03:59:15Z)
- Syntactic representation learning for neural network based TTS with syntactic parse tree traversal [49.05471750563229]
We propose a syntactic representation learning method based on syntactic parse tree to automatically utilize the syntactic structure information.
Experimental results demonstrate the effectiveness of our proposed approach.
For sentences with multiple syntactic parse trees, prosodic differences can be clearly perceived from the synthesized speeches.
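One simple way to realize such a traversal is a depth-first walk that linearizes a constituency tree into a token sequence with open/close markers that a sequence model can consume. This toy sketch uses nested tuples as the tree representation, which is an assumption for illustration, not the paper's actual encoding.

```python
# A constituency parse tree as nested tuples: (label, child, ...) with
# leaf words as plain strings. A depth-first traversal linearizes the
# syntactic structure; paired open/close markers keep it recoverable.
def traverse(node):
    if isinstance(node, str):        # leaf: a word
        return [node]
    label, *children = node
    seq = [f"({label}"]              # open marker for this constituent
    for child in children:
        seq.extend(traverse(child))
    seq.append(f"){label}")          # matching close marker
    return seq

tree = ("S",
        ("NP", "time"),
        ("VP", "flies", ("PP", "like", ("NP", "an", "arrow"))))
tokens = traverse(tree)
```

Two different parse trees for the same words yield different token sequences, which is what lets a TTS model condition prosody on the chosen parse.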
arXiv Detail & Related papers (2020-12-13T05:52:07Z)
- Incorporate Semantic Structures into Machine Translation Evaluation via UCCA [9.064153799336536]
We define words carrying important semantic meanings in sentences as semantic core words.
We propose an MT evaluation approach named Semantically Weighted Sentence Similarity (SWSS).
arXiv Detail & Related papers (2020-10-17T06:47:58Z)
- Text classification with word embedding regularization and soft similarity measure [0.20999222360659603]
Two word embedding regularization techniques were shown to reduce storage and memory costs, and to improve training speed, document processing speed, and task performance.
We show 39% average $k$NN test error reduction with regularized word embeddings compared to non-regularized word embeddings.
We also show that the SCM with regularized word embeddings significantly outperforms the WMD on text classification and is over 10,000 times faster.
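The soft cosine measure (SCM) referenced above generalizes cosine similarity with a word-to-word similarity matrix built from (possibly regularized) embeddings, so documents with no overlapping words can still score as similar. A toy numpy sketch with a hand-made similarity matrix:

```python
import numpy as np

def soft_cosine(x, y, S):
    """Soft Cosine Measure: cosine similarity under a word-similarity matrix.

    x, y: bag-of-words count vectors over a shared vocabulary.
    S: word-to-word similarity matrix, e.g. cosine similarities between
    word embeddings, with S[i, i] = 1.
    """
    num = x @ S @ y
    den = np.sqrt((x @ S @ x) * (y @ S @ y))
    return num / den

# Toy vocabulary ["play", "game", "weather"]; "play" and "game" are related.
S = np.array([[1.0, 0.7, 0.0],
              [0.7, 1.0, 0.0],
              [0.0, 0.0, 1.0]])
x = np.array([1.0, 0.0, 0.0])   # document containing only "play"
y = np.array([0.0, 1.0, 0.0])   # document containing only "game"
sim = soft_cosine(x, y, S)      # non-zero despite no shared words
```

Unlike WMD, the SCM needs only matrix products, which is the source of the large speedup the entry above reports.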
arXiv Detail & Related papers (2020-03-10T22:07:34Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences arising from its use.