Style-transfer and Paraphrase: Looking for a Sensible Semantic
Similarity Metric
- URL: http://arxiv.org/abs/2004.05001v3
- Date: Thu, 3 Dec 2020 21:58:57 GMT
- Title: Style-transfer and Paraphrase: Looking for a Sensible Semantic
Similarity Metric
- Authors: Ivan P. Yamshchikov, Viacheslav Shibaev, Nikolay Khlebnikov, Alexey
Tikhonov
- Abstract summary: We show that none of the metrics widely used in the literature is close enough to human judgment in these tasks.
A number of recently proposed metrics provide comparable results, yet Word Mover Distance is shown to be the most reasonable solution.
- Score: 18.313879914379005
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: The rapid development of natural language processing tasks such as
style transfer, paraphrase, and machine translation often calls for the use of
semantic similarity metrics. In recent years, many methods for measuring the
semantic similarity of two short texts have been developed. This paper provides
a comprehensive analysis of more than a dozen such methods. Using a new dataset
of fourteen thousand sentence pairs, human-labeled according to their semantic
similarity, we demonstrate that none of the metrics widely used in the
literature is close enough to human judgment on these tasks. A number of
recently proposed metrics provide comparable results, yet Word Mover Distance
is shown to be the most reasonable solution for measuring semantic similarity
in reformulated texts at the moment.
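Since the abstract singles out Word Mover Distance (WMD), a minimal sketch of how it can be computed may help. The implementation below is not from the paper; it is an illustrative version that casts WMD as the optimal-transport linear program over pairwise word-embedding distances, solved with SciPy's `linprog`, using uniform bag-of-words weights and toy vectors.

```python
import numpy as np
from scipy.optimize import linprog

def wmd(emb_a, emb_b):
    """Word Mover's Distance between two documents.

    emb_a: (n, d) array of word embeddings for document A.
    emb_b: (m, d) array of word embeddings for document B.
    Uses uniform normalized bag-of-words weights (1/n and 1/m).
    """
    n, m = len(emb_a), len(emb_b)
    # Pairwise Euclidean distances between every word in A and every word in B.
    cost = np.linalg.norm(emb_a[:, None, :] - emb_b[None, :, :], axis=2)
    c = cost.ravel()  # flatten the (n, m) transport-cost matrix

    # Equality constraints: each source word ships out exactly 1/n of mass,
    # each target word receives exactly 1/m of mass.
    A_eq = []
    for i in range(n):
        row = np.zeros((n, m)); row[i, :] = 1.0
        A_eq.append(row.ravel())
    for j in range(m):
        col = np.zeros((n, m)); col[:, j] = 1.0
        A_eq.append(col.ravel())
    b_eq = np.concatenate([np.full(n, 1.0 / n), np.full(m, 1.0 / m)])

    res = linprog(c, A_eq=np.array(A_eq), b_eq=b_eq, bounds=(0, None))
    return res.fun  # minimal total transport cost = WMD
```

In practice one would plug in pretrained embeddings (e.g. via gensim's `KeyedVectors.wmdistance`); the sketch above only shows the underlying optimization.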
Related papers
- Tomato, Tomahto, Tomate: Measuring the Role of Shared Semantics among Subwords in Multilingual Language Models [88.07940818022468]
We take an initial step toward measuring the role of shared semantics among subwords in encoder-only multilingual language models (mLMs).
We form "semantic tokens" by merging semantically similar subwords and their embeddings.
Inspections of the grouped subwords show that they exhibit a wide range of semantic similarities.
arXiv Detail & Related papers (2024-11-07T08:38:32Z)
- Language Model Decoding as Direct Metrics Optimization [87.68281625776282]
Current decoding methods struggle to generate texts that align with human texts across different aspects.
In this work, we frame decoding from a language model as an optimization problem with the goal of strictly matching the expected performance with human texts.
We prove that this induced distribution is guaranteed to improve the perplexity on human texts, which suggests a better approximation to the underlying distribution of human texts.
arXiv Detail & Related papers (2023-10-02T09:35:27Z)
- A Comparative Study of Sentence Embedding Models for Assessing Semantic
Variation [0.0]
We compare several recent sentence embedding methods via time-series of semantic similarity between successive sentences and matrices of pairwise sentence similarity for multiple books of literature.
We find that most of the sentence embedding methods considered do infer highly correlated patterns of semantic similarity in a given document, but show interesting differences.
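The analysis described above — a time series of similarity between successive sentences and a matrix of pairwise sentence similarities — can be sketched as follows. The function names and toy embeddings are illustrative, not from the paper; any sentence encoder could supply the vectors.

```python
import numpy as np

def cosine(u, v):
    """Cosine similarity between two embedding vectors."""
    return float(u @ v / (np.linalg.norm(u) * np.linalg.norm(v)))

def successive_similarity(embs):
    """Time series: similarity between each sentence and the next one."""
    return [cosine(embs[i], embs[i + 1]) for i in range(len(embs) - 1)]

def pairwise_similarity(embs):
    """Full matrix of pairwise sentence similarities."""
    n = len(embs)
    return np.array([[cosine(embs[i], embs[j]) for j in range(n)]
                     for i in range(n)])
```

Comparing these curves and matrices across embedding models is one way to surface the "highly correlated patterns" and differences the entry mentions.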
arXiv Detail & Related papers (2023-08-08T23:31:10Z)
- Unsupervised Semantic Variation Prediction using the Distribution of
Sibling Embeddings [17.803726860514193]
Detection of semantic variation of words is an important task for various NLP applications.
We argue that mean representations alone cannot accurately capture such semantic variations.
We propose a method that uses the entire cohort of the contextualised embeddings of the target word.
arXiv Detail & Related papers (2023-05-15T13:58:21Z)
- Beyond Contrastive Learning: A Variational Generative Model for
Multilingual Retrieval [109.62363167257664]
We propose a generative model for learning multilingual text embeddings.
Our model operates on parallel data in $N$ languages.
We evaluate this method on a suite of tasks including semantic similarity, bitext mining, and cross-lingual question retrieval.
arXiv Detail & Related papers (2022-12-21T02:41:40Z)
- Retrofitting Multilingual Sentence Embeddings with Abstract Meaning
Representation [70.58243648754507]
We introduce a new method to improve existing multilingual sentence embeddings with Abstract Meaning Representation (AMR).
Compared with the original textual input, AMR is a structured semantic representation that presents the core concepts and relations in a sentence explicitly and unambiguously.
Experiment results show that retrofitting multilingual sentence embeddings with AMR leads to better state-of-the-art performance on both semantic similarity and transfer tasks.
arXiv Detail & Related papers (2022-10-18T11:37:36Z)
- Measuring Fine-Grained Semantic Equivalence with Abstract Meaning
Representation [9.666975331506812]
Identifying semantically equivalent sentences is important for many NLP tasks.
Current approaches to semantic equivalence take a loose, sentence-level approach to "equivalence".
We introduce a novel, more sensitive method of characterizing semantic equivalence that leverages Abstract Meaning Representation graph structures.
arXiv Detail & Related papers (2022-10-06T16:08:27Z)
- Contextualized Semantic Distance between Highly Overlapped Texts [85.1541170468617]
Overlapping frequently occurs in paired texts in natural language processing tasks like text editing and semantic similarity evaluation.
This paper aims to address the issue with a mask-and-predict strategy.
We take the words in the longest common sequence as neighboring words and use masked language modeling (MLM) to predict the distributions on their positions.
Experiments on Semantic Textual Similarity show the proposed neighboring distribution divergence (NDD) to be more sensitive to various semantic differences, especially on highly overlapped paired texts.
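The strategy summarized above starts from the words the two texts share in order. As an illustrative building block (not code from the paper), the longest common word sequence can be found with standard dynamic programming:

```python
def lcs_words(a, b):
    """Word-level longest common subsequence of two token lists.

    Standard O(n*m) dynamic programming with backtracking to
    recover the shared words themselves, in order.
    """
    n, m = len(a), len(b)
    dp = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(n):
        for j in range(m):
            if a[i] == b[j]:
                dp[i + 1][j + 1] = dp[i][j] + 1
            else:
                dp[i + 1][j + 1] = max(dp[i][j + 1], dp[i + 1][j])
    # Backtrack from the bottom-right corner to recover the subsequence.
    out, i, j = [], n, m
    while i > 0 and j > 0:
        if a[i - 1] == b[j - 1]:
            out.append(a[i - 1]); i -= 1; j -= 1
        elif dp[i - 1][j] >= dp[i][j - 1]:
            i -= 1
        else:
            j -= 1
    return out[::-1]
```

In the paper's setup, the positions of these shared words would then be masked and scored with a masked language model; that step needs a pretrained model and is omitted here.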
arXiv Detail & Related papers (2021-10-04T03:59:15Z)
- EDS-MEMBED: Multi-sense embeddings based on enhanced distributional
semantic structures via a graph walk over word senses [0.0]
We leverage the rich semantic structures in WordNet to enhance the quality of multi-sense embeddings.
We derive new distributional semantic similarity measures for multi-sense embeddings (M-SE) from prior ones.
We report evaluation results on 11 benchmark datasets involving WSD and Word Similarity tasks.
arXiv Detail & Related papers (2021-02-27T14:36:55Z)
- Fake it Till You Make it: Self-Supervised Semantic Shifts for
Monolingual Word Embedding Tasks [58.87961226278285]
We propose a self-supervised approach to model lexical semantic change.
We show that our method can be used for the detection of semantic change with any alignment method.
We illustrate the utility of our techniques using experimental results on three different datasets.
arXiv Detail & Related papers (2021-01-30T18:59:43Z)
- MuSeM: Detecting Incongruent News Headlines using Mutual Attentive
Semantic Matching [7.608480381965392]
Measuring the congruence between two texts has several useful applications, such as detecting deceptive and misleading news headlines on the web.
This paper proposes a method that uses inter-mutual attention-based semantic matching between the original and synthetically generated headlines.
We observe that the proposed method significantly outperforms prior art on two publicly available datasets.
arXiv Detail & Related papers (2020-10-07T19:19:42Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed content (including all information) and is not responsible for any consequences of its use.