How Small Transformations Expose the Weakness of Semantic Similarity Measures
- URL: http://arxiv.org/abs/2509.09714v1
- Date: Mon, 08 Sep 2025 11:00:18 GMT
- Title: How Small Transformations Expose the Weakness of Semantic Similarity Measures
- Authors: Serge Lionel Nikiema, Albérick Euraste Djire, Abdoul Aziz Bonkoungou, Micheline Bénédicte Moumoula, Jordan Samhi, Abdoul Kader Kabore, Jacques Klein, Tegawendé F. Bissyande
- Abstract summary: The study tested 18 different similarity measurement approaches. Some embedding-based methods incorrectly identified semantic opposites as similar up to 99.9 percent of the time, and certain transformer-based approaches occasionally rated opposite meanings as more similar than synonymous ones.
- Score: 9.554744625391512
- License: http://creativecommons.org/licenses/by-nc-sa/4.0/
- Abstract: This research examines how well different methods measure semantic similarity, which is important for various software engineering applications such as code search, API recommendations, automated code reviews, and refactoring tools. While large language models are increasingly used for these similarity assessments, questions remain about whether they truly understand semantic relationships or merely recognize surface patterns. The study tested 18 different similarity measurement approaches, including word-based methods, embedding techniques, LLM-based systems, and structure-aware algorithms. The researchers created a systematic testing framework that applies controlled changes to text and code to evaluate how well each method handles different types of semantic relationships. The results revealed significant issues with commonly used metrics. Some embedding-based methods incorrectly identified semantic opposites as similar up to 99.9 percent of the time, while certain transformer-based approaches occasionally rated opposite meanings as more similar than synonymous ones. The study found that embedding methods' poor performance often stemmed from how they calculate distances; switching from Euclidean distance to cosine similarity improved results by 24 to 66 percent. LLM-based approaches performed better at distinguishing semantic differences, producing low similarity scores (0.00 to 0.29) for genuinely different meanings, compared to embedding methods that incorrectly assigned high scores (0.82 to 0.99) to dissimilar content.
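The abstract's finding that switching from Euclidean distance to cosine similarity improved results by 24 to 66 percent can be illustrated with a minimal sketch (the 3-dimensional vectors below are hypothetical embeddings, not the paper's data): Euclidean distance is dominated by vector magnitude, while cosine similarity compares only direction, so two vectors pointing the same way but with different norms look far apart to Euclidean distance yet maximally similar under cosine.

```python
import math

def euclidean(u, v):
    # Straight-line distance between two vectors: sensitive to magnitude.
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))

def cosine_similarity(u, v):
    # Cosine of the angle between two vectors: sensitive only to direction.
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

# Hypothetical embeddings pointing in the same direction but with
# different magnitudes (v = 10 * u).
u = [1.0, 2.0, 3.0]
v = [10.0, 20.0, 30.0]

print(euclidean(u, v))          # large distance (~33.67) despite identical direction
print(cosine_similarity(u, v))  # 1.0: maximally similar
```

This is only an intuition pump for why the choice of distance function matters for embedding-based methods; the paper's actual evaluation uses its own controlled-transformation framework over 18 measurement approaches.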
Related papers
- A Unified Evaluation of Learning-Based Similarity Techniques for Malware Detection [0.0]
Similarity-based techniques enable approximate matching, allowing related byte sequences to produce measurably similar fingerprints. Security researchers have proposed a range of approaches, including similarity digests and locality-sensitive hashes. This paper presents a systematic comparison of learning-based classification and similarity methods using large, publicly available datasets.
arXiv Detail & Related papers (2026-02-17T06:16:23Z) - Semantic-KG: Using Knowledge Graphs to Construct Benchmarks for Measuring Semantic Similarity [42.873412319680035]
This paper introduces a novel method for generating benchmarks to evaluate semantic similarity methods for Large Language Model outputs. We generate benchmark datasets in four different domains (general knowledge, biomedicine, finance, biology). We observe that the sub-type of semantic variation, as well as the domain of the benchmark, impact the performance of semantic similarity methods.
arXiv Detail & Related papers (2025-11-25T05:07:08Z) - Automatic Design of Semantic Similarity Ensembles Using Grammatical Evolution [0.0]
This paper presents an automated strategy based on grammatical evolution for constructing semantic similarity ensembles. Experiments on standard benchmark datasets demonstrate that the proposed approach outperforms existing ensemble techniques in terms of accuracy.
arXiv Detail & Related papers (2023-07-03T10:53:05Z) - Apport des ontologies pour le calcul de la similarité sémantique au sein d'un système de recommandation (Contribution of Ontologies to Semantic Similarity Computation within a Recommender System) [0.0]
Measurement of the semantic relatedness or likeness between terms, words, or text data plays an important role in different applications.
We propose and implement an approach for computing semantic similarity in the context of a recommender system.
arXiv Detail & Related papers (2022-05-25T07:27:10Z) - Towards Interpretable Deep Metric Learning with Structural Matching [86.16700459215383]
We present a deep interpretable metric learning (DIML) method for more transparent embedding learning.
Our method is model-agnostic, which can be applied to off-the-shelf backbone networks and metric learning methods.
We evaluate our method on three major benchmarks of deep metric learning including CUB200-2011, Cars196, and Stanford Online Products.
arXiv Detail & Related papers (2021-08-12T17:59:09Z) - A novel hybrid methodology of measuring sentence similarity [0.0]
It is necessary to measure the similarity between sentences accurately.
Deep learning methodology shows a state-of-the-art performance in many natural language processing fields.
Considering the structure of the sentence or the word structure that makes up the sentence is also important.
arXiv Detail & Related papers (2021-05-03T06:50:54Z) - A Statistical Analysis of Summarization Evaluation Metrics using
Resampling Methods [60.04142561088524]
We find that the confidence intervals are rather wide, demonstrating high uncertainty in how reliable automatic metrics truly are.
Although many metrics fail to show statistical improvements over ROUGE, two recent works, QAEval and BERTScore, do in some evaluation settings.
arXiv Detail & Related papers (2021-03-31T18:28:14Z) - Fake it Till You Make it: Self-Supervised Semantic Shifts for
Monolingual Word Embedding Tasks [58.87961226278285]
We propose a self-supervised approach to model lexical semantic change.
We show that our method can be used for the detection of semantic change with any alignment method.
We illustrate the utility of our techniques using experimental results on three different datasets.
arXiv Detail & Related papers (2021-01-30T18:59:43Z) - Towards Improved and Interpretable Deep Metric Learning via Attentive
Grouping [103.71992720794421]
Grouping has been commonly used in deep metric learning for computing diverse features.
We propose an improved and interpretable grouping method to be integrated flexibly with any metric learning framework.
arXiv Detail & Related papers (2020-11-17T19:08:24Z) - Interpretable Multi-dataset Evaluation for Named Entity Recognition [110.64368106131062]
We present a general methodology for interpretable evaluation for the named entity recognition (NER) task.
The proposed evaluation method enables us to interpret the differences in models and datasets, as well as the interplay between them.
By making our analysis tool available, we make it easy for future researchers to run similar analyses and drive progress in this area.
arXiv Detail & Related papers (2020-11-13T10:53:27Z) - Provably Robust Metric Learning [98.50580215125142]
We show that existing metric learning algorithms can result in metrics that are less robust than the Euclidean distance.
We propose a novel metric learning algorithm to find a Mahalanobis distance that is robust against adversarial perturbations.
Experimental results show that the proposed metric learning algorithm improves both certified robust errors and empirical robust errors.
arXiv Detail & Related papers (2020-06-12T09:17:08Z) - Evolution of Semantic Similarity -- A Survey [8.873705500708196]
Estimating semantic similarity between text data is a challenging and open research problem in the field of Natural Language Processing (NLP).
Various semantic similarity methods have been proposed over the years to address this issue.
This survey article traces the evolution of such methods, categorizing them based on their underlying principles as knowledge-based, corpus-based, deep neural network-based methods, and hybrid methods.
arXiv Detail & Related papers (2020-04-19T22:07:39Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information presented and is not responsible for any consequences arising from its use.