Comparing in context: Improving cosine similarity measures with a metric tensor
- URL: http://arxiv.org/abs/2203.14996v1
- Date: Mon, 28 Mar 2022 18:04:26 GMT
- Title: Comparing in context: Improving cosine similarity measures with a metric tensor
- Authors: Isa M. Apallius de Vos, Ghislaine L. van den Boogerd, Mara D. Fennema,
Adriana D. Correia
- Abstract summary: Cosine similarity is a widely used measure of the relatedness of pre-trained word embeddings, trained on a language modeling goal.
We propose instead the use of an extended cosine similarity measure to improve performance on that task, with gains in interpretability.
We learn contextualized metrics and compare the results with the baseline values obtained using the standard cosine similarity measure; the comparison consistently shows improvement.
We also train a contextualized similarity measure for both SimLex-999 and WordSim-353, comparing the results with the corresponding baselines, and using these datasets as independent test sets for the all-context similarity measure learned on the contextualized dataset.
- Score: 0.0
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Cosine similarity is a widely used measure of the relatedness of pre-trained
word embeddings, trained on a language modeling goal. Datasets such as
WordSim-353 and SimLex-999 rate how similar words are according to human
annotators, and as such are often used to evaluate the performance of language
models. Thus, any improvement on the word similarity task requires an improved
word representation. In this paper, we propose instead the use of an extended
cosine similarity measure to improve performance on that task, with gains in
interpretability. We explore the hypothesis that this approach is particularly
useful if the word-similarity pairs share the same context, for which distinct
contextualized similarity measures can be learned. We first use the dataset of
Richie et al. (2020) to learn contextualized metrics and compare the results
with the baseline values obtained using the standard cosine similarity measure,
which consistently shows improvement. We also train a contextualized similarity
measure for both SimLex-999 and WordSim-353, comparing the results with the
corresponding baselines, and using these datasets as independent test sets for
the all-context similarity measure learned on the contextualized dataset,
obtaining positive results for a number of tests.
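The extended measure replaces the identity metric implicit in standard cosine similarity with a learned metric tensor M. A minimal numpy sketch of that bilinear form, assuming the standard generalization with a symmetric positive-definite M (the paper's training procedure for M is not reproduced here):

```python
import numpy as np

def metric_cosine(u: np.ndarray, v: np.ndarray, M: np.ndarray) -> float:
    """Cosine similarity generalized by a metric tensor M.

    With M = I this reduces to the standard cosine similarity; a learned
    symmetric positive-definite M reweights how embedding dimensions
    (and their correlations) contribute to the comparison.
    """
    return (u @ M @ v) / np.sqrt((u @ M @ u) * (v @ M @ v))

# Toy usage: a diagonal metric that emphasizes the first dimension.
rng = np.random.default_rng(0)
u, v = rng.normal(size=50), rng.normal(size=50)
M = np.eye(50)
M[0, 0] = 5.0
print(metric_cosine(u, v, np.eye(50)))  # standard cosine similarity
print(metric_cosine(u, v, M))           # metric-tensor variant
```

Learning a separate M per context is what makes the measure "contextualized": the word vectors stay fixed while the comparison adapts.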
Related papers
- Semantic similarity prediction is better than other semantic similarity
measures [5.176134438571082]
We argue that when we are only interested in measuring semantic similarity, it is better to predict the similarity directly with a model fine-tuned for that task.
Using a model fine-tuned on the Semantic Textual Similarity Benchmark (STS-B) from the GLUE benchmark, we define the STSScore approach and show that the resulting similarity aligns better with expectations of a robust semantic similarity measure than other approaches.
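A minimal sketch of predicting similarity directly, using the sentence-transformers CrossEncoder API with one publicly available STS-B cross-encoder; the exact model behind STSScore may differ:

```python
from sentence_transformers import CrossEncoder

# Score sentence pairs with a model fine-tuned on STS-B, rather than
# comparing independently computed embeddings with cosine similarity.
model = CrossEncoder("cross-encoder/stsb-roberta-large")
pairs = [
    ("A man is playing a guitar.", "Someone plays an instrument."),
    ("A man is playing a guitar.", "The stock market fell today."),
]
for pair, score in zip(pairs, model.predict(pairs)):
    print(f"{score:.3f}  {pair}")
```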
arXiv Detail & Related papers (2023-09-22T08:11:01Z)
- Automatic Design of Semantic Similarity Ensembles Using Grammatical Evolution [0.0]
No single semantic similarity measure is the most appropriate for all tasks, so researchers often use ensemble strategies to ensure good performance.
This research work proposes a method for automatically designing semantic similarity ensembles.
Our proposed method uses grammatical evolution, for the first time, to automatically select and aggregate measures from a pool of candidates, creating an ensemble that maximizes correlation with human judgment.
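The sketch below illustrates only the fitness criterion (rank correlation with human judgment) using a toy evolutionary search over aggregation weights; the paper's actual method evolves ensembles from a grammar via grammatical evolution, which is not reproduced here:

```python
import numpy as np
from scipy.stats import spearmanr

rng = np.random.default_rng(1)
n_pairs, n_measures = 200, 4
candidates = rng.random((n_pairs, n_measures))  # stand-in similarity measures
human = rng.random(n_pairs)                     # stand-in human ratings

def fitness(w: np.ndarray) -> float:
    # Quality of an ensemble = rank correlation with human judgment.
    return spearmanr(candidates @ w, human).correlation

pop = rng.random((20, n_measures))
for _ in range(100):
    parents = sorted(pop, key=fitness, reverse=True)[:10]          # select
    children = [np.clip(p + rng.normal(scale=0.1, size=p.shape), 0, None)
                for p in parents]                                  # mutate
    pop = np.array(parents + children)
best = max(pop, key=fitness)
print("best weights:", best, "fitness:", round(fitness(best), 3))
```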
arXiv Detail & Related papers (2023-07-03T10:53:05Z)
- Description-Based Text Similarity [59.552704474862004]
We identify the need to search for texts based on abstract descriptions of their content.
We propose an alternative model that significantly improves performance when used in standard nearest-neighbor search.
arXiv Detail & Related papers (2023-05-21T17:14:31Z)
- ContraSim -- A Similarity Measure Based on Contrastive Learning [28.949004915740776]
We develop a new similarity measure, dubbed ContraSim, based on contrastive learning.
ContraSim learns a parameterized measure by using both similar and dissimilar examples.
In all cases, ContraSim achieves much higher accuracy than previous similarity measures.
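A generic sketch of a contrastively trained, parameterized similarity measure in the spirit described above (not the paper's architecture): a learned projection W is fit with an InfoNCE-style loss so that similar pairs score high and in-batch negatives score low:

```python
import torch
import torch.nn.functional as F

dim, proj_dim, batch = 128, 64, 32
W = torch.nn.Linear(dim, proj_dim, bias=False)  # the learned parameters
opt = torch.optim.Adam(W.parameters(), lr=1e-3)

def similarity(x: torch.Tensor, y: torch.Tensor) -> torch.Tensor:
    # Cosine similarity in the learned projected space.
    return F.cosine_similarity(W(x), W(y), dim=-1)

for step in range(200):
    x = torch.randn(batch, dim)              # anchors (stand-in data)
    pos = x + 0.1 * torch.randn(batch, dim)  # "similar" examples
    z, zp = F.normalize(W(x), dim=-1), F.normalize(W(pos), dim=-1)
    logits = z @ zp.T / 0.07                 # off-diagonals act as negatives
    loss = F.cross_entropy(logits, torch.arange(batch))
    opt.zero_grad(); loss.backward(); opt.step()
```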
arXiv Detail & Related papers (2023-03-29T19:43:26Z)
- Beyond Contrastive Learning: A Variational Generative Model for Multilingual Retrieval [109.62363167257664]
We propose a generative model for learning multilingual text embeddings.
Our model operates on parallel data in $N$ languages.
We evaluate this method on a suite of tasks including semantic similarity, bitext mining, and cross-lingual question retrieval.
arXiv Detail & Related papers (2022-12-21T02:41:40Z)
- Relational Sentence Embedding for Flexible Semantic Matching [86.21393054423355]
We present Relational Sentence Embedding (RSE), a new paradigm that further explores the potential of sentence embeddings.
RSE is effective and flexible in modeling sentence relations and outperforms a series of state-of-the-art embedding methods.
arXiv Detail & Related papers (2022-12-17T05:25:17Z)
- Problems with Cosine as a Measure of Embedding Similarity for High Frequency Words [45.58634797899206]
We find that cosine similarity underestimates the similarity of frequent words with other instances of the same word or other words across contexts.
We conjecture that this underestimation of similarity for high frequency words is due to differences in the representational geometry of high and low frequency words.
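One way to reproduce this kind of measurement is to embed the same word in several contexts and average the pairwise cosine similarities ("self-similarity"); a hedged sketch with bert-base-uncased, where the model choice and single-token handling are illustrative assumptions, not the paper's setup:

```python
import itertools
import torch
from transformers import AutoModel, AutoTokenizer

tok = AutoTokenizer.from_pretrained("bert-base-uncased")
model = AutoModel.from_pretrained("bert-base-uncased").eval()

def word_vec(sentence: str, word: str) -> torch.Tensor:
    enc = tok(sentence, return_tensors="pt")
    with torch.no_grad():
        hidden = model(**enc).last_hidden_state[0]
    idx = enc.input_ids[0].tolist().index(tok.convert_tokens_to_ids(word))
    return hidden[idx]  # assumes `word` is a single WordPiece token

contexts = [
    "the bank approved the loan",
    "she sat by the bank of the river",
    "the bank closes at five",
]
vecs = [word_vec(c, "bank") for c in contexts]
sims = [torch.cosine_similarity(a, b, dim=0).item()
        for a, b in itertools.combinations(vecs, 2)]
print("mean self-similarity of 'bank':", sum(sims) / len(sims))
```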
arXiv Detail & Related papers (2022-05-10T18:00:06Z)
- Attributable Visual Similarity Learning [90.69718495533144]
This paper proposes an attributable visual similarity learning (AVSL) framework for a more accurate and explainable similarity measure between images.
Motivated by human semantic similarity cognition, we propose a generalized similarity learning paradigm that represents the similarity between two images with a graph.
Experiments on the CUB-200-2011, Cars196, and Stanford Online Products datasets demonstrate significant improvements over existing deep similarity learning methods.
arXiv Detail & Related papers (2022-03-28T17:35:31Z)
- FastKASSIM: A Fast Tree Kernel-Based Syntactic Similarity Metric [48.66580267438049]
We present FastKASSIM, a metric for utterance- and document-level syntactic similarity.
It pairs and averages the most similar dependency parse trees between a pair of documents based on tree kernels.
It runs up to 5.2 times faster than our baseline method over the documents in the r/ChangeMyView corpus.
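The pairing-and-averaging step lends itself to a short sketch; here `tree_kernel` is an assumed black-box placeholder, not FastKASSIM's actual kernel implementation:

```python
from typing import Callable, Sequence

def pair_and_average(trees_a: Sequence, trees_b: Sequence,
                     tree_kernel: Callable[[object, object], float]) -> float:
    """For each parse tree in document A, take its best match in
    document B under the kernel, then average those best scores."""
    best = [max(tree_kernel(ta, tb) for tb in trees_b) for ta in trees_a]
    return sum(best) / len(best)

# Toy usage with a stand-in kernel over bracketed parse strings.
overlap = lambda a, b: sum(x == y for x, y in zip(a, b)) / max(len(a), len(b))
print(pair_and_average(
    ["(S (NP I) (VP run))", "(S (NP we) (VP walk))"],
    ["(S (NP you) (VP run))"],
    tree_kernel=overlap,
))
```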
arXiv Detail & Related papers (2022-03-15T22:33:26Z)
- Semantic Answer Similarity for Evaluating Question Answering Models [2.279676596857721]
SAS is a cross-encoder-based metric for the estimation of semantic answer similarity.
We show that semantic similarity metrics based on recent transformer models correlate much better with human judgment than traditional lexical similarity metrics.
arXiv Detail & Related papers (2021-08-13T09:12:27Z)
- Exploiting Non-Taxonomic Relations for Measuring Semantic Similarity and Relatedness in WordNet [0.0]
This paper explores the benefits of using all types of non-taxonomic relations in large linked data, such as the WordNet knowledge graph.
We propose a holistic poly-relational approach based on a new relation-based information content and non-taxonomic-based weighted paths.
arXiv Detail & Related papers (2020-06-22T09:59:39Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of its content (including all information) and is not responsible for any consequences of its use.