Efficient comparison of sentence embeddings
- URL: http://arxiv.org/abs/2204.00820v1
- Date: Sat, 2 Apr 2022 09:08:34 GMT
- Title: Efficient comparison of sentence embeddings
- Authors: Spyros Zoupanos, Stratis Kolovos, Athanasios Kanavos, Orestis
Papadimitriou, Manolis Maragoudakis
- Abstract summary: We discuss various word and sentence embedding algorithms, select a sentence embedding algorithm, BERT, as our algorithm of choice, and evaluate two vector comparison approaches, FAISS and Elasticsearch. According to the results, FAISS outperforms Elasticsearch when used in a centralized environment with only one node, especially when big datasets are included.
- Score: 0.0
- License: http://creativecommons.org/licenses/by-nc-nd/4.0/
- Abstract: The domain of natural language processing (NLP), which has greatly evolved
over the last years, has highly benefited from the recent developments in word
and sentence embeddings. Such embeddings enable the transformation of complex
NLP tasks, like semantic similarity or Question and Answering (Q&A), into much
simpler vector comparisons. However, such a problem transformation
raises new challenges, like the efficient comparison of embeddings and their
manipulation. In this work, we discuss various word and sentence
embedding algorithms, select a sentence embedding algorithm, BERT, as
our algorithm of choice, and evaluate the performance of two vector
comparison approaches, FAISS and Elasticsearch, on the specific problem of
sentence embeddings. According to the results, FAISS outperforms Elasticsearch
when used in a centralized environment with only one node, especially when big
datasets are included.
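To make the comparison concrete, the following sketch shows the kind of vector search the paper benchmarks: exact inner-product search over L2-normalized sentence embeddings, which is what a flat FAISS index (e.g. `faiss.IndexFlatIP`) performs. Random vectors stand in for BERT embeddings here, and all names are illustrative, not the authors' actual code.

```python
import numpy as np

# Toy stand-in for BERT sentence embeddings: in practice these would come
# from a sentence-encoding model; random vectors are used for illustration.
rng = np.random.default_rng(0)
dim = 8
corpus = rng.normal(size=(100, dim)).astype("float32")
query = rng.normal(size=(1, dim)).astype("float32")

def normalize(x):
    # L2-normalize each row so that inner product equals cosine similarity
    return x / np.linalg.norm(x, axis=1, keepdims=True)

corpus_n = normalize(corpus)
query_n = normalize(query)

# Exact (brute-force) inner-product search, the operation a flat FAISS
# index performs: score every corpus vector against the query, take top-k.
scores = query_n @ corpus_n.T          # shape (1, 100)
k = 3
topk = np.argsort(-scores[0])[:k]      # indices of the k nearest sentences
print(topk, scores[0][topk])
```

FAISS accelerates exactly this operation at scale (with optional approximate index structures), while Elasticsearch performs a comparable scoring step inside its query engine; the paper's benchmark measures which of the two does it faster on a single node.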
Related papers
- Performance Evaluation and Comparison of a New Regression Algorithm [4.125187280299247]
We compare the performance of a newly proposed regression algorithm against four conventional machine learning algorithms.
The reader is free to replicate our results since we have provided the source code in a GitHub repository.
arXiv Detail & Related papers (2023-06-15T13:01:16Z)
- Relational Sentence Embedding for Flexible Semantic Matching [86.21393054423355]
We present Relational Sentence Embedding (RSE), a new paradigm to further discover the potential of sentence embeddings.
RSE is effective and flexible in modeling sentence relations and outperforms a series of state-of-the-art embedding methods.
arXiv Detail & Related papers (2022-12-17T05:25:17Z)
- Will Bilevel Optimizers Benefit from Loops [63.22466953441521]
Two currently popular bilevel optimizers, AID-BiO and ITD-BiO, naturally involve solving one or two sub-problems.
We first establish unified convergence analysis for both AID-BiO and ITD-BiO that are applicable to all implementation choices of loops.
arXiv Detail & Related papers (2022-05-27T20:28:52Z) - Word Embeddings and Validity Indexes in Fuzzy Clustering [5.063728016437489]
We present a fuzzy-based analysis of various vector representations of words, i.e., word embeddings.
We use two popular fuzzy clustering algorithms on count-based word embeddings, with different methods and dimensionality.
We evaluate the experimental results with various clustering validity indexes to compare the accuracy of different algorithm variations with different embeddings.
arXiv Detail & Related papers (2022-04-26T18:08:19Z) - Dictionary Learning Using Rank-One Atomic Decomposition (ROAD) [6.367823813868024]
Dictionary learning aims at seeking a dictionary under which the training data can be sparsely represented.
ROAD outperforms other benchmark algorithms on both synthetic data and real data.
arXiv Detail & Related papers (2021-10-25T10:29:52Z) - HETFORMER: Heterogeneous Transformer with Sparse Attention for Long-Text
Extractive Summarization [57.798070356553936]
HETFORMER is a Transformer-based pre-trained model with multi-granularity sparse attentions for extractive summarization.
Experiments on both single- and multi-document summarization tasks show that HETFORMER achieves state-of-the-art performance in ROUGE F1.
arXiv Detail & Related papers (2021-10-12T22:42:31Z) - Clustering and Network Analysis for the Embedding Spaces of Sentences
and Sub-Sentences [69.3939291118954]
This paper reports research on a set of comprehensive clustering and network analyses targeting sentence and sub-sentence embedding spaces.
Results show that one method generates the most clusterable embeddings.
In general, the embeddings of span sub-sentences have better clustering properties than the original sentences.
arXiv Detail & Related papers (2021-10-02T00:47:35Z) - Ranking a set of objects: a graph based least-square approach [70.7866286425868]
We consider the problem of ranking $N$ objects starting from a set of noisy pairwise comparisons provided by a crowd of equal workers.
We propose a class of non-adaptive ranking algorithms that rely on a least-squares intrinsic optimization criterion for the estimation of qualities.
arXiv Detail & Related papers (2020-02-26T16:19:09Z) - Fact-aware Sentence Split and Rephrase with Permutation Invariant
Training [93.66323661321113]
Sentence Split and Rephrase aims to break down a complex sentence into several simple sentences with its meaning preserved.
Previous studies tend to address the issue by seq2seq learning from parallel sentence pairs.
We introduce Permutation Invariant Training to verify the effects of order variance in seq2seq learning for this task.
arXiv Detail & Related papers (2020-01-16T07:30:19Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the information presented (including all listed entries) and is not responsible for any consequences of its use.