Latte-Mix: Measuring Sentence Semantic Similarity with Latent
Categorical Mixtures
- URL: http://arxiv.org/abs/2010.11351v1
- Date: Wed, 21 Oct 2020 23:45:18 GMT
- Title: Latte-Mix: Measuring Sentence Semantic Similarity with Latent
Categorical Mixtures
- Authors: M. Li, H. Bai, L. Tan, K. Xiong, M. Li, J. Lin
- Abstract summary: We learn a categorical variational autoencoder based on off-the-shelf pre-trained language models.
We empirically demonstrate that these finetuned models could be further improved by Latte-Mix.
- Score: 0.0
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Measuring sentence semantic similarity using pre-trained language models such
as BERT generally yields unsatisfactory zero-shot performance, and one main
reason is ineffective token aggregation methods such as mean pooling. In this
paper, we demonstrate under a Bayesian framework that distances between
primitive statistics, such as the mean of word embeddings, are fundamentally
flawed for capturing sentence-level semantic similarity. To remedy this issue,
we propose to learn a categorical variational autoencoder (VAE) based on
off-the-shelf pre-trained language models. We theoretically prove that
measuring the distance between the latent categorical mixtures, namely
Latte-Mix, can better reflect the true sentence semantic similarity. In
addition, our Bayesian framework provides explanations for why models finetuned
on labelled sentence pairs have better zero-shot performance. We also
empirically demonstrate that these finetuned models could be further improved
by Latte-Mix. Our method not only yields state-of-the-art zero-shot
performance on semantic similarity datasets such as STS, but also enjoys the
benefits of fast training and a small memory footprint.
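The contrast the abstract draws can be illustrated with a toy numpy sketch. This is NOT the authors' implementation: the random "token embeddings", the linear categorical head, and all dimensions are invented for illustration. It compares the criticized mean-pooling baseline with a distance between latent categorical mixtures, in the spirit of Latte-Mix:

```python
import numpy as np

rng = np.random.default_rng(0)

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def mean_pool_cosine(emb_a, emb_b):
    """Baseline: cosine similarity between mean-pooled token embeddings."""
    a, b = emb_a.mean(axis=0), emb_b.mean(axis=0)
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

def latent_mixture_distance(emb_a, emb_b, head):
    """Sketch of a Latte-Mix-style distance: map each token embedding to a
    categorical distribution over K latent classes (via a hypothetical
    linear head), average them into a per-sentence mixture, and compare
    the two mixtures with total variation distance (in [0, 1])."""
    mix_a = softmax(emb_a @ head).mean(axis=0)
    mix_b = softmax(emb_b @ head).mean(axis=0)
    return 0.5 * float(np.abs(mix_a - mix_b).sum())

d, K = 16, 8                                      # embedding dim, latent classes
head = rng.normal(size=(d, K))                    # stand-in categorical head
sent_a = rng.normal(size=(5, d))                  # 5 fake "token embeddings"
sent_b = sent_a + 0.05 * rng.normal(size=(5, d))  # near-paraphrase of sent_a
sent_c = rng.normal(size=(7, d))                  # unrelated "sentence"

print("cosine(a, b):", mean_pool_cosine(sent_a, sent_b))
print("mixture distance(a, b):", latent_mixture_distance(sent_a, sent_b, head))
print("mixture distance(a, c):", latent_mixture_distance(sent_a, sent_c, head))
```

On this toy data the mixture distance between the near-paraphrase pair comes out smaller than between the unrelated pair; in the paper the categorical distributions come from a VAE trained on top of a pre-trained language model rather than a random linear head.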
Related papers
- Rethinking Classifier Re-Training in Long-Tailed Recognition: A Simple
Logits Retargeting Approach [102.0769560460338]
We develop a simple logits approach (LORT) without the requirement of prior knowledge of the number of samples per class.
Our method achieves state-of-the-art performance on various imbalanced datasets, including CIFAR100-LT, ImageNet-LT, and iNaturalist 2018.
arXiv Detail & Related papers (2024-03-01T03:27:08Z)
- Fast Semisupervised Unmixing Using Nonconvex Optimization [80.11512905623417]
We introduce a novel convex model for semi/library-based unmixing.
We demonstrate the efficacy of alternating methods for sparse unsupervised unmixing.
arXiv Detail & Related papers (2024-01-23T10:07:41Z)
- Noisy Correspondence Learning with Self-Reinforcing Errors Mitigation [63.180725016463974]
Cross-modal retrieval relies on well-matched large-scale datasets that are laborious in practice.
We introduce a novel noisy correspondence learning framework, namely Self-Reinforcing Errors Mitigation (SREM).
arXiv Detail & Related papers (2023-12-27T09:03:43Z)
- Semantic similarity prediction is better than other semantic similarity measures [5.176134438571082]
We argue that when we are only interested in measuring the semantic similarity, it is better to directly predict the similarity using a fine-tuned model for such a task.
Using a fine-tuned model for the Semantic Textual Similarity Benchmark tasks (STS-B) from the GLUE benchmark, we define the STSScore approach and show that the resulting similarity is better aligned with our expectations on a robust semantic similarity measure than other approaches.
arXiv Detail & Related papers (2023-09-22T08:11:01Z)
- Relational Sentence Embedding for Flexible Semantic Matching [86.21393054423355]
We present Relational Sentence Embedding (RSE), a new paradigm for further exploring the potential of sentence embeddings.
RSE is effective and flexible in modeling sentence relations and outperforms a series of state-of-the-art embedding methods.
arXiv Detail & Related papers (2022-12-17T05:25:17Z)
- Counting Like Human: Anthropoid Crowd Counting on Modeling the Similarity of Objects [92.80955339180119]
Mainstream crowd counting methods regress a density map and integrate it to obtain counting results.
Inspired by this, we propose a rational and anthropoid crowd counting framework.
arXiv Detail & Related papers (2022-12-02T07:00:53Z)
- Robust Textual Embedding against Word-level Adversarial Attacks [15.235449552083043]
We propose a novel robust training method, termed Fast Triplet Metric Learning (FTML).
We show that FTML can significantly promote the model robustness against various advanced adversarial attacks.
Our work shows the great potential of improving textual robustness through robust word embeddings.
arXiv Detail & Related papers (2022-02-28T14:25:00Z)
- Semantic Answer Similarity for Evaluating Question Answering Models [2.279676596857721]
SAS is a cross-encoder-based metric for the estimation of semantic answer similarity.
We show that semantic similarity metrics based on recent transformer models correlate much better with human judgment than traditional lexical similarity metrics.
arXiv Detail & Related papers (2021-08-13T09:12:27Z)
- On the Sentence Embeddings from Pre-trained Language Models [78.45172445684126]
In this paper, we argue that the semantic information in the BERT embeddings is not fully exploited.
We find that BERT always induces a non-smooth anisotropic semantic space of sentences, which harms its performance of semantic similarity.
We propose to transform the anisotropic sentence embedding distribution to a smooth and isotropic Gaussian distribution through normalizing flows that are learned with an unsupervised objective.
arXiv Detail & Related papers (2020-11-02T13:14:57Z)
- An end-to-end approach for the verification problem: learning the right distance [15.553424028461885]
We augment the metric learning setting by introducing a parametric pseudo-distance, trained jointly with the encoder.
We first show it approximates a likelihood ratio which can be used for hypothesis tests.
We observe training is much simplified under the proposed approach compared to metric learning with actual distances.
arXiv Detail & Related papers (2020-02-21T18:46:06Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information presented and is not responsible for any consequences of its use.