MCSE: Multimodal Contrastive Learning of Sentence Embeddings
- URL: http://arxiv.org/abs/2204.10931v1
- Date: Fri, 22 Apr 2022 21:19:24 GMT
- Title: MCSE: Multimodal Contrastive Learning of Sentence Embeddings
- Authors: Miaoran Zhang, Marius Mosbach, David Ifeoluwa Adelani, Michael A. Hedderich, Dietrich Klakow
- Abstract summary: We propose a sentence embedding learning approach that exploits both visual and textual information via a multimodal contrastive objective.
We show that our approach consistently improves the performance across various datasets and pre-trained encoders.
- Score: 23.630041603311923
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Learning semantically meaningful sentence embeddings is an open problem in
natural language processing. In this work, we propose a sentence embedding
learning approach that exploits both visual and textual information via a
multimodal contrastive objective. Through experiments on a variety of semantic
textual similarity tasks, we demonstrate that our approach consistently
improves the performance across various datasets and pre-trained encoders. In
particular, combining a small amount of multimodal data with a large text-only
corpus, we improve the state-of-the-art average Spearman's correlation by 1.7%.
By analyzing the properties of the textual embedding space, we show that our
model excels in aligning semantically similar sentences, providing an
explanation for its improved performance.
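To make the objective concrete, here is a minimal PyTorch sketch of a SimCSE-style text contrastive term combined with a multimodal term that aligns sentence embeddings with their paired image embeddings. The function names, temperature, and weighting `lam` are illustrative assumptions, not the authors' released code.

```python
import torch
import torch.nn.functional as F

def info_nce(a: torch.Tensor, b: torch.Tensor, temperature: float = 0.05) -> torch.Tensor:
    """InfoNCE over a batch: a[i] should match b[i]; every other b[j] is a negative."""
    a = F.normalize(a, dim=-1)
    b = F.normalize(b, dim=-1)
    logits = a @ b.t() / temperature                    # (batch, batch) cosine similarities
    targets = torch.arange(a.size(0), device=a.device)  # positives sit on the diagonal
    return F.cross_entropy(logits, targets)

def mcse_loss(sent_a: torch.Tensor, sent_b: torch.Tensor,
              img: torch.Tensor, lam: float = 1.0) -> torch.Tensor:
    # Text-only term: two dropout-noised encodings of the same sentence
    # serve as positives (SimCSE-style).
    text_term = info_nce(sent_a, sent_b)
    # Multimodal term: pull each sentence toward its paired image embedding,
    # assuming both were projected into a shared space beforehand.
    mm_term = info_nce(sent_a, img)
    return text_term + lam * mm_term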
Related papers
- Contrastive Learning-based Multi Modal Architecture for Emoticon Prediction by Employing Image-Text Pairs [13.922091192207718]
This research aims to analyze the relationship among sentences, visuals, and emoticons.
We propose a novel contrastive-learning-based multimodal architecture.
The proposed model attains an accuracy of 91% and an MCC score of 90% on emoticon prediction.
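The entry gives no architectural detail; a common pattern for such models is to fuse contrastively aligned image and text embeddings and classify the emoticon from the fused representation. A hypothetical sketch, with all module names and dimensions assumed:

import torch
import torch.nn as nn

class EmoticonHead(nn.Module):
    """Hypothetical head: fuse contrastively aligned text and image features,
    then classify the emoticon. Dimensions and names are assumptions."""
    def __init__(self, dim: int = 512, n_emoticons: int = 64):
        super().__init__()
        self.fuse = nn.Linear(2 * dim, dim)
        self.classify = nn.Linear(dim, n_emoticons)

    def forward(self, text_emb: torch.Tensor, img_emb: torch.Tensor) -> torch.Tensor:
        fused = torch.relu(self.fuse(torch.cat([text_emb, img_emb], dim=-1)))
        return self.classify(fused)  # logits over emoticon classes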
arXiv Detail & Related papers (2024-08-05T15:45:59Z)
- Enhancing Argument Structure Extraction with Efficient Leverage of Contextual Information [79.06082391992545]
We propose an Efficient Context-aware model (ECASE) that fully exploits contextual information.
We introduce a sequence-attention module and a distance-weighted similarity loss to aggregate contextual and argumentative information.
Our experiments on five datasets from various domains demonstrate that our model achieves state-of-the-art performance.
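The summary names but does not define the distance-weighted similarity loss; one plausible reading is sketched below, weighting pairwise embedding similarity by positional distance in the document. ECASE's exact formulation may differ.

import torch
import torch.nn.functional as F

def distance_weighted_similarity_loss(sent_embs: torch.Tensor, decay: float = 0.5) -> torch.Tensor:
    """Push sentences that are close in the document toward similar
    embeddings, with weights decaying in positional distance."""
    n = sent_embs.size(0)
    sent_embs = F.normalize(sent_embs, dim=-1)
    sim = sent_embs @ sent_embs.t()                      # pairwise cosine similarity
    pos = torch.arange(n, device=sent_embs.device, dtype=sent_embs.dtype)
    dist = (pos.unsqueeze(0) - pos.unsqueeze(1)).abs()   # |i - j| in sentence positions
    weight = torch.exp(-decay * dist)                    # nearer pairs count more
    mask = 1.0 - torch.eye(n, device=sent_embs.device)   # drop self-similarity
    return -(weight * mask * sim).sum() / (weight * mask).sum()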
arXiv Detail & Related papers (2023-10-08T08:47:10Z)
- Composition-contrastive Learning for Sentence Embeddings [23.85590618900386]
This work maximizes alignment between texts and compositions of their phrasal constituents, and is the first to do so without incurring costs from auxiliary training objectives or additional network parameters.
Experimental results on semantic textual similarity tasks show improvements over baselines that are comparable with state-of-the-art approaches.
arXiv Detail & Related papers (2023-07-14T14:39:35Z)
- Universal Multimodal Representation for Language Understanding [110.98786673598015]
This work presents new methods to employ visual information as assistant signals to general NLP tasks.
For each sentence, we first retrieve a flexible number of images from a light topic-image lookup table extracted from existing sentence-image pairs.
Then, the text and images are encoded by a Transformer encoder and convolutional neural network, respectively.
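A hypothetical sketch of the retrieve-then-encode pipeline this entry describes; the lookup-table construction and the `topics_of` helper are assumptions. The retrieved images would then go through the CNN and the sentence through the Transformer encoder, as stated above.

from collections import defaultdict

def build_topic_image_table(sentence_image_pairs, topics_of):
    """Map each topic word to the images whose paired sentences mention it."""
    table = defaultdict(list)
    for sentence, image in sentence_image_pairs:
        for topic in topics_of(sentence):
            table[topic].append(image)
    return table

def retrieve_images(sentence, table, topics_of, k=3):
    """Collect up to k candidate images for a sentence via its topic words."""
    images = []
    for topic in topics_of(sentence):
        images.extend(table.get(topic, []))
    return images[:k]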
arXiv Detail & Related papers (2023-01-09T13:54:11Z)
- Beyond Contrastive Learning: A Variational Generative Model for Multilingual Retrieval [109.62363167257664]
We propose a generative model for learning multilingual text embeddings.
Our model operates on parallel data in $N$ languages.
We evaluate this method on a suite of tasks including semantic similarity, bitext mining, and cross-lingual question retrieval.
arXiv Detail & Related papers (2022-12-21T02:41:40Z)
- Relational Sentence Embedding for Flexible Semantic Matching [86.21393054423355]
We present Relational Sentence Embedding (RSE), a new paradigm that further explores the potential of sentence embeddings.
RSE is effective and flexible in modeling sentence relations and outperforms a series of state-of-the-art embedding methods.
arXiv Detail & Related papers (2022-12-17T05:25:17Z)
- Analysis of Joint Speech-Text Embeddings for Semantic Matching [3.6423306784901235]
We study a joint speech-text embedding space trained for semantic matching by minimizing the distance between paired utterance and transcription inputs.
We extend our method to incorporate automatic speech recognition through both pretraining and multitask scenarios.
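A minimal sketch of the stated training signal, minimizing the distance between paired utterance and transcription embeddings; the choice of L2 on normalized vectors is an assumption, since the summary does not specify the distance or any negatives.

import torch.nn.functional as F

def paired_distance_loss(speech_emb, text_emb):
    """Pull each utterance embedding toward its transcription embedding."""
    return F.mse_loss(F.normalize(speech_emb, dim=-1),
                      F.normalize(text_emb, dim=-1))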
arXiv Detail & Related papers (2022-04-04T04:50:32Z)
- Understanding Synonymous Referring Expressions via Contrastive Features [105.36814858748285]
We develop an end-to-end trainable framework to learn contrastive features on the image and object instance levels.
We conduct extensive experiments to evaluate the proposed algorithm on several benchmark datasets.
arXiv Detail & Related papers (2021-04-20T17:56:24Z)
- EDS-MEMBED: Multi-sense embeddings based on enhanced distributional semantic structures via a graph walk over word senses [0.0]
We leverage the rich semantic structures in WordNet to enhance the quality of multi-sense embeddings.
We derive new distributional semantic similarity measures for multi-sense embeddings (M-SE) from prior ones.
We report evaluation results on 11 benchmark datasets involving WSD and Word Similarity tasks.
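A hypothetical sketch of a graph walk over WordNet senses in the spirit of this entry, using NLTK's WordNet interface; the paper's actual walk, edge set, and weighting may differ.

import random
from nltk.corpus import wordnet as wn  # requires nltk.download('wordnet') once

def sense_walk(lemma: str, steps: int = 5, seed: int = 0):
    """Random walk from a lemma's senses over hypernym/hyponym edges,
    yielding a synset sequence usable as a sense context."""
    rng = random.Random(seed)
    senses = wn.synsets(lemma)
    if not senses:
        return []
    node = rng.choice(senses)
    path = [node]
    for _ in range(steps):
        neighbors = node.hypernyms() + node.hyponyms()
        if not neighbors:
            break
        node = rng.choice(neighbors)
        path.append(node)
    return path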
arXiv Detail & Related papers (2021-02-27T14:36:55Z)
- A Comparative Study on Structural and Semantic Properties of Sentence Embeddings [77.34726150561087]
We propose a set of experiments using a widely used large-scale dataset for relation extraction.
We show that different embedding spaces have different degrees of strength for the structural and semantic properties.
These results provide useful information for developing embedding-based relation extraction methods.
arXiv Detail & Related papers (2020-09-23T15:45:32Z)
This list is automatically generated from the titles and abstracts of the papers on this site.