Non-Linguistic Supervision for Contrastive Learning of Sentence
Embeddings
- URL: http://arxiv.org/abs/2209.09433v1
- Date: Tue, 20 Sep 2022 03:01:45 GMT
- Title: Non-Linguistic Supervision for Contrastive Learning of Sentence
Embeddings
- Authors: Yiren Jian and Chongyang Gao and Soroush Vosoughi
- Abstract summary: We find the performance of Transformer models as sentence encoders can be improved by training with multi-modal multi-task losses.
The reliance of our framework on unpaired non-linguistic data makes it language-agnostic, enabling it to be widely applicable beyond English NLP.
- Score: 14.244787327283335
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Semantic representation learning for sentences is an important and
well-studied problem in NLP. The current trend for this task involves training
a Transformer-based sentence encoder through a contrastive objective with text,
i.e., clustering sentences with semantically similar meanings and scattering
others. In this work, we find the performance of Transformer models as sentence
encoders can be improved by training with multi-modal multi-task losses, using
unpaired examples from another modality (e.g., sentences and unrelated
image/audio data). In particular, besides learning by the contrastive loss on
text, our model clusters examples from a non-linguistic domain (e.g.,
visual/audio) with a similar contrastive loss at the same time. The reliance of
our framework on unpaired non-linguistic data makes it language-agnostic,
enabling it to be widely applicable beyond English NLP. Experiments on 7
semantic textual similarity benchmarks reveal that models trained with the
additional non-linguistic (images/audio) contrastive objective lead to higher
quality sentence embeddings. This indicates that Transformer models are able to
generalize better by doing a similar task (i.e., clustering) with unpaired
examples from different modalities in a multi-task fashion.
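The multi-task objective described in the abstract can be pictured as two contrastive losses optimized together at each step: one over sentence pairs and one over unpaired image (or audio) examples. The sketch below is a hypothetical PyTorch illustration of that idea; the encoders, feature dimensions, and temperature are stand-in assumptions, not the authors' released code.
```python
# Minimal sketch (assumed, not the authors' code): joint contrastive training on
# unpaired text and image batches. In practice the text encoder would be a
# Transformer and the visual encoder a CNN/ViT; small MLPs over random
# placeholder features are used here so the sketch runs with torch alone.
import torch
import torch.nn.functional as F
from torch import nn

def info_nce(z1, z2, temperature=0.05):
    """SimCLR/SimCSE-style contrastive loss: z1[i] should match z2[i]."""
    z1, z2 = F.normalize(z1, dim=-1), F.normalize(z2, dim=-1)
    logits = z1 @ z2.t() / temperature      # (B, B) similarity matrix
    labels = torch.arange(z1.size(0))       # positives lie on the diagonal
    return F.cross_entropy(logits, labels)

text_encoder = nn.Sequential(nn.Linear(768, 256), nn.ReLU(), nn.Linear(256, 128))
image_encoder = nn.Sequential(nn.Linear(2048, 256), nn.ReLU(), nn.Linear(256, 128))
opt = torch.optim.AdamW(
    list(text_encoder.parameters()) + list(image_encoder.parameters()), lr=3e-5
)

for step in range(10):
    # Placeholder features standing in for two views of the same sentence batch
    # and two views of an *unrelated* image batch -- the modalities are unpaired.
    sent_a, sent_b = (torch.randn(32, 768) for _ in range(2))
    img_a, img_b = (torch.randn(32, 2048) for _ in range(2))

    loss_text = info_nce(text_encoder(sent_a), text_encoder(sent_b))
    loss_image = info_nce(image_encoder(img_a), image_encoder(img_b))
    loss = loss_text + loss_image           # multi-task objective

    opt.zero_grad()
    loss.backward()
    opt.step()
```
The key point is that the two losses share the same form (clustering similar examples, scattering others) while drawing on different, unpaired modalities.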
Related papers
- Evaluating Semantic Variation in Text-to-Image Synthesis: A Causal Perspective [50.261681681643076]
We propose a novel metric called SemVarEffect and a benchmark named SemVarBench to evaluate the causality between semantic variations in inputs and outputs in text-to-image synthesis.
Our work establishes an effective evaluation framework that advances the T2I synthesis community's exploration of human instruction understanding.
arXiv Detail & Related papers (2024-10-14T08:45:35Z)
- DenoSent: A Denoising Objective for Self-Supervised Sentence Representation Learning [59.4644086610381]
We propose a novel denoising objective that approaches the task from a different perspective, namely the intra-sentence perspective.
By introducing both discrete and continuous noise, we generate noisy sentences and then train our model to restore them to their original form.
Our empirical evaluations demonstrate that this approach delivers competitive results on both semantic textual similarity (STS) and a wide range of transfer tasks.
arXiv Detail & Related papers (2024-01-24T17:48:45Z)
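To make the DenoSent-style objective above concrete, here is a minimal, hypothetical sketch of discrete and continuous noising followed by reconstruction; the noise rates, vocabulary, and helper names are illustrative assumptions rather than the paper's implementation.
```python
# Hypothetical sketch of a DenoSent-style denoising objective: corrupt each
# sentence with discrete noise (drop/replace tokens) and continuous noise
# (Gaussian perturbation of embeddings), then train a model to restore the
# original sentence.
import random
import torch

def discrete_noise(tokens, vocab, drop_p=0.1, replace_p=0.1):
    """Randomly delete or replace tokens in a tokenized sentence."""
    noisy = []
    for tok in tokens:
        r = random.random()
        if r < drop_p:
            continue                              # delete the token
        if r < drop_p + replace_p:
            noisy.append(random.choice(vocab))    # replace with a random token
        else:
            noisy.append(tok)
    return noisy

def continuous_noise(embeddings, sigma=0.05):
    """Add Gaussian noise to token embeddings."""
    return embeddings + sigma * torch.randn_like(embeddings)

vocab = ["the", "cat", "sat", "on", "a", "mat", "dog", "ran"]
tokens = ["the", "cat", "sat", "on", "the", "mat"]
print(discrete_noise(tokens, vocab))
# The noisy sentence (or its noised embeddings) would then be fed to an
# encoder-decoder trained with cross-entropy to reproduce the original tokens.
```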
- SenTest: Evaluating Robustness of Sentence Encoders [0.4194295877935868]
This work focuses on evaluating the robustness of sentence encoders.
We employ several adversarial attacks to evaluate their robustness.
The experimental results strongly call the robustness of sentence encoders into question.
arXiv Detail & Related papers (2023-11-29T15:21:35Z)
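The kind of robustness evaluation described in the SenTest entry can be illustrated, very loosely, by perturbing a sentence and checking how stable the encoder's embedding is. The perturbation and the toy encoder below are assumptions for demonstration only, not the attacks used in the paper.
```python
# Hypothetical robustness probe (not the SenTest attack suite): apply a small
# character-level perturbation and measure the cosine similarity between the
# embeddings of the clean and perturbed sentences. The bag-of-characters
# encoder is a toy stand-in so the sketch runs end to end.
import random
import string
import numpy as np

def encode(sentence: str) -> np.ndarray:
    """Toy bag-of-characters encoder standing in for a real sentence encoder."""
    vec = np.zeros(128)
    for ch in sentence.lower():
        vec[ord(ch) % 128] += 1.0
    return vec / (np.linalg.norm(vec) + 1e-9)

def char_typo(sentence: str) -> str:
    """Replace one random character with a random lowercase letter."""
    i = random.randrange(len(sentence))
    return sentence[:i] + random.choice(string.ascii_lowercase) + sentence[i + 1:]

clean = "The movie was surprisingly good."
attacked = char_typo(clean)
similarity = float(encode(clean) @ encode(attacked))
print(attacked, similarity)
# A robust encoder should keep this similarity high (and keep downstream
# predictions unchanged) under such minor edits; large drops indicate fragility.
```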
- ParaAMR: A Large-Scale Syntactically Diverse Paraphrase Dataset by AMR Back-Translation [59.91139600152296]
ParaAMR is a large-scale syntactically diverse paraphrase dataset created by abstract meaning representation back-translation.
We show that ParaAMR can be used to improve on three NLP tasks: learning sentence embeddings, syntactically controlled paraphrase generation, and data augmentation for few-shot learning.
arXiv Detail & Related papers (2023-05-26T02:27:33Z)
- Beyond Contrastive Learning: A Variational Generative Model for Multilingual Retrieval [109.62363167257664]
We propose a generative model for learning multilingual text embeddings.
Our model operates on parallel data in $N$ languages.
We evaluate this method on a suite of tasks including semantic similarity, bitext mining, and cross-lingual question retrieval.
arXiv Detail & Related papers (2022-12-21T02:41:40Z)
- Paragraph-based Transformer Pre-training for Multi-Sentence Inference [99.59693674455582]
We show that popular pre-trained transformers perform poorly when fine-tuned for multi-candidate inference tasks.
We then propose a new pre-training objective that models the paragraph-level semantics across multiple input sentences.
arXiv Detail & Related papers (2022-05-02T21:41:14Z)
- A Differentiable Language Model Adversarial Attack on Text Classifiers [10.658675415759697]
We propose a new black-box sentence-level attack for natural language processing.
Our method fine-tunes a pre-trained language model to generate adversarial examples.
We show that the proposed attack outperforms competitors on a diverse set of NLP problems in terms of both computed metrics and human evaluation.
arXiv Detail & Related papers (2021-07-23T14:43:13Z)
- CLINE: Contrastive Learning with Semantic Negative Examples for Natural Language Understanding [35.003401250150034]
We propose Contrastive Learning with semantIc Negative Examples (CLINE) to improve the robustness of pre-trained language models.
CLINE constructs semantic negative examples in an unsupervised manner to improve robustness under semantic adversarial attacks.
Empirical results show that our approach yields substantial improvements on a range of sentiment analysis, reasoning, and reading comprehension tasks.
arXiv Detail & Related papers (2021-07-01T13:34:12Z)
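One common way to construct semantic negatives without supervision, assumed here for illustration and not necessarily CLINE's exact procedure, is antonym substitution (flipping the meaning while keeping the surface form close), contrasted against synonym-substituted positives.
```python
# Hedged sketch of contrastive learning with semantic negatives, in the spirit
# of CLINE; the word lists, placeholder embeddings, and loss choice are
# illustrative assumptions. A positive keeps the meaning (synonym swap);
# a negative reverses it (antonym swap).
import torch
import torch.nn.functional as F

SYNONYMS = {"good": "great", "happy": "glad", "fast": "quick"}
ANTONYMS = {"good": "bad", "happy": "sad", "fast": "slow"}

def substitute(sentence, table):
    return " ".join(table.get(w, w) for w in sentence.split())

sentence = "the food was good and the service was fast"
positive = substitute(sentence, SYNONYMS)   # meaning preserved
negative = substitute(sentence, ANTONYMS)   # meaning reversed

# Placeholder embeddings standing in for a Transformer encoder's outputs.
anchor, pos, neg = (torch.randn(128) for _ in range(3))

# Triplet-style objective: pull the positive toward the anchor and push the
# semantic negative away by at least a margin.
loss = F.triplet_margin_loss(
    anchor.unsqueeze(0), pos.unsqueeze(0), neg.unsqueeze(0), margin=1.0
)
print(positive, "|", negative, "|", float(loss))
```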
- Disentangled Contrastive Learning for Learning Robust Textual Representations [13.880693856907037]
We introduce the concept of momentum representation consistency to align features, and leverage power normalization while preserving uniformity.
Our experimental results on NLP benchmarks demonstrate that our approach outperforms the baselines.
arXiv Detail & Related papers (2021-04-11T03:32:49Z)
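"Momentum representation consistency" is reminiscent of a MoCo/BYOL-style momentum encoder updated by an exponential moving average of the online encoder. The sketch below shows that generic pattern under that assumption (power normalization is omitted); it is not the paper's exact architecture.
```python
# Assumed sketch of a momentum (EMA) encoder for representation consistency,
# in the style of MoCo/BYOL; the paper's exact design may differ.
import copy
import torch
import torch.nn.functional as F
from torch import nn

online = nn.Sequential(nn.Linear(768, 256), nn.ReLU(), nn.Linear(256, 128))
momentum = copy.deepcopy(online)
for p in momentum.parameters():
    p.requires_grad_(False)            # the momentum branch is not trained directly

@torch.no_grad()
def ema_update(m=0.999):
    """Momentum update: slowly track the online encoder's weights."""
    for po, pm in zip(online.parameters(), momentum.parameters()):
        pm.mul_(m).add_(po, alpha=1 - m)

x = torch.randn(32, 768)               # placeholder features for one batch
consistency = 1 - F.cosine_similarity(online(x), momentum(x)).mean()
consistency.backward()                 # gradients flow only into the online encoder
ema_update()
```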
- Learning What Makes a Difference from Counterfactual Examples and Gradient Supervision [57.14468881854616]
We propose an auxiliary training objective that improves the generalization capabilities of neural networks.
We use pairs of minimally different examples with different labels, a.k.a. counterfactual or contrasting examples, which provide a signal indicative of the underlying causal structure of the task.
Models trained with this technique demonstrate improved performance on out-of-distribution test sets.
arXiv Detail & Related papers (2020-04-20T02:47:49Z)
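One reading of "gradient supervision" over counterfactual pairs, sketched below as an assumption rather than the paper's exact formulation, is an auxiliary loss that aligns the input-gradient of the task loss with the direction from an example to its minimally different counterfactual.
```python
# Hedged sketch of gradient supervision over counterfactual pairs: encourage the
# input-gradient of the task loss at x to point toward the minimally different
# counterfactual x_cf that carries the opposite label. The model, shapes, and
# loss weight are illustrative placeholders.
import torch
import torch.nn.functional as F
from torch import nn

model = nn.Sequential(nn.Linear(64, 32), nn.ReLU(), nn.Linear(32, 2))

x = torch.randn(16, 64, requires_grad=True)   # original examples
x_cf = x + 0.1 * torch.randn(16, 64)          # counterfactuals: minimal edits
y = torch.randint(0, 2, (16,))
y_cf = 1 - y                                  # counterfactual labels differ

task_loss = F.cross_entropy(model(x), y) + F.cross_entropy(model(x_cf), y_cf)

# Gradient of the task loss with respect to the inputs.
grad_x = torch.autograd.grad(task_loss, x, create_graph=True)[0]

# Auxiliary objective: the input gradient should align with (x_cf - x).
gs_loss = 1 - F.cosine_similarity(grad_x, (x_cf - x).detach(), dim=1).mean()

total = task_loss + 0.1 * gs_loss
total.backward()
```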