Universal Sentence Representation Learning with Conditional Masked
Language Model
- URL: http://arxiv.org/abs/2012.14388v2
- Date: Tue, 29 Dec 2020 03:29:11 GMT
- Title: Universal Sentence Representation Learning with Conditional Masked
Language Model
- Authors: Ziyi Yang, Yinfei Yang, Daniel Cer, Jax Law, Eric Darve
- Abstract summary: We present Conditional Masked Language Modeling (CMLM) to effectively learn sentence representations.
Our English CMLM model achieves state-of-the-art performance on SentEval.
As a fully unsupervised learning method, CMLM can be conveniently extended to a broad range of languages and domains.
- Score: 7.334766841801749
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: This paper presents a novel training method, Conditional Masked Language
Modeling (CMLM), to effectively learn sentence representations on large scale
unlabeled corpora. CMLM integrates sentence representation learning into MLM
training by conditioning on the encoded vectors of adjacent sentences. Our
English CMLM model achieves state-of-the-art performance on SentEval, even
outperforming models learned using (semi-)supervised signals. As a fully
unsupervised learning method, CMLM can be conveniently extended to a broad
range of languages and domains. We find that a multilingual CMLM model
co-trained with bitext retrieval (BR) and natural language inference (NLI)
tasks outperforms the previous state-of-the-art multilingual models by a large
margin. We explore the same-language bias of the learned representations, and
propose a principal-component-based approach to remove the language-identifying
information from the representation while still retaining sentence semantics.
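The abstract does not give the exact procedure, but a principal-component-based removal of language-identifying information is commonly implemented by projecting embeddings away from their top principal directions. A minimal sketch, assuming the top components carry the language signal (the function name and `k` are illustrative, not from the paper):

```python
import numpy as np

def remove_language_components(embeddings, k=1):
    """Remove the top-k principal components from sentence embeddings.

    Assumes the leading principal directions of the embedding matrix
    encode language identity rather than sentence semantics.
    """
    centered = embeddings - embeddings.mean(axis=0)
    # SVD of the centered matrix; rows of vt are principal directions.
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    top = vt[:k]                          # (k, dim) candidate language directions
    # Subtract the projection onto those directions.
    return centered - centered @ top.T @ top

rng = np.random.default_rng(0)
emb = rng.normal(size=(100, 16))          # toy stand-in for sentence embeddings
cleaned = remove_language_components(emb, k=2)
```

After this step, `cleaned` has zero projection onto the removed directions, so a linear language classifier built on those directions can no longer separate the embeddings.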
Related papers
- SAM4MLLM: Enhance Multi-Modal Large Language Model for Referring Expression Segmentation [37.45387861441091]
We introduce SAM4MLLM, an innovative approach which integrates the Segment Anything Model (SAM) with Multi-Modal Large Language Models (MLLMs)
Our method enables MLLMs to learn pixel-level location information without requiring excessive modifications to the existing model architecture or adding specialized tokens.
It combines detailed visual information with the powerful expressive capabilities of large language models in a unified language-based manner without additional computational overhead in learning.
arXiv Detail & Related papers (2024-09-01T12:09:33Z) - Speech Translation with Large Language Models: An Industrial Practice [64.5419534101104]
We introduce LLM-ST, a novel and effective speech translation model constructed upon a pre-trained large language model (LLM)
By integrating the large language model (LLM) with a speech encoder and employing multi-task instruction tuning, LLM-ST can produce accurate timestamped transcriptions and translations.
Through rigorous experimentation on English and Chinese datasets, we showcase the exceptional performance of LLM-ST.
arXiv Detail & Related papers (2023-12-21T05:32:49Z) - Let Models Speak Ciphers: Multiagent Debate through Embeddings [84.20336971784495]
We introduce CIPHER (Communicative Inter-Model Protocol Through Embedding Representation) to address this issue.
By deviating from natural language, CIPHER offers an advantage of encoding a broader spectrum of information without any modification to the model weights.
This showcases the superiority and robustness of embeddings as an alternative "language" for communication among LLMs.
arXiv Detail & Related papers (2023-10-10T03:06:38Z) - Unsupervised Improvement of Factual Knowledge in Language Models [4.5788796239850225]
Masked language modeling plays a key role in pretraining large language models.
We propose an approach for influencing pretraining in a way that can improve language model performance on a variety of knowledge-intensive tasks.
arXiv Detail & Related papers (2023-04-04T07:37:06Z) - Modeling Sequential Sentence Relation to Improve Cross-lingual Dense
Retrieval [87.11836738011007]
We propose a multilingual language model called the masked sentence model (MSM).
MSM consists of a sentence encoder to generate the sentence representations, and a document encoder applied to a sequence of sentence vectors from a document.
To train the model, we propose a masked sentence prediction task, which masks and predicts the sentence vector via a hierarchical contrastive loss with sampled negatives.
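The blurb above describes predicting a masked sentence vector against sampled negatives with a contrastive loss. A minimal, hypothetical sketch of such an objective (the hierarchical encoders are abstracted away; all names and the InfoNCE-style form are illustrative assumptions, not taken from the paper):

```python
import numpy as np

def contrastive_loss(predicted, positive, negatives, temperature=0.1):
    """InfoNCE-style loss: the predicted vector should score the true
    (masked) sentence vector higher than the sampled negatives."""
    candidates = np.vstack([positive[None, :], negatives])   # (1+N, dim)
    logits = candidates @ predicted / temperature            # similarity scores
    logits -= logits.max()                                   # numerical stability
    probs = np.exp(logits) / np.exp(logits).sum()
    return -np.log(probs[0])                                 # positive is index 0

rng = np.random.default_rng(0)
dim = 8
positive = rng.normal(size=dim)                     # true masked sentence vector
predicted = positive + 0.01 * rng.normal(size=dim)  # near-perfect prediction
negatives = rng.normal(size=(5, dim))               # sampled negative vectors
loss = contrastive_loss(predicted, positive, negatives)
```

A prediction close to the true sentence vector yields a loss near zero, while a prediction matching one of the negatives is heavily penalized.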
arXiv Detail & Related papers (2023-02-03T09:54:27Z) - LERT: A Linguistically-motivated Pre-trained Language Model [67.65651497173998]
We propose LERT, a pre-trained language model that is trained on three types of linguistic features along with the original pre-training task.
We carried out extensive experiments on ten Chinese NLU tasks, and the experimental results show that LERT could bring significant improvements.
arXiv Detail & Related papers (2022-11-10T05:09:16Z) - Generalizing Multimodal Pre-training into Multilingual via Language
Acquisition [54.69707237195554]
English-based Vision-Language Pre-training has achieved great success in various downstream tasks.
Some efforts have been taken to generalize this success to non-English languages through Multilingual Vision-Language Pre-training.
We propose a Multi-Lingual Acquisition (MLA) framework that can easily generalize a monolingual Vision-Language Pre-training model to multilingual settings.
arXiv Detail & Related papers (2022-05-29T08:53:22Z) - Cross-Lingual Text Classification with Multilingual Distillation and
Zero-Shot-Aware Training [21.934439663979663]
We propose a multi-branch multilingual language model (MBLM) built on multilingual pre-trained language models (MPLMs).
The method transfers knowledge from high-performance monolingual models via a teacher-student framework.
Results on two cross-lingual classification tasks show that, with only the task's supervised data used, our method improves both the supervised and zero-shot performance of MPLMs.
arXiv Detail & Related papers (2022-02-28T09:51:32Z) - SLM: Learning a Discourse Language Representation with Sentence
Unshuffling [53.42814722621715]
We introduce Sentence-level Language Modeling, a new pre-training objective for learning a discourse language representation.
We show that this feature of our model improves the performance of the original BERT by large margins.
arXiv Detail & Related papers (2020-10-30T13:33:41Z) - DICT-MLM: Improved Multilingual Pre-Training using Bilingual
Dictionaries [8.83363871195679]
Masked language modeling (MLM) serves as the key language learning objective.
DICT-MLM works by incentivizing the model to be able to predict not just the original masked word, but potentially any of its cross-lingual synonyms as well.
Our empirical analysis on multiple downstream tasks spanning 30+ languages, demonstrates the efficacy of the proposed approach.
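The DICT-MLM idea of crediting any cross-lingual synonym of the masked word can be sketched as a loss over the total probability assigned to a set of valid targets. The vocabulary, synonym set, and scores below are toy illustrations, not the paper's actual setup:

```python
import numpy as np

def dict_mlm_loss(logits, target_ids):
    """Negative log of the total probability assigned to any valid target
    (the original masked word or one of its dictionary synonyms)."""
    logits = logits - logits.max()                # numerical stability
    probs = np.exp(logits) / np.exp(logits).sum()
    return -np.log(probs[target_ids].sum())

vocab = ["dog", "chien", "Hund", "cat"]           # toy multilingual vocabulary
synonyms_of_dog = [0, 1, 2]                       # "dog" plus its translations
logits = np.array([2.0, 1.5, 1.0, -1.0])          # toy model scores for the mask
loss = dict_mlm_loss(logits, synonyms_of_dog)
```

Because the valid-target set is a superset of the single original word, this loss is never larger than the standard single-target MLM loss, which is what incentivizes the model to spread mass over cross-lingual synonyms.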
arXiv Detail & Related papers (2020-10-23T17:53:11Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the content (including all information) and is not responsible for any consequences of its use.