Fast, Effective and Self-Supervised: Transforming Masked Language Models
into Universal Lexical and Sentence Encoders
- URL: http://arxiv.org/abs/2104.08027v1
- Date: Fri, 16 Apr 2021 10:49:56 GMT
- Title: Fast, Effective and Self-Supervised: Transforming Masked Language Models
into Universal Lexical and Sentence Encoders
- Authors: Fangyu Liu, Ivan Vuli\'c, Anna Korhonen, Nigel Collier
- Abstract summary: We show that it is possible to turn MLMs into universal lexical and sentence encoders even without any additional data and without supervision.
We propose an extremely simple, fast and effective contrastive learning technique, termed Mirror-BERT.
Mirror-BERT relies on fully identical or slightly modified string pairs as positive (i.e., synonymous) fine-tuning examples.
We report huge gains over off-the-shelf MLMs with Mirror-BERT in both lexical-level and sentence-level tasks, across different domains and different languages.
- Score: 66.76141128555099
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Pretrained Masked Language Models (MLMs) have revolutionised NLP in recent
years. However, previous work has indicated that off-the-shelf MLMs are not
effective as universal lexical or sentence encoders without further
task-specific fine-tuning on NLI, sentence similarity, or paraphrasing tasks
using annotated task data. In this work, we demonstrate that it is possible to
turn MLMs into effective universal lexical and sentence encoders even without
any additional data and without any supervision. We propose an extremely
simple, fast and effective contrastive learning technique, termed Mirror-BERT,
which converts MLMs (e.g., BERT and RoBERTa) into such encoders in less than a
minute without any additional external knowledge. Mirror-BERT relies on fully
identical or slightly modified string pairs as positive (i.e., synonymous)
fine-tuning examples, and aims to maximise their similarity during identity
fine-tuning. We report huge gains over off-the-shelf MLMs with Mirror-BERT in
both lexical-level and sentence-level tasks, across different domains and
different languages. Notably, in the standard sentence semantic similarity
(STS) tasks, our self-supervised Mirror-BERT model even matches the performance
of the task-tuned Sentence-BERT models from prior work. Finally, we delve
deeper into the inner workings of MLMs, and suggest some evidence on why this
simple approach can yield effective universal lexical and sentence encoders.
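
The identity fine-tuning objective described in the abstract lends itself to a short sketch. The following is a minimal illustration, not the authors' released code: it assumes HuggingFace transformers with bert-base-uncased, mean pooling, and an InfoNCE-style contrastive loss, and the hyperparameters, the random_span_mask helper, and the example strings are illustrative assumptions only.

```python
# Minimal sketch of Mirror-BERT-style identity fine-tuning (assumptions noted above).
import random
import torch
import torch.nn.functional as F
from transformers import AutoTokenizer, AutoModel

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
model = AutoModel.from_pretrained("bert-base-uncased")
model.train()  # keep dropout on so two encodings of the same string differ slightly
optimizer = torch.optim.AdamW(model.parameters(), lr=2e-5)

def random_span_mask(text, span_len=5):
    """Build a 'slightly modified' view of a string by masking a short span
    (a rough stand-in for the input augmentation described in the abstract)."""
    if len(text) <= span_len:
        return text
    start = random.randrange(len(text) - span_len)
    return text[:start] + tokenizer.mask_token + text[start + span_len:]

def encode(texts):
    """Mean-pool the final-layer token embeddings into one vector per string."""
    batch = tokenizer(texts, padding=True, truncation=True, return_tensors="pt")
    hidden = model(**batch).last_hidden_state             # (B, T, H)
    mask = batch["attention_mask"].unsqueeze(-1).float()  # (B, T, 1)
    return (hidden * mask).sum(1) / mask.sum(1)           # (B, H)

def info_nce(z1, z2, temperature=0.05):
    """Two views of the same string are positives; other strings in the batch are negatives."""
    z1, z2 = F.normalize(z1, dim=-1), F.normalize(z2, dim=-1)
    logits = z1 @ z2.t() / temperature                    # (B, B) cosine similarities
    labels = torch.arange(z1.size(0))
    return F.cross_entropy(logits, labels)

# One training step: each string is paired with an identical or slightly modified
# copy of itself, and the loss pulls the two encodings together.
strings = ["aspirin", "acetylsalicylic acid", "the cat sat on the mat"]
views = [random_span_mask(s) for s in strings]
optimizer.zero_grad()
loss = info_nce(encode(strings), encode(views))
loss.backward()
optimizer.step()
```

After a brief run of such steps, encode() can be reused directly to embed words, phrases, or sentences, with cosine similarity serving as the lexical or sentence similarity score.
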
Related papers
- Accelerating Multimodal Large Language Models via Dynamic Visual-Token Exit and the Empirical Findings [69.35226485836641]
Excessive use of visual tokens in existing Multimodal Large Language Models (MLLMs) often exhibits obvious redundancy and incurs prohibitively expensive computation.
We propose a simple yet effective method to improve the efficiency of MLLMs, termed dynamic visual-token exit (DyVTE)
DyVTE uses lightweight hyper-networks to perceive the text token status and decide the removal of all visual tokens after a certain layer.
arXiv Detail & Related papers (2024-11-29T11:24:23Z) - SELF-GUIDE: Better Task-Specific Instruction Following via Self-Synthetic Finetuning [70.21358720599821]
Large language models (LLMs) hold the promise of solving diverse tasks when provided with appropriate natural language prompts.
We propose SELF-GUIDE, a multi-stage mechanism in which we synthesize task-specific input-output pairs from the student LLM.
We report an absolute improvement of approximately 15% for classification tasks and 18% for generation tasks in the benchmark's metrics.
arXiv Detail & Related papers (2024-07-16T04:41:58Z) - Toward Self-Improvement of LLMs via Imagination, Searching, and Criticizing [56.75702900542643]
We introduce AlphaLLM for the self-improvements of Large Language Models.
It integrates Monte Carlo Tree Search (MCTS) with LLMs to establish a self-improving loop.
Our experimental results show that AlphaLLM significantly enhances the performance of LLMs without additional annotations.
arXiv Detail & Related papers (2024-04-18T15:21:34Z) - Bridging the Gap between Different Vocabularies for LLM Ensemble [10.669552498083709]
Vocabulary discrepancies among various large language models (LLMs) have constrained previous studies of LLM ensembling.
We propose a novel method to Ensemble LLMs via Vocabulary Alignment (EVA)
EVA bridges the lexical gap among various LLMs, enabling meticulous ensemble at each generation step.
arXiv Detail & Related papers (2024-04-15T06:28:20Z) - ModaVerse: Efficiently Transforming Modalities with LLMs [25.49713745405194]
We introduce ModaVerse, a Multi-modal Large Language Model capable of comprehending and transforming content across various modalities.
We propose a novel Input/Output (I/O) alignment mechanism that operates directly at the level of natural language.
arXiv Detail & Related papers (2024-01-12T06:28:54Z) - Which Syntactic Capabilities Are Statistically Learned by Masked
Language Models for Code? [51.29970742152668]
We highlight that relying on accuracy-based measurements may lead to an overestimation of models' capabilities.
To address these issues, we introduce SyntaxEval, a technique for evaluating the syntactic capabilities of masked language models.
arXiv Detail & Related papers (2024-01-03T02:44:02Z) - Take One Step at a Time to Know Incremental Utility of Demonstration: An Analysis on Reranking for Few-Shot In-Context Learning [23.932500424117244]
In-Context Learning (ICL) is an emergent capability of Large Language Models (LLMs)
Previous studies have shown that using LLMs' outputs as labels is effective in training models to select demonstrations.
This paper presents an analysis on different utility functions by focusing on LLMs' output probability given ground-truth output.
arXiv Detail & Related papers (2023-11-16T07:03:54Z) - VideoLLM: Modeling Video Sequence with Large Language Models [70.32832021713864]
Existing video understanding models are often task-specific and lack a comprehensive capability of handling diverse tasks.
We propose a novel framework called VideoLLM that leverages the sequence reasoning capabilities of pre-trained LLMs.
VideoLLM incorporates a carefully designed Modality and Semantic Translator, which converts inputs from various modalities into a unified token sequence.
arXiv Detail & Related papers (2023-05-22T17:51:22Z) - Contextual Representation Learning beyond Masked Language Modeling [45.46220173487394]
We analyze how masked language models (MLMs) such as BERT learn contextual representations.
To address these issues, we propose TACO, a representation learning approach that directly models global semantics.
TACO extracts the contextual semantics hidden in contextualized representations to encourage models to attend to global semantics.
arXiv Detail & Related papers (2022-04-08T16:18:06Z)
This list is automatically generated from the titles and abstracts of the papers in this site.