MirrorWiC: On Eliciting Word-in-Context Representations from Pretrained
Language Models
- URL: http://arxiv.org/abs/2109.09237v1
- Date: Sun, 19 Sep 2021 22:19:01 GMT
- Title: MirrorWiC: On Eliciting Word-in-Context Representations from Pretrained
Language Models
- Authors: Qianchu Liu, Fangyu Liu, Nigel Collier, Anna Korhonen, Ivan Vuli\'c
- Abstract summary: We propose a fully unsupervised approach to improving word-in-context (WiC) representations in language models.
MirrorWiC learns context-aware word representations within a standard contrastive learning setup.
Our proposed fully unsupervised MirrorWiC models obtain substantial gains over off-the-shelf PLMs across all monolingual, multilingual and cross-lingual setups.
- Score: 61.48034827104998
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Recent work has indicated that pretrained language models (PLMs) such as BERT and
RoBERTa can be transformed into effective sentence and word encoders even via
simple self-supervised techniques. Inspired by this line of work, in this paper
we propose a fully unsupervised approach to improving word-in-context (WiC)
representations in PLMs, achieved via a simple and efficient WiC-targeted
fine-tuning procedure: MirrorWiC. The proposed method leverages only raw texts
sampled from Wikipedia, assuming no sense-annotated data, and learns
context-aware word representations within a standard contrastive learning
setup. We experiment with a series of standard and comprehensive WiC benchmarks
across multiple languages. Our proposed fully unsupervised MirrorWiC models
obtain substantial gains over off-the-shelf PLMs across all monolingual,
multilingual and cross-lingual setups. Moreover, on some standard WiC
benchmarks, MirrorWiC is even on par with supervised models fine-tuned with
in-task data and sense labels.
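To make the fine-tuning recipe described above concrete, below is a minimal, hypothetical sketch (assuming a Hugging Face BERT-style encoder and PyTorch) of contrastive WiC-targeted tuning in this spirit: each raw-text sentence is encoded twice with dropout active, the target word's subword vectors are mean-pooled, and an in-batch InfoNCE loss pulls the two views of the same word-in-context together. The model name, pooling choice, and temperature are illustrative assumptions, not the paper's exact configuration.
```python
import torch
import torch.nn.functional as F
from transformers import AutoModel, AutoTokenizer

model_name = "bert-base-uncased"            # assumption: any BERT-style PLM
tokenizer = AutoTokenizer.from_pretrained(model_name)
encoder = AutoModel.from_pretrained(model_name)
encoder.train()                             # keep dropout active: it provides the two stochastic views

# Toy raw-text batch: (pre-tokenized sentence, index of the target word).
batch = [
    ("She sat on the bank of the river".split(), 4),
    ("He deposited cash at the bank".split(), 5),
]
sentences = [words for words, _ in batch]
targets = [idx for _, idx in batch]

enc = tokenizer(sentences, is_split_into_words=True, padding=True, return_tensors="pt")

def target_word_embeddings(hidden_states):
    """Mean-pool the subword vectors that belong to each sentence's target word."""
    pooled = []
    for i, tgt in enumerate(targets):
        word_ids = enc.word_ids(batch_index=i)
        positions = [p for p, w in enumerate(word_ids) if w == tgt]
        pooled.append(hidden_states[i, positions].mean(dim=0))
    return torch.stack(pooled)

# Two stochastic views of the same inputs: dropout makes the embeddings differ.
view_a = target_word_embeddings(encoder(**enc).last_hidden_state)
view_b = target_word_embeddings(encoder(**enc).last_hidden_state)

# In-batch InfoNCE: the other view of the same word-in-context is the positive,
# every other example in the batch acts as a negative.
temperature = 0.05                          # assumed value, not taken from the paper
sims = F.cosine_similarity(view_a.unsqueeze(1), view_b.unsqueeze(0), dim=-1) / temperature
labels = torch.arange(sims.size(0))
loss = F.cross_entropy(sims, labels)
loss.backward()                             # an optimiser step over the encoder would follow
print(float(loss))
```
A real setup would iterate this over many Wikipedia sentences with an optimiser and larger batches, but the target-word pooling and in-batch contrastive loss above capture the core idea of the fully unsupervised approach.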
Related papers
- Towards Zero-Shot Multimodal Machine Translation [64.9141931372384]
We propose a method to bypass the need for fully supervised data to train multimodal machine translation systems.
Our method, called ZeroMMT, adapts a strong text-only machine translation (MT) model by training it on a mixture of two objectives.
To prove that our method generalizes to languages with no fully supervised training data available, we extend the CoMMuTE evaluation dataset to three new languages: Arabic, Russian and Chinese.
arXiv Detail & Related papers (2024-07-18T15:20:31Z)
- CLoVe: Encoding Compositional Language in Contrastive Vision-Language Models [33.80107512462935]
Foundational Vision-Language Models (VLMs) excel at object-centric recognition yet learn text representations that seem invariant to word order.
No evidence exists that any VLM, including large-scale single-stream models such as GPT-4V, identifies compositions successfully.
In this paper, we introduce a framework to significantly improve the ability of existing models to encode compositional language.
arXiv Detail & Related papers (2024-02-22T23:42:25Z)
- Language Models are Universal Embedders [48.12992614723464]
We show that pre-trained transformer decoders can embed universally when finetuned on limited English data.
Our models achieve competitive performance on different embedding tasks with minimal training data.
These results provide evidence of a promising path towards building powerful unified embedders.
arXiv Detail & Related papers (2023-10-12T11:25:46Z)
- FILM: How can Few-Shot Image Classification Benefit from Pre-Trained Language Models? [14.582209994281374]
Few-shot learning aims to train models that generalize to novel classes from only a few samples.
We propose a novel few-shot learning framework that leverages pre-trained language models via contrastive learning.
arXiv Detail & Related papers (2023-07-09T08:07:43Z)
- XTREME-UP: A User-Centric Scarce-Data Benchmark for Under-Represented Languages [105.54207724678767]
Data scarcity is a crucial issue for the development of highly multilingual NLP systems.
We propose XTREME-UP, a benchmark defined by its focus on the scarce-data scenario rather than zero-shot.
XTREME-UP evaluates the capabilities of language models across 88 under-represented languages over 9 key user-centric technologies.
arXiv Detail & Related papers (2023-05-19T18:00:03Z)
- Multimodal Knowledge Alignment with Reinforcement Learning [103.68816413817372]
ESPER extends language-only zero-shot models to unseen multimodal tasks, like image and audio captioning.
Our key novelty is to use reinforcement learning to align multimodal inputs to language model generations without direct supervision.
Experiments demonstrate that ESPER outperforms baselines and prior work on a variety of zero-shot tasks.
arXiv Detail & Related papers (2022-05-25T10:12:17Z)
- On Cross-Lingual Retrieval with Multilingual Text Encoders [51.60862829942932]
We study the suitability of state-of-the-art multilingual encoders for cross-lingual document and sentence retrieval tasks.
We benchmark their performance in unsupervised ad-hoc sentence- and document-level CLIR experiments.
We also evaluate multilingual encoders fine-tuned in a supervised fashion (i.e., learning to rank) on English relevance data in a series of zero-shot language and domain transfer CLIR experiments.
arXiv Detail & Related papers (2021-12-21T08:10:27Z)
- XL-WiC: A Multilingual Benchmark for Evaluating Semantic Contextualization [98.61159823343036]
We present the Word-in-Context dataset (WiC) for assessing the ability to correctly model distinct meanings of a word.
We put forward a large multilingual benchmark, XL-WiC, featuring gold standards in 12 new languages.
Experimental results show that even when no tagged instances are available for a target language, models trained solely on the English data can attain competitive performance.
arXiv Detail & Related papers (2020-10-13T15:32:00Z)
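As a rough illustration of how WiC-style benchmarks such as WiC and XL-WiC are commonly scored with an off-the-shelf or MirrorWiC-tuned encoder, the hypothetical sketch below embeds the target word in each of its two contexts and predicts "same meaning" whenever their cosine similarity clears a threshold tuned on development data; the model name and threshold value are assumptions, not any benchmark's official protocol.
```python
import torch
import torch.nn.functional as F
from transformers import AutoModel, AutoTokenizer

model_name = "bert-base-uncased"        # would be a MirrorWiC-tuned checkpoint in practice
tokenizer = AutoTokenizer.from_pretrained(model_name)
encoder = AutoModel.from_pretrained(model_name).eval()

def embed_target(words, target_idx):
    """Contextual embedding of one word: mean of its subword vectors."""
    enc = tokenizer([words], is_split_into_words=True, return_tensors="pt")
    with torch.no_grad():
        hidden = encoder(**enc).last_hidden_state[0]
    positions = [p for p, w in enumerate(enc.word_ids(batch_index=0)) if w == target_idx]
    return hidden[positions].mean(dim=0)

# One toy WiC-style instance: does "bank" mean the same thing in both contexts?
sent1, tgt1 = "They moored the boat by the bank".split(), 6
sent2, tgt2 = "The bank approved her loan".split(), 1

similarity = F.cosine_similarity(embed_target(sent1, tgt1), embed_target(sent2, tgt2), dim=0)
threshold = 0.6                         # illustrative; in practice tuned on the dev split
print("same sense" if similarity.item() > threshold else "different sense")
```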
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences of its use.