Probing Pretrained Language Models for Lexical Semantics
- URL: http://arxiv.org/abs/2010.05731v1
- Date: Mon, 12 Oct 2020 14:24:01 GMT
- Title: Probing Pretrained Language Models for Lexical Semantics
- Authors: Ivan Vulić, Edoardo Maria Ponti, Robert Litschko, Goran Glavaš,
Anna Korhonen
- Abstract summary: We present a systematic empirical analysis across six typologically diverse languages and five different lexical tasks.
Our results indicate patterns and best practices that hold universally, but also point to prominent variations across languages and tasks.
- Score: 76.73599166020307
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: The success of large pretrained language models (LMs) such as BERT and
RoBERTa has sparked interest in probing their representations, in order to
unveil what types of knowledge they implicitly capture. While prior research
focused on morphosyntactic, semantic, and world knowledge, it remains unclear
to which extent LMs also derive lexical type-level knowledge from words in
context. In this work, we present a systematic empirical analysis across six
typologically diverse languages and five different lexical tasks, addressing
the following questions: 1) How do different lexical knowledge extraction
strategies (monolingual versus multilingual source LM, out-of-context versus
in-context encoding, inclusion of special tokens, and layer-wise averaging)
impact performance? How consistent are the observed effects across tasks and
languages? 2) Is lexical knowledge stored in few parameters, or is it scattered
throughout the network? 3) How do these representations fare against
traditional static word vectors in lexical tasks? 4) Does the lexical
information emerging from independently trained monolingual LMs display latent
similarities? Our main results indicate patterns and best practices that hold
universally, but also point to prominent variations across languages and tasks.
Moreover, we validate the claim that lower Transformer layers carry more
type-level lexical knowledge, but also show that this knowledge is distributed
across multiple layers.
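To make the extraction strategies listed in the abstract concrete, below is a minimal sketch of out-of-context word encoding with layer-wise averaging over the lower Transformer layers. It is not the paper's exact setup: the model name ("bert-base-multilingual-cased"), the chosen layer range, and the handling of special tokens are illustrative assumptions built on the standard Hugging Face transformers API.

```python
# Minimal sketch: out-of-context word encoding with layer-wise averaging.
# Assumes the Hugging Face `transformers` library; the model name and the
# choice of lower layers are illustrative, not the paper's exact configuration.
import torch
from transformers import AutoModel, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("bert-base-multilingual-cased")
model = AutoModel.from_pretrained("bert-base-multilingual-cased",
                                  output_hidden_states=True)
model.eval()

def type_level_vector(word: str, layers=range(0, 7), use_special_tokens=True):
    """Encode a word in isolation and average its subword vectors over the
    selected layers (index 0 is the embedding layer, 1..N the Transformer layers)."""
    enc = tokenizer(word, return_tensors="pt",
                    add_special_tokens=use_special_tokens)
    with torch.no_grad():
        hidden_states = model(**enc).hidden_states  # tuple of [1, seq_len, dim]
    # Stack the chosen layers, then average over layers and subword positions.
    chosen = torch.stack([hidden_states[l] for l in layers])  # [L, 1, seq_len, dim]
    return chosen.mean(dim=0).squeeze(0).mean(dim=0)          # [dim]

vec = type_level_vector("bank")
print(vec.shape)  # torch.Size([768])
```

In-context encoding would instead embed the word inside example sentences and average its contextualized vectors across occurrences; the paper compares these strategies across languages and tasks.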
Related papers
- Crosslingual Capabilities and Knowledge Barriers in Multilingual Large Language Models [62.91524967852552]
Large language models (LLMs) are typically multilingual due to pretraining on diverse multilingual corpora.
But can these models relate corresponding concepts across languages, i.e., are they effectively crosslingual?
This study evaluates six state-of-the-art LLMs on inherently crosslingual tasks.
arXiv Detail & Related papers (2024-06-23T15:15:17Z) - Incorporating Lexical and Syntactic Knowledge for Unsupervised Cross-Lingual Transfer [4.944761231728674]
We present a novel framework called "Lexicon-Syntax Enhanced Multilingual BERT",
which uses Multilingual BERT as the base model and employs two techniques to enhance its learning capabilities.
Our experimental results demonstrate this framework can consistently outperform all baselines of zero-shot cross-lingual transfer.
arXiv Detail & Related papers (2024-04-25T14:10:52Z) - Cross-Lingual Knowledge Editing in Large Language Models [73.12622532088564]
Knowledge editing has been shown to adapt large language models to new knowledge without retraining from scratch.
The effect of editing knowledge in a source language on a different target language is still unknown.
We first collect a large-scale cross-lingual synthetic dataset by translating ZsRE from English to Chinese.
arXiv Detail & Related papers (2023-09-16T11:07:52Z) - Adapters for Enhanced Modeling of Multilingual Knowledge and Text [54.02078328453149]
Language models have been extended to multilingual language models (MLLMs).
Knowledge graphs contain facts in an explicit triple format, which require careful curation and are only available in a few high-resource languages.
We propose to enhance MLLMs with knowledge from multilingual knowledge graphs (MLKGs) so as to tackle language and knowledge graph tasks across many languages.
arXiv Detail & Related papers (2022-10-24T21:33:42Z) - Representing Affect Information in Word Embeddings [5.378735006566249]
We investigated whether and how the affect meaning of a word is encoded in word embeddings pre-trained in large neural networks.
The embeddings varied in whether they were static or contextualized, and in how much affect-specific information was prioritized during the pre-training and fine-tuning phases.
arXiv Detail & Related papers (2022-09-21T18:16:33Z) - Exposing Cross-Lingual Lexical Knowledge from Multilingual Sentence
Encoders [85.80950708769923]
We probe multilingual language models for the amount of cross-lingual lexical knowledge stored in their parameters, and compare them against the original multilingual LMs.
We also devise a novel method to expose this knowledge by additionally fine-tuning multilingual models.
We report substantial gains on standard benchmarks.
arXiv Detail & Related papers (2022-04-30T13:23:16Z) - X-FACTR: Multilingual Factual Knowledge Retrieval from Pretrained
Language Models [103.75890012041366]
Language models (LMs) have proven surprisingly successful at capturing factual knowledge.
However, studies on LMs' factual representation ability have almost invariably been performed on English.
We create a benchmark of cloze-style probes for 23 typologically diverse languages.
arXiv Detail & Related papers (2020-10-13T05:29:56Z) - What does it mean to be language-agnostic? Probing multilingual sentence
encoders for typological properties [17.404220737977738]
We propose methods for probing sentence representations from state-of-the-art multilingual encoders.
Our results show interesting differences in encoding linguistic variation associated with different pretraining strategies.
arXiv Detail & Related papers (2020-09-27T15:00:52Z)