Deep Subjecthood: Higher-Order Grammatical Features in Multilingual BERT
- URL: http://arxiv.org/abs/2101.11043v1
- Date: Tue, 26 Jan 2021 19:21:59 GMT
- Title: Deep Subjecthood: Higher-Order Grammatical Features in Multilingual BERT
- Authors: Isabel Papadimitriou, Ethan A. Chi, Richard Futrell, Kyle Mahowald
- Abstract summary: We investigate how Multilingual BERT (mBERT) encodes grammar by examining how the high-order grammatical feature of morphosyntactic alignment is manifested across the embedding spaces of different languages.
- Score: 7.057643880514415
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: We investigate how Multilingual BERT (mBERT) encodes grammar by examining how
the high-order grammatical feature of morphosyntactic alignment (how different
languages define what counts as a "subject") is manifested across the embedding
spaces of different languages. To understand if and how morphosyntactic
alignment affects contextual embedding spaces, we train classifiers to recover
the subjecthood of mBERT embeddings in transitive sentences (which do not
contain overt information about morphosyntactic alignment) and then evaluate
them zero-shot on intransitive sentences (where subjecthood classification
depends on alignment), within and across languages. We find that the resulting
classifier distributions reflect the morphosyntactic alignment of their
training languages. Our results demonstrate that mBERT representations are
influenced by high-level grammatical features that are not manifested in any
one input sentence, and that this is robust across languages. Further examining
the characteristics that our classifiers rely on, we find that features such as
passive voice, animacy and case strongly correlate with classification
decisions, suggesting that mBERT does not encode subjecthood purely
syntactically, but that subjecthood embedding is continuous and dependent on
semantic and discourse factors, as is proposed in much of the functional
linguistics literature. Together, these results provide insight into how
grammatical features manifest in contextual embedding spaces, at a level of
abstraction not covered by previous work.
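The probing setup described in the abstract can be sketched in a few lines. The sketch below is illustrative only: the model checkpoint, the logistic-regression probe, the `embed_token` helper, and the toy sentences are assumptions rather than the authors' released code; a real experiment would take A (transitive subject), O (transitive object), and S (intransitive subject) tokens from annotated treebanks in each language.

```python
# Minimal sketch of the transitive-to-intransitive subjecthood probe
# (illustrative assumptions throughout; not the paper's released code).
import torch
from transformers import AutoModel, AutoTokenizer
from sklearn.linear_model import LogisticRegression

tokenizer = AutoTokenizer.from_pretrained("bert-base-multilingual-cased")
model = AutoModel.from_pretrained("bert-base-multilingual-cased")

def embed_token(sentence, word_index, layer=-1):
    """Contextual embedding of the word at `word_index` (first subword piece)."""
    enc = tokenizer(sentence.split(), is_split_into_words=True, return_tensors="pt")
    with torch.no_grad():
        hidden = model(**enc, output_hidden_states=True).hidden_states[layer][0]
    sub_pos = enc.word_ids(batch_index=0).index(word_index)
    return hidden[sub_pos].numpy()

# Toy (sentence, target word index) pairs; real data would come from UD treebanks.
transitive_A = [("The dog chased the cat", 1), ("A child broke the window", 1)]
transitive_O = [("The dog chased the cat", 4), ("A child broke the window", 4)]
intransitive_S = [("The dog slept", 1), ("The window broke", 1)]

# Train on transitive A vs. O tokens only (no overt alignment information).
X_train = [embed_token(s, i) for s, i in transitive_A + transitive_O]
y_train = [1] * len(transitive_A) + [0] * len(transitive_O)
probe = LogisticRegression(max_iter=1000).fit(X_train, y_train)

# Zero-shot evaluation on intransitive S tokens, within or across languages:
# a high "subject" probability groups S with A (accusative-like behaviour),
# a low probability groups S with O (ergative-like behaviour).
p_subject = probe.predict_proba([embed_token(s, i) for s, i in intransitive_S])[:, 1]
print(p_subject)
```

Evaluating a probe trained on one language against intransitive sentences of another is what lets the classifier's decisions reveal the morphosyntactic alignment of its training language.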
Related papers
- Assessing the Role of Lexical Semantics in Cross-lingual Transfer through Controlled Manipulations [15.194196775504613]
We analyze how differences between English and a target language influence the capacity to align the language with an English pretrained representation space.
We show that while properties such as the script or word order only have a limited impact on alignment quality, the degree of lexical matching between the two languages, which we define using a measure of translation entropy, greatly affects it.
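The summary above only names the translation-entropy measure; one plausible reading (an assumption, not necessarily the paper's exact definition) is the average entropy of each source word's distribution over its aligned translations, so that language pairs with closer one-to-one lexical matching score lower:

```python
# Hypothetical formulation of a translation-entropy measure (assumption only).
import math
from collections import Counter, defaultdict

def translation_entropy(aligned_word_pairs):
    """Average entropy (bits) of p(target word | source word), estimated from
    (source, target) word pairs, e.g. produced by an automatic word aligner.
    Lower values indicate tighter one-to-one lexical matching."""
    counts = defaultdict(Counter)
    for src, tgt in aligned_word_pairs:
        counts[src][tgt] += 1
    entropies = []
    for tgt_counts in counts.values():
        total = sum(tgt_counts.values())
        entropies.append(-sum(c / total * math.log2(c / total)
                              for c in tgt_counts.values()))
    return sum(entropies) / len(entropies)

# "bank" aligns to two different German words, "house" to one:
pairs = [("bank", "Bank"), ("bank", "Ufer"), ("house", "Haus"), ("house", "Haus")]
print(translation_entropy(pairs))  # 0.5 = mean of 1.0 ("bank") and 0.0 ("house")
```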
arXiv Detail & Related papers (2024-08-14T14:59:20Z)
- Breaking Down Word Semantics from Pre-trained Language Models through Layer-wise Dimension Selection [0.0]
This paper aims to disentangle semantic sense from BERT by applying a binary mask to middle outputs across the layers.
The disentangled embeddings are evaluated through binary classification to determine if the target word in two different sentences has the same meaning.
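A rough illustration of the masking idea, under stated assumptions: the binary mask below is arbitrary, and a cosine-similarity threshold stands in for the paper's trained classifier; the actual layer-wise mask selection is not reproduced here.

```python
# Rough sketch: mask hidden dimensions, then test whether a target word keeps
# the same sense in two sentences (assumed mask and threshold; illustrative).
import numpy as np

def same_sense(emb_a, emb_b, dim_mask, threshold=0.6):
    """Compare the masked embeddings of the same target word in two sentences
    via cosine similarity; the paper instead trains a binary classifier."""
    a, b = emb_a * dim_mask, emb_b * dim_mask
    cos = float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b) + 1e-9))
    return cos >= threshold

# emb_a / emb_b would be per-layer BERT vectors for the target word in two
# sentences (e.g. obtained with the embed_token helper sketched earlier).
dim_mask = np.zeros(768)
dim_mask[:256] = 1.0                      # hypothetical binary mask
print(same_sense(np.random.rand(768), np.random.rand(768), dim_mask))
```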
arXiv Detail & Related papers (2023-10-08T11:07:19Z)
- Cross-Linguistic Syntactic Difference in Multilingual BERT: How Good is It and How Does It Affect Transfer? [50.48082721476612]
Multilingual BERT (mBERT) has demonstrated considerable cross-lingual syntactic ability.
We investigate the distributions of grammatical relations induced from mBERT in the context of 24 typologically different languages.
arXiv Detail & Related papers (2022-12-21T09:44:08Z)
- Multilingual Extraction and Categorization of Lexical Collocations with Graph-aware Transformers [86.64972552583941]
We put forward a sequence tagging BERT-based model enhanced with a graph-aware transformer architecture, which we evaluate on the task of collocation recognition in context.
Our results suggest that explicitly encoding syntactic dependencies in the model architecture is helpful, and provide insights on differences in collocation typification in English, Spanish and French.
arXiv Detail & Related papers (2022-05-23T16:47:37Z)
- A Massively Multilingual Analysis of Cross-linguality in Shared Embedding Space [61.18554842370824]
In cross-lingual language models, representations for many different languages live in the same space.
We compute a task-based measure of cross-lingual alignment in the form of bitext retrieval performance.
We examine a range of linguistic, quasi-linguistic, and training-related features as potential predictors of these alignment metrics.
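A minimal sketch of a bitext-retrieval alignment score of this kind, assuming mean-pooled sentence embeddings and cosine-similarity nearest-neighbour search (the paper's exact encoder, pooling, and retrieval direction are not given in this summary):

```python
# Sketch of a bitext-retrieval alignment score (assumed pooling and metric).
import numpy as np

def retrieval_accuracy(src_embs, tgt_embs):
    """Fraction of source sentences whose cosine nearest neighbour among the
    target-language embeddings is the gold translation (same row index in a
    parallel corpus)."""
    src = src_embs / np.linalg.norm(src_embs, axis=1, keepdims=True)
    tgt = tgt_embs / np.linalg.norm(tgt_embs, axis=1, keepdims=True)
    nearest = (src @ tgt.T).argmax(axis=1)
    return float((nearest == np.arange(len(src_embs))).mean())

# src_embs / tgt_embs: (n_sentences, dim) arrays, e.g. mean-pooled mBERT
# hidden states for the two sides of a parallel bitext.
print(retrieval_accuracy(np.random.rand(5, 768), np.random.rand(5, 768)))
```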
arXiv Detail & Related papers (2021-09-13T21:05:37Z)
- Cross-lingual Text Classification with Heterogeneous Graph Neural Network [2.6936806968297913]
Cross-lingual text classification aims at training a classifier on the source language and transferring the knowledge to target languages.
Recent multilingual pretrained language models (mPLM) achieve impressive results in cross-lingual classification tasks.
We propose a simple yet effective method to incorporate heterogeneous information within and across languages for cross-lingual text classification.
arXiv Detail & Related papers (2021-05-24T12:45:42Z)
- Intrinsic Probing through Dimension Selection [69.52439198455438]
Most modern NLP systems make use of pre-trained contextual representations that attain astonishingly high performance on a variety of tasks.
Such high performance should not be possible unless some form of linguistic structure inheres in these representations, and a wealth of research has sprung up on probing for it.
In this paper, we draw a distinction between intrinsic probing, which examines how linguistic information is structured within a representation, and the extrinsic probing popular in prior work, which only argues for the presence of such information by showing that it can be successfully extracted.
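As a rough illustration of intrinsic probing via dimension selection (not the paper's actual estimator, which works within a probabilistic framework), one can score each embedding dimension by how well it alone predicts a linguistic attribute:

```python
# Illustrative dimension-selection probe (assumed scoring procedure).
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

def most_informative_dimensions(X, y, top_k=10):
    """Rank embedding dimensions by how well each one, on its own, predicts a
    linguistic label y (e.g. case or number taken from UD annotations)."""
    scores = [cross_val_score(LogisticRegression(max_iter=200),
                              X[:, d:d + 1], y, cv=3).mean()
              for d in range(X.shape[1])]
    return np.argsort(scores)[::-1][:top_k]

# X: (n_tokens, dim) contextual embeddings; y: binary attribute labels.
X, y = np.random.rand(60, 32), np.random.randint(0, 2, 60)
print(most_informative_dimensions(X, y, top_k=5))
```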
arXiv Detail & Related papers (2020-10-06T15:21:08Z)
- Linguistic Profiling of a Neural Language Model [1.0552465253379135]
We investigate the linguistic knowledge learned by a Neural Language Model (NLM) before and after a fine-tuning process.
We show that BERT is able to encode a wide range of linguistic characteristics, but it tends to lose this information when trained on specific downstream tasks.
arXiv Detail & Related papers (2020-10-05T09:09:01Z)
- A Comparative Study on Structural and Semantic Properties of Sentence Embeddings [77.34726150561087]
We propose a set of experiments using a widely-used large-scale data set for relation extraction.
We show that different embedding spaces have different degrees of strength for the structural and semantic properties.
These results provide useful information for developing embedding-based relation extraction methods.
arXiv Detail & Related papers (2020-09-23T15:45:32Z)
- On the Importance of Word Order Information in Cross-lingual Sequence Labeling [80.65425412067464]
Cross-lingual models that fit the word order of the source language may fail to handle target languages.
We investigate whether making models insensitive to the word order of the source language can improve the adaptation performance in target languages.
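One simple order-insensitivity intervention, sketched below under the assumption that shuffling source-language word order during fine-tuning approximates the idea (the paper's own modelling choices may differ):

```python
# Sketch of an order-insensitivity intervention: shuffle source-language
# token order before fine-tuning (an assumed stand-in for the paper's method).
import random

def shuffle_word_order(tokens, seed=None):
    """Return the tokens of one training example in a random order, so the
    model cannot rely on source-language word order during fine-tuning."""
    rng = random.Random(seed)
    out = list(tokens)
    rng.shuffle(out)
    return out

print(shuffle_word_order(["the", "dog", "chased", "the", "cat"], seed=0))
```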
arXiv Detail & Related papers (2020-01-30T03:35:44Z)
This list is automatically generated from the titles and abstracts of the papers on this site.