Post-hoc analysis of Arabic transformer models
- URL: http://arxiv.org/abs/2210.09990v1
- Date: Tue, 18 Oct 2022 16:53:51 GMT
- Title: Post-hoc analysis of Arabic transformer models
- Authors: Ahmed Abdelali and Nadir Durrani and Fahim Dalvi and Hassan Sajjad
- Abstract summary: We probe how linguistic information is encoded in the transformer models, trained on different Arabic dialects.
We perform a layer and neuron analysis on the models using morphological tagging tasks for different dialects of Arabic and a dialectal identification task.
- Score: 20.741730718486032
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Arabic is a Semitic language which is widely spoken with many dialects. Given
the success of pre-trained language models, many transformer models trained on
Arabic and its dialects have surfaced. While there have been an extrinsic
evaluation of these models with respect to downstream NLP tasks, no work has
been carried out to analyze and compare their internal representations. We
probe how linguistic information is encoded in the transformer models, trained
on different Arabic dialects. We perform a layer and neuron analysis on the
models using morphological tagging tasks for different dialects of Arabic and a
dialectal identification task. Our analysis enlightens interesting findings
such as: i) word morphology is learned at the lower and middle layers, ii)
while syntactic dependencies are predominantly captured at the higher layers,
iii) despite a large overlap in their vocabulary, the MSA-based models fail to
capture the nuances of Arabic dialects, iv) we found that neurons in embedding
layers are polysemous in nature, while the neurons in middle layers are
exclusive to specific properties
Related papers
- AlcLaM: Arabic Dialectal Language Model [2.8477895544986955]
We construct an Arabic dialectal corpus comprising 3.4M sentences gathered from social media platforms.
We utilize this corpus to expand the vocabulary and retrain a BERT-based model from scratch.
Named AlcLaM, our model was trained using only 13 GB of text, which represents a fraction of the data used by existing models.
arXiv Detail & Related papers (2024-07-18T02:13:50Z) - Holmes: A Benchmark to Assess the Linguistic Competence of Language Models [59.627729608055006]
We introduce Holmes, a new benchmark designed to assess language models (LMs) linguistic competence.
We use computation-based probing to examine LMs' internal representations regarding distinct linguistic phenomena.
As a result, we meet recent calls to disentangle LMs' linguistic competence from other cognitive abilities.
arXiv Detail & Related papers (2024-04-29T17:58:36Z) - Natural Language Processing for Dialects of a Language: A Survey [56.93337350526933]
State-of-the-art natural language processing (NLP) models are trained on massive training corpora, and report a superlative performance on evaluation datasets.
This survey delves into an important attribute of these datasets: the dialect of a language.
Motivated by the performance degradation of NLP models for dialectic datasets and its implications for the equity of language technologies, we survey past research in NLP for dialects in terms of datasets, and approaches.
arXiv Detail & Related papers (2024-01-11T03:04:38Z) - Arabic Sentiment Analysis with Noisy Deep Explainable Model [48.22321420680046]
This paper proposes an explainable sentiment classification framework for the Arabic language.
The proposed framework can explain specific predictions by training a local surrogate explainable model.
We carried out experiments on public benchmark Arabic SA datasets.
arXiv Detail & Related papers (2023-09-24T19:26:53Z) - Parameter and Data Efficient Continual Pre-training for Robustness to
Dialectal Variance in Arabic [9.004920233490642]
We show that multilingual-BERT (mBERT) incrementally pretrained on Arabic monolingual data takes less training time and yields comparable accuracy when compared to our custom monolingual Arabic model.
We then explore two continual pre-training methods-- (1) using small amounts of dialectical data for continual finetuning and (2) parallel Arabic to English data and a Translation Language Modeling loss function.
arXiv Detail & Related papers (2022-11-08T02:51:57Z) - Causal Analysis of Syntactic Agreement Neurons in Multilingual Language
Models [28.036233760742125]
We causally probe multilingual language models (XGLM and multilingual BERT) across various languages.
We find significant neuron overlap across languages in autoregressive multilingual language models, but not masked language models.
arXiv Detail & Related papers (2022-10-25T20:43:36Z) - Is neural language acquisition similar to natural? A chronological
probing study [0.0515648410037406]
We present the chronological probing study of transformer English models such as MultiBERT and T5.
We compare the information about the language learned by the models in the process of training on corpora.
The results show that 1) linguistic information is acquired in the early stages of training 2) both language models demonstrate capabilities to capture various features from various levels of language.
arXiv Detail & Related papers (2022-07-01T17:24:11Z) - Same Neurons, Different Languages: Probing Morphosyntax in Multilingual
Pre-trained Models [84.86942006830772]
We conjecture that multilingual pre-trained models can derive language-universal abstractions about grammar.
We conduct the first large-scale empirical study over 43 languages and 14 morphosyntactic categories with a state-of-the-art neuron-level probe.
arXiv Detail & Related papers (2022-05-04T12:22:31Z) - Modeling Target-Side Morphology in Neural Machine Translation: A
Comparison of Strategies [72.56158036639707]
Morphologically rich languages pose difficulties to machine translation.
A large amount of differently inflected word surface forms entails a larger vocabulary.
Some inflected forms of infrequent terms typically do not appear in the training corpus.
Linguistic agreement requires the system to correctly match the grammatical categories between inflected word forms in the output sentence.
arXiv Detail & Related papers (2022-03-25T10:13:20Z) - Interpreting Arabic Transformer Models [18.98681439078424]
We probe how linguistic information is encoded in Arabic pretrained models, trained on different varieties of Arabic language.
We perform a layer and neuron analysis on the models using three intrinsic tasks: two morphological tagging tasks based on MSA (modern standard Arabic) and dialectal POS-tagging and a dialectal identification task.
arXiv Detail & Related papers (2022-01-19T06:32:25Z) - A Simple Joint Model for Improved Contextual Neural Lemmatization [60.802451210656805]
We present a simple joint neural model for lemmatization and morphological tagging that achieves state-of-the-art results on 20 languages.
Our paper describes the model in addition to training and decoding procedures.
arXiv Detail & Related papers (2019-04-04T02:03:19Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.