How transfer learning impacts linguistic knowledge in deep NLP models?
- URL: http://arxiv.org/abs/2105.15179v1
- Date: Mon, 31 May 2021 17:43:57 GMT
- Title: How transfer learning impacts linguistic knowledge in deep NLP models?
- Authors: Nadir Durrani and Hassan Sajjad and Fahim Dalvi
- Abstract summary: Deep NLP models learn a non-trivial amount of linguistic knowledge, captured at different layers of the model.
We investigate how fine-tuning towards downstream NLP tasks impacts the learned linguistic knowledge.
- Score: 22.035813865470956
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Transfer learning from pre-trained neural language models towards downstream
tasks has been a predominant theme in NLP recently. Several researchers have shown that
deep NLP models learn a non-trivial amount of linguistic knowledge, captured at different
layers of the model. We investigate how fine-tuning towards downstream NLP tasks impacts
the learned linguistic knowledge. We carry out a study across the popular pre-trained
models BERT, RoBERTa and XLNet using layer- and neuron-level diagnostic classifiers. We
find that for some GLUE tasks the network relies on core linguistic information and
preserves it deeper in the network, while for others this information is forgotten.
Linguistic information is distributed across the pre-trained language models but becomes
localized to the lower layers post fine-tuning, reserving the higher layers for
task-specific knowledge. The pattern varies across architectures, with BERT retaining
linguistic information relatively deeper in the network compared to RoBERTa and XLNet,
where it is predominantly delegated to the lower layers.
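The layer- and neuron-level diagnostic classifiers mentioned in the abstract are, in essence, simple probes trained on frozen representations. A minimal layer-wise sketch is shown below; the checkpoint name, the toy sentences and labels, and the use of scikit-learn's logistic regression are illustrative assumptions rather than the authors' exact setup.

```python
# Minimal layer-wise probing sketch (illustrative; not the paper's exact pipeline).
# Assumptions: bert-base-cased checkpoint, a toy binary property per sentence,
# and one logistic-regression probe per layer.
import torch
from transformers import AutoModel, AutoTokenizer
from sklearn.linear_model import LogisticRegression

tokenizer = AutoTokenizer.from_pretrained("bert-base-cased")
model = AutoModel.from_pretrained("bert-base-cased", output_hidden_states=True)
model.eval()

# Toy supervision: a hypothetical binary property label per sentence.
sentences = ["The cats sleep .", "A dog barked loudly .", "She writes code .", "They were late ."]
labels = [1, 0, 1, 0]

with torch.no_grad():
    enc = tokenizer(sentences, return_tensors="pt", padding=True)
    hidden_states = model(**enc).hidden_states  # tuple: (n_layers + 1) x [batch, seq, dim]

# Train one probe per layer on mean-pooled token representations.
for layer_idx, layer in enumerate(hidden_states):
    mask = enc["attention_mask"].unsqueeze(-1)
    pooled = (layer * mask).sum(1) / mask.sum(1)   # [batch, dim]
    probe = LogisticRegression(max_iter=1000)
    probe.fit(pooled.numpy(), labels)
    acc = probe.score(pooled.numpy(), labels)      # accuracy on the toy data
    print(f"layer {layer_idx:2d}: probe accuracy {acc:.2f}")
```

Neuron-level analysis follows the same recipe, but instead of comparing whole layers it inspects which individual dimensions the probe relies on.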
Related papers
- Gradient Localization Improves Lifelong Pretraining of Language Models [32.29298047707914]
Large Language Models (LLMs) trained on web-scale text corpora have been shown to capture world knowledge in their parameters.
In this work, we examine two types of knowledge relating to temporally sensitive entities and demonstrate that each type is localized to different sets of parameters within the LLMs.
arXiv Detail & Related papers (2024-11-07T05:43:50Z)
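The gradient-based localization idea above can be illustrated by comparing per-layer gradient norms induced by different kinds of examples; the checkpoint, the two toy example sets, and the norm-summing metric below are assumptions for illustration only.

```python
# Illustrative sketch: compare where two kinds of examples exert gradient pressure.
# Assumes gpt2 as a stand-in LLM and two tiny, hypothetical example sets.
import torch
from collections import defaultdict
from transformers import AutoModelForCausalLM, AutoTokenizer

tok = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2")

def per_layer_grad_norms(texts):
    """Sum of parameter-gradient L2 norms per transformer block for an LM loss on `texts`."""
    norms = defaultdict(float)
    for text in texts:
        model.zero_grad()
        enc = tok(text, return_tensors="pt")
        out = model(**enc, labels=enc["input_ids"])
        out.loss.backward()
        for name, p in model.named_parameters():
            if p.grad is not None and ".h." in name:   # GPT-2 blocks live under transformer.h.<idx>.
                layer = int(name.split(".h.")[1].split(".")[0])
                norms[layer] += p.grad.norm().item()
    return norms

# Two hypothetical example sets standing in for different knowledge types.
entity_facts = ["The Eiffel Tower is located in Paris."]
temporal_facts = ["The next Summer Olympics will be held in 2028."]

a, b = per_layer_grad_norms(entity_facts), per_layer_grad_norms(temporal_facts)
for layer in sorted(a):
    print(f"block {layer:2d}: entity {a[layer]:8.2f}  temporal {b[layer]:8.2f}")
```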
- Exploring transfer learning for Deep NLP systems on rarely annotated languages [0.0]
This thesis investigates the application of transfer learning for Part-of-Speech (POS) tagging between Hindi and Nepali.
We assess whether multitask learning in Hindi, with auxiliary tasks such as gender and singular/plural tagging, can contribute to improved POS tagging accuracy.
arXiv Detail & Related papers (2024-10-15T13:33:54Z)
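The multitask setup described in this entry, a shared encoder with a main POS head and auxiliary heads such as gender and number, can be sketched roughly as follows; the layer sizes, tag inventories, and loss weights are hypothetical.

```python
# Hedged sketch of a shared-encoder multitask tagger (illustrative sizes and tag sets).
import torch
import torch.nn as nn

class MultiTaskTagger(nn.Module):
    def __init__(self, vocab_size=10000, emb_dim=128, hidden=256,
                 n_pos_tags=17, n_gender_tags=3, n_number_tags=2):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, emb_dim, padding_idx=0)
        self.encoder = nn.LSTM(emb_dim, hidden, batch_first=True, bidirectional=True)
        self.pos_head = nn.Linear(2 * hidden, n_pos_tags)        # main task
        self.gender_head = nn.Linear(2 * hidden, n_gender_tags)  # auxiliary task
        self.number_head = nn.Linear(2 * hidden, n_number_tags)  # auxiliary task

    def forward(self, token_ids):
        states, _ = self.encoder(self.embed(token_ids))
        return self.pos_head(states), self.gender_head(states), self.number_head(states)

model = MultiTaskTagger()
criterion = nn.CrossEntropyLoss(ignore_index=-100)

# One toy batch: 2 sentences of length 5 with random token ids and tags.
tokens = torch.randint(1, 10000, (2, 5))
pos = torch.randint(0, 17, (2, 5))
gender = torch.randint(0, 3, (2, 5))
number = torch.randint(0, 2, (2, 5))

pos_logits, g_logits, n_logits = model(tokens)
loss = (criterion(pos_logits.transpose(1, 2), pos)
        + 0.3 * criterion(g_logits.transpose(1, 2), gender)   # auxiliary losses down-weighted
        + 0.3 * criterion(n_logits.transpose(1, 2), number))
loss.backward()
```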
- Natural Language Processing for Dialects of a Language: A Survey [56.93337350526933]
State-of-the-art natural language processing (NLP) models are trained on massive training corpora and report superlative performance on evaluation datasets.
This survey delves into an important attribute of these datasets: the dialect of a language.
Motivated by the performance degradation of NLP models on dialectal datasets and its implications for the equity of language technologies, we survey past research in NLP for dialects in terms of datasets and approaches.
arXiv Detail & Related papers (2024-01-11T03:04:38Z)
- Improving Interpretability via Explicit Word Interaction Graph Layer [28.28660926203816]
We propose a trainable neural network layer that learns a global interaction graph between words and then selects more informative words.
Our layer, called WIGRAPH, can plug into any neural network-based NLP text classifier right after its word embedding layer.
arXiv Detail & Related papers (2023-02-03T21:56:32Z)
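As a rough illustration of a word-interaction layer that sits right after the embedding lookup, the sketch below learns pairwise scores and gates embeddings by a derived importance weight; it is not the WIGRAPH implementation, only a hedged approximation of the idea.

```python
# Illustrative word-interaction layer that gates embeddings by learned pairwise scores.
# This is a hedged sketch of the general idea, not the WIGRAPH implementation.
import torch
import torch.nn as nn

class WordInteractionLayer(nn.Module):
    def __init__(self, emb_dim):
        super().__init__()
        self.query = nn.Linear(emb_dim, emb_dim)
        self.key = nn.Linear(emb_dim, emb_dim)

    def forward(self, embeddings, mask):
        # Pairwise interaction scores between every pair of words in the sentence.
        scores = self.query(embeddings) @ self.key(embeddings).transpose(1, 2)       # [B, L, L]
        scores = scores.masked_fill(mask.unsqueeze(1) == 0, float("-inf"))
        graph = torch.softmax(scores, dim=-1)                                        # row-normalized graph
        # A word's importance is how strongly the other words attend to it.
        importance = torch.sigmoid(graph.sum(dim=1, keepdim=True)).transpose(1, 2)   # [B, L, 1]
        return embeddings * importance          # keep informative words, damp the rest

# Usage: drop the layer in right after a classifier's embedding lookup.
emb = torch.randn(2, 6, 300)                    # batch of 2 sentences, 6 tokens, dim 300
mask = torch.ones(2, 6, dtype=torch.long)
gated = WordInteractionLayer(300)(emb, mask)
```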
- Language Model Pre-Training with Sparse Latent Typing [66.75786739499604]
We propose a new pre-training objective, Sparse Latent Typing, which enables the model to sparsely extract sentence-level keywords with diverse latent types.
Experimental results show that our model is able to learn interpretable latent type categories in a self-supervised manner without using any external knowledge.
arXiv Detail & Related papers (2022-10-23T00:37:08Z)
- Discovering Salient Neurons in Deep NLP Models [31.18937787704794]
We present a technique called Linguistic Correlation Analysis to extract salient neurons from the model.
Our data-driven, quantitative analysis illuminates interesting findings.
Our code is publicly available as part of the NeuroX toolkit.
arXiv Detail & Related papers (2022-06-27T13:31:49Z)
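The general recipe behind ranking salient neurons can be sketched as training a regularized linear probe and ordering neurons by the magnitude of its weights; the code below does not use the NeuroX API, and the random activations and labels are stand-ins for real extracted representations.

```python
# Hedged sketch of neuron ranking via a regularized linear probe (not the NeuroX API).
# Activations and labels are random stand-ins for real extracted representations.
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
activations = rng.normal(size=(500, 768))     # 500 tokens x 768 "neurons" (hypothetical)
labels = rng.integers(0, 2, size=500)         # binary linguistic property (hypothetical)

# An L1-regularized probe pushes the weights of uninformative neurons toward zero.
probe = LogisticRegression(penalty="l1", solver="liblinear", C=0.1)
probe.fit(activations, labels)

# Rank neurons by absolute probe weight; the top of the list are the "salient" neurons.
salience = np.abs(probe.coef_).ravel()
top_neurons = np.argsort(salience)[::-1][:10]
print("most salient neurons:", top_neurons)
```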
- Initial Study into Application of Feature Density and Linguistically-backed Embedding to Improve Machine Learning-based Cyberbullying Detection [54.83707803301847]
The research was conducted on a Formspring dataset provided in a Kaggle competition on automatic cyberbullying detection.
The study confirmed the effectiveness of Neural Networks in cyberbullying detection and the correlation between classifier performance and Feature Density.
arXiv Detail & Related papers (2022-06-04T03:17:15Z)
- Reinforced Iterative Knowledge Distillation for Cross-Lingual Named Entity Recognition [54.92161571089808]
Cross-lingual NER transfers knowledge from rich-resource languages to languages with low resources.
Existing cross-lingual NER methods do not make good use of rich unlabeled data in target languages.
We develop a novel approach based on the ideas of semi-supervised learning and reinforcement learning.
arXiv Detail & Related papers (2021-06-01T05:46:22Z)
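The distillation component of such an approach can be sketched as a student matching a teacher's soft tag distributions on unlabeled target-language text; the stand-in models, tag set, and temperature below are assumptions, and the reinforcement-learning instance selection is omitted.

```python
# Hedged sketch of teacher-student distillation for sequence labeling on unlabeled text.
# Model sizes, tag set, and temperature are assumptions; the RL instance-selection step is omitted.
import torch
import torch.nn as nn
import torch.nn.functional as F

N_TAGS, DIM = 9, 64                                   # e.g. BIO tags over 4 entity types + O

teacher = nn.GRU(DIM, DIM, batch_first=True)          # stand-ins for real source/target NER models
teacher_head = nn.Linear(DIM, N_TAGS)
student = nn.GRU(DIM, DIM, batch_first=True)
student_head = nn.Linear(DIM, N_TAGS)

def tag_logits(encoder, head, x):
    states, _ = encoder(x)
    return head(states)                               # [batch, seq, n_tags]

# Unlabeled target-language batch, already embedded (hypothetical features).
x = torch.randn(8, 20, DIM)
T = 2.0                                               # distillation temperature

with torch.no_grad():
    teacher_probs = F.softmax(tag_logits(teacher, teacher_head, x) / T, dim=-1)

student_log_probs = F.log_softmax(tag_logits(student, student_head, x) / T, dim=-1)
# KL divergence between teacher and student tag distributions.
kd_loss = F.kl_div(student_log_probs, teacher_probs, reduction="batchmean") * (T * T)
kd_loss.backward()
```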
- Analyzing Individual Neurons in Pre-trained Language Models [41.07850306314594]
We find small subsets of neurons that predict linguistic tasks, with lower-level tasks localized in fewer neurons compared to the higher-level task of predicting syntax.
For example, we found neurons in XLNet to be more localized and disjoint when predicting properties compared to BERT and others, where they are more distributed and coupled.
arXiv Detail & Related papers (2020-10-06T13:17:38Z)
- Information-Theoretic Probing for Linguistic Structure [74.04862204427944]
We propose an information-theoretic operationalization of probing as estimating mutual information.
We evaluate on a set of ten typologically diverse languages often underrepresented in NLP research.
arXiv Detail & Related papers (2020-04-07T01:06:36Z)
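The mutual-information view of probing can be stated in two lines: the property's entropy is fixed by the data, and a probe's cross-entropy upper-bounds the conditional entropy, so a better probe yields a tighter lower bound on the mutual information between representations R and property T.

```latex
I(R;T) = H(T) - H(T \mid R), \qquad
H(T \mid R) \le H_{q}(T \mid R) = -\mathbb{E}_{(r,t)}\bigl[\log q(t \mid r)\bigr]
\quad\Rightarrow\quad
I(R;T) \ge H(T) - H_{q}(T \mid R).
```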
- Cross-lingual, Character-Level Neural Morphological Tagging [57.0020906265213]
We train character-level recurrent neural taggers to predict morphological taggings for high-resource languages and low-resource languages together.
Learning joint character representations among multiple related languages successfully enables knowledge transfer from the high-resource languages to the low-resource ones, improving accuracy by up to 30% over a monolingual model.
arXiv Detail & Related papers (2017-08-30T08:14:34Z)
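The joint character-level tagging recipe in the last entry can be sketched as a single tagger whose parameters are shared across languages; the alphabet size, dimensions, tag inventory, and random batches below are illustrative assumptions.

```python
# Hedged sketch of a shared character-level morphological tagger (dimensions, alphabet,
# and tag inventory are illustrative; both languages simply share all parameters).
import torch
import torch.nn as nn

class CharLevelTagger(nn.Module):
    def __init__(self, n_chars=200, char_dim=64, word_dim=128, n_tags=40):
        super().__init__()
        self.char_embed = nn.Embedding(n_chars, char_dim, padding_idx=0)
        self.char_lstm = nn.LSTM(char_dim, word_dim, batch_first=True)
        self.word_lstm = nn.LSTM(word_dim, word_dim, batch_first=True, bidirectional=True)
        self.tag_head = nn.Linear(2 * word_dim, n_tags)

    def forward(self, char_ids):
        # char_ids: [batch, n_words, n_chars] -- characters of each word in each sentence.
        b, w, c = char_ids.shape
        chars = self.char_embed(char_ids.view(b * w, c))
        _, (h, _) = self.char_lstm(chars)              # final hidden state summarizes each word
        word_reprs = h[-1].view(b, w, -1)
        context, _ = self.word_lstm(word_reprs)
        return self.tag_head(context)                  # [batch, n_words, n_tags]

model = CharLevelTagger()
loss_fn = nn.CrossEntropyLoss()

# High- and low-resource batches share the same parameters; only the data differs.
high_resource_batch = torch.randint(1, 200, (4, 12, 15))   # 4 sentences, 12 words, 15 chars
low_resource_batch = torch.randint(1, 200, (4, 12, 15))
for batch in (high_resource_batch, low_resource_batch):
    logits = model(batch)
    tags = torch.randint(0, 40, (4, 12))                    # stand-in gold morphological tags
    loss = loss_fn(logits.transpose(1, 2), tags)
    loss.backward()
```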
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the information presented and is not responsible for any consequences of its use.