A Survey on Transfer Learning in Natural Language Processing
- URL: http://arxiv.org/abs/2007.04239v1
- Date: Sun, 31 May 2020 21:52:31 GMT
- Title: A Survey on Transfer Learning in Natural Language Processing
- Authors: Zaid Alyafeai, Maged Saeed AlShaibani, Irfan Ahmad
- Abstract summary: The demand for transfer learning is increasing as many large models are emerging.
In this survey, we feature the recent transfer learning advances in the field of NLP.
- Score: 8.396202730857942
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Deep learning models usually require a huge amount of data. However, such
large datasets are not always attainable, which is the case in many challenging
NLP tasks. Consider Neural Machine Translation, for instance, where curating
such large datasets may not be possible, especially for low-resource languages.
Another limitation of deep learning models is their demand for huge computing
resources. These obstacles motivate research into the possibility of
transferring knowledge from large trained models. The demand for transfer
learning is increasing as many large models are emerging. In this survey, we
feature the recent transfer learning advances in the field of NLP and provide
a taxonomy for categorizing the different transfer learning approaches found
in the literature.
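The kind of knowledge transfer the survey covers can be illustrated with a minimal fine-tuning sketch: a large pre-trained encoder is reused and adapted to a small labelled downstream task. The snippet below is an illustrative assumption on our part, not anything prescribed by the survey; it uses the Hugging Face transformers library, and the model name, toy data, and hyperparameters are placeholders.

```python
# Minimal transfer-learning sketch: fine-tune a pre-trained Transformer
# encoder on a tiny labelled dataset. Model, data, and hyperparameters
# are illustrative placeholders.
import torch
from torch.optim import AdamW
from transformers import AutoTokenizer, AutoModelForSequenceClassification

model_name = "distilbert-base-uncased"  # any pre-trained encoder would do
tokenizer = AutoTokenizer.from_pretrained(model_name)
# A fresh classification head is attached on top of the pre-trained weights.
model = AutoModelForSequenceClassification.from_pretrained(model_name, num_labels=2)

# Toy labelled examples standing in for a small task-specific dataset.
texts = ["the translation reads naturally", "the output is garbled"]
labels = torch.tensor([1, 0])

batch = tokenizer(texts, padding=True, truncation=True, return_tensors="pt")
optimizer = AdamW(model.parameters(), lr=2e-5)

model.train()
for epoch in range(3):
    optimizer.zero_grad()
    out = model(**batch, labels=labels)  # cross-entropy loss computed internally
    out.loss.backward()                  # gradients also update the pre-trained layers
    optimizer.step()
    print(f"epoch {epoch}: loss {out.loss.item():.4f}")
```

Whether the pre-trained layers are frozen (feature extraction) or updated together with the new head (fine-tuning) is one of the distinctions such a taxonomy typically draws.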
Related papers
- Learning with Less: Knowledge Distillation from Large Language Models via Unlabeled Data [54.934578742209716]
In real-world NLP applications, Large Language Models (LLMs) offer promising solutions due to their extensive training on vast datasets.
LLKD is an adaptive sample selection method that incorporates signals from both the teacher and student.
Our comprehensive experiments show that LLKD achieves superior performance across various datasets with higher data efficiency.
arXiv Detail & Related papers (2024-11-12T18:57:59Z)
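The entry above is about distilling knowledge from an LLM teacher into a smaller student using unlabeled data. The following is a generic teacher-student distillation sketch, not the paper's actual LLKD procedure: the student matches softened teacher outputs, and each sample is weighted by the teacher's confidence as a crude stand-in for the adaptive selection signals mentioned in the summary. All tensor shapes and names are assumptions.

```python
# Generic knowledge-distillation sketch (not the LLKD method itself): the
# student matches softened teacher outputs on unlabeled inputs, and each
# sample is weighted by the teacher's confidence.
import torch
import torch.nn.functional as F

def distillation_loss(student_logits, teacher_logits, temperature=2.0):
    t_probs = F.softmax(teacher_logits / temperature, dim=-1)
    s_log_probs = F.log_softmax(student_logits / temperature, dim=-1)
    # Per-sample KL divergence between teacher and student distributions.
    kl = F.kl_div(s_log_probs, t_probs, reduction="none").sum(dim=-1)
    # Confidence weighting: a crude stand-in for adaptive sample selection.
    weights = t_probs.max(dim=-1).values
    return (weights * kl).mean() * temperature ** 2

# Random logits stand in for real teacher/student outputs on a batch of 8 texts.
teacher_logits = torch.randn(8, 3)
student_logits = torch.randn(8, 3, requires_grad=True)
loss = distillation_loss(student_logits, teacher_logits)
loss.backward()  # gradients flow only into the student
print(loss.item())
```

In practice the teacher logits would come from an LLM run over an unlabeled pool and the student would be a much smaller model.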
- Physics of Language Models: Part 3.3, Knowledge Capacity Scaling Laws [51.68385617116854]
Scaling laws describe the relationship between the size of language models and their capabilities.
We focus on factual knowledge represented as tuples, such as (USA, capital, Washington D.C.) from a Wikipedia page.
By this estimate, a model stores roughly 2 bits of knowledge per parameter, so a 7B model can store 14B bits of knowledge, surpassing the English Wikipedia and textbooks combined.
arXiv Detail & Related papers (2024-04-08T11:11:31Z)
- Guided Transfer Learning [0.0]
In some applications, guided transfer learning enables the network to learn from a small amount of data.
In other cases, a network with a smaller number of parameters can learn a task which otherwise only a larger network could learn.
Guided transfer learning has many potential applications wherever the amount of data, the model size, or the available computational resources are limiting factors.
arXiv Detail & Related papers (2023-03-26T18:21:24Z)
- Data Augmentation for Neural NLP [0.0]
Data augmentation is a low-cost approach for tackling data scarcity.
This paper gives an overview of current state-of-the-art data augmentation methods used for natural language processing.
arXiv Detail & Related papers (2023-02-22T14:47:15Z)
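As a concrete illustration of the low-cost idea summarized in the preceding entry, here is a sketch of two simple word-level augmentation operations (random deletion and random swap) in the spirit of "easy data augmentation" recipes; it is not taken from the surveyed paper, which covers a much broader range of methods.

```python
# Word-level augmentation sketch (illustrative only): random deletion and
# random swap generate noisy variants of a sentence to enlarge a small dataset.
import random

def random_deletion(tokens, p=0.1):
    """Drop each token with probability p, keeping at least one token."""
    kept = [t for t in tokens if random.random() > p]
    return kept if kept else [random.choice(tokens)]

def random_swap(tokens, n_swaps=1):
    """Swap two randomly chosen positions n_swaps times."""
    tokens = tokens[:]
    for _ in range(n_swaps):
        i, j = random.sample(range(len(tokens)), 2)
        tokens[i], tokens[j] = tokens[j], tokens[i]
    return tokens

sentence = "data augmentation is a low cost approach for tackling data scarcity".split()
print(" ".join(random_deletion(sentence)))
print(" ".join(random_swap(sentence)))
```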
- A Review of Deep Transfer Learning and Recent Advancements [1.3535770763481905]
Deep transfer learning (DTL) methods are presented as the answer to such limitations.
They handle concerns about limited target data and drastically reduce training costs.
arXiv Detail & Related papers (2022-01-19T04:19:36Z)
- Improving Classifier Training Efficiency for Automatic Cyberbullying Detection with Feature Density [58.64907136562178]
We study the effectiveness of Feature Density (FD) using different linguistically-backed feature preprocessing methods.
We hypothesise that estimating dataset complexity beforehand reduces the number of required experiments.
Differences in the linguistic complexity of the datasets also allow us to discuss the efficacy of linguistically-backed word preprocessing.
arXiv Detail & Related papers (2021-11-02T15:48:28Z)
- Training Deep Networks from Zero to Hero: avoiding pitfalls and going beyond [59.94347858883343]
This tutorial covers the basic steps as well as more recent options to improve models.
It can be particularly useful for datasets that are not as well prepared as those used in challenges.
arXiv Detail & Related papers (2021-09-06T21:31:42Z)
- Exploring Bayesian Deep Learning for Urgent Instructor Intervention Need in MOOC Forums [58.221459787471254]
Massive Open Online Courses (MOOCs) have become a popular choice for e-learning thanks to their great flexibility.
Due to large numbers of learners and their diverse backgrounds, it is taxing to offer real-time support.
With the large volume of posts and high workloads for MOOC instructors, it is unlikely that the instructors can identify all learners requiring intervention.
This paper explores for the first time Bayesian deep learning on learner-based text posts with two methods: Monte Carlo Dropout and Variational Inference.
arXiv Detail & Related papers (2021-04-26T15:12:13Z)
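Monte Carlo Dropout, one of the two methods named in the entry above, can be sketched in a few lines: dropout is kept active at prediction time and several stochastic forward passes are averaged, with their spread serving as an uncertainty signal that could flag posts for instructor attention. The toy classifier and inputs below are assumptions, not the paper's model.

```python
# Monte Carlo Dropout sketch: keep dropout stochastic at prediction time and
# average several forward passes; the spread across passes estimates uncertainty.
# The classifier and inputs are toy placeholders.
import torch
import torch.nn as nn

classifier = nn.Sequential(
    nn.Linear(16, 32), nn.ReLU(), nn.Dropout(p=0.5), nn.Linear(32, 2)
)

def mc_dropout_predict(model, x, n_samples=20):
    model.train()  # .train() keeps Dropout active even though we are predicting
    with torch.no_grad():
        probs = torch.stack(
            [torch.softmax(model(x), dim=-1) for _ in range(n_samples)]
        )
    return probs.mean(dim=0), probs.std(dim=0)

x = torch.randn(4, 16)  # e.g. four embedded forum posts
mean, std = mc_dropout_predict(classifier, x)
print(mean)
print(std)  # a high std could flag a post for instructor review
```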
- Low-Resource Adaptation of Neural NLP Models [0.30458514384586405]
This thesis investigates methods for dealing with low-resource scenarios in information extraction and natural language understanding.
We develop and adapt neural NLP models to explore a number of research questions concerning NLP tasks with minimal or no training data.
arXiv Detail & Related papers (2020-11-09T12:13:55Z)
- A Survey on Recent Approaches for Natural Language Processing in Low-Resource Scenarios [30.391291221959545]
Deep neural networks and huge language models are becoming omnipresent in natural language applications.
As they are known for requiring large amounts of training data, there is a growing body of work to improve the performance in low-resource settings.
Motivated by the recent fundamental changes towards neural models and the popular pre-train and fine-tune paradigm, we survey promising approaches for low-resource natural language processing.
arXiv Detail & Related papers (2020-10-23T11:22:01Z)
- What is being transferred in transfer learning? [51.6991244438545]
We show that when training from pre-trained weights, the model stays in the same basin in the loss landscape, and different instances of such a model are similar in feature space and close in parameter space.
arXiv Detail & Related papers (2020-08-26T17:23:40Z)
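One common way to probe the "same basin" observation above is to linearly interpolate between the weights of two models trained from the same pre-trained checkpoint and evaluate the loss along the path: a roughly flat path suggests a shared basin, while a spike suggests separate basins. The sketch below uses a toy model and random data purely to illustrate the check; it does not reproduce the paper's experiments.

```python
# Sketch of a "same basin" check: evaluate the loss along the straight line
# between two sets of weights. Toy model, data, and perturbations only.
import copy
import torch
import torch.nn.functional as F

def interpolate_state(state_a, state_b, alpha):
    """Parameters (1 - alpha) * A + alpha * B, key by key."""
    return {k: (1 - alpha) * state_a[k] + alpha * state_b[k] for k in state_a}

def loss_along_path(model, state_a, state_b, eval_loss_fn, steps=11):
    probe = copy.deepcopy(model)
    losses = []
    for i in range(steps):
        alpha = i / (steps - 1)
        probe.load_state_dict(interpolate_state(state_a, state_b, alpha))
        losses.append(eval_loss_fn(probe))
    return losses  # roughly flat -> one basin; a spike -> separate basins

# Toy demonstration: two slightly perturbed copies of the same starting weights
# stand in for two models fine-tuned from the same pre-trained checkpoint.
torch.manual_seed(0)
x, y = torch.randn(64, 8), torch.randn(64, 1)
model = torch.nn.Linear(8, 1)

def eval_loss(m):
    with torch.no_grad():
        return F.mse_loss(m(x), y).item()

state_a = {k: v + 0.01 * torch.randn_like(v) for k, v in model.state_dict().items()}
state_b = {k: v + 0.01 * torch.randn_like(v) for k, v in model.state_dict().items()}
print(loss_along_path(model, state_a, state_b, eval_loss))
```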