Knowledge Efficient Deep Learning for Natural Language Processing
- URL: http://arxiv.org/abs/2008.12878v1
- Date: Fri, 28 Aug 2020 23:32:33 GMT
- Title: Knowledge Efficient Deep Learning for Natural Language Processing
- Authors: Hai Wang
- Abstract summary: This thesis focuses on adapting classical methods to modern deep learning models and algorithms.
First, we propose a knowledge rich deep learning model (KRDL) as a unifying learning framework for incorporating prior knowledge into deep models.
Second, we apply a KRDL model to help machine reading models find the correct evidence sentences that support their decisions.
- Score: 2.2701338128113124
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Deep learning has become the workhorse for a wide range of natural language
processing applications. But much of the success of deep learning relies on
annotated examples. Annotation is time-consuming and expensive to produce at
scale. Here we are interested in methods for reducing the required quantity of
annotated data -- by making the learning methods more knowledge efficient so as
to make them more applicable in low annotation (low resource) settings. There
are various classical approaches to making models more knowledge efficient,
such as multi-task learning, transfer learning, and weakly supervised and
unsupervised learning. This thesis focuses on adapting such classical
methods to modern deep learning models and algorithms.
This thesis describes four works aimed at making machine learning models more
knowledge efficient. First, we propose a knowledge rich deep learning model
(KRDL) as a unifying learning framework for incorporating prior knowledge into
deep models. In particular, we apply KRDL built on Markov logic networks to
denoise weak supervision. Second, we apply a KRDL model to help machine reading
models find the correct evidence sentences that support their decisions. Third,
we investigate knowledge transfer techniques in the multilingual setting, where
we propose a method that improves pre-trained multilingual BERT using a
bilingual dictionary. Fourth, we present an episodic memory network for language
modelling, in which we encode large-scale external knowledge for a pre-trained
GPT model.
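As a concrete (and heavily simplified) illustration of the fourth idea, the sketch below retrieves a few external-knowledge snippets by TF-IDF similarity and simply prepends them to the context of a pre-trained GPT-2 before scoring the next token. The memory contents, retrieval scheme, and model choice are placeholder assumptions, not the architecture used in the thesis.

```python
# Minimal sketch: condition a pre-trained GPT-2 on retrieved external knowledge.
import torch
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity
from transformers import GPT2LMHeadModel, GPT2TokenizerFast

# Hypothetical external knowledge store; a real system would hold many more entries.
memory = [
    "Marie Curie won Nobel Prizes in both physics and chemistry.",
    "The Amazon is the largest rainforest on Earth.",
    "GPT models are trained with a left-to-right language modelling objective.",
]

vectorizer = TfidfVectorizer().fit(memory)
memory_vecs = vectorizer.transform(memory)

def retrieve(query, k=1):
    # Return the k memory entries most similar to the query (TF-IDF cosine).
    sims = cosine_similarity(vectorizer.transform([query]), memory_vecs)[0]
    top = sims.argsort()[::-1][:k]
    return [memory[i] for i in top]

tokenizer = GPT2TokenizerFast.from_pretrained("gpt2")
model = GPT2LMHeadModel.from_pretrained("gpt2").eval()

prompt = "Marie Curie was awarded the Nobel Prize in"
context = " ".join(retrieve(prompt)) + " " + prompt  # prepend retrieved knowledge
inputs = tokenizer(context, return_tensors="pt")

with torch.no_grad():
    logits = model(**inputs).logits          # shape: [1, seq_len, vocab_size]
next_token_id = logits[0, -1].argmax().item()
print(tokenizer.decode(next_token_id))       # e.g. " physics"
```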
Related papers
- Language models are weak learners [71.33837923104808]
We show that prompt-based large language models can operate effectively as weak learners.
We incorporate these models into a boosting approach, which can leverage the knowledge within the model to outperform traditional tree-based boosting.
Results illustrate the potential for prompt-based LLMs to function not just as few-shot learners themselves, but as components of larger machine learning pipelines.
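As a rough, illustrative sketch of how a prompted LLM could act as the weak learner inside a standard boosting loop; the prompting scheme and the `llm_classify` stub below are assumptions, not the paper's method.

```python
# Illustrative AdaBoost-style loop whose weak learner is a prompted LLM classifier.
import numpy as np

def llm_classify(texts, support_texts, support_labels):
    """Hypothetical prompt-based weak learner: show a few weighted examples
    in the prompt and ask the LLM to label each input (+1 / -1).
    Stubbed with a trivial keyword rule so the sketch runs offline."""
    return np.array([1 if "good" in t else -1 for t in texts])

def boost(texts, labels, rounds=5):
    n = len(texts)
    w = np.full(n, 1.0 / n)                  # example weights
    learners, alphas = [], []
    for _ in range(rounds):
        # Sample a few examples (by weight) to show the LLM as in-context demos.
        idx = np.random.choice(n, size=min(4, n), replace=False, p=w)
        preds = llm_classify(texts, [texts[i] for i in idx], [labels[i] for i in idx])
        err = np.sum(w * (preds != labels)) + 1e-10
        if err >= 0.5:                       # weaker than chance: skip this round
            continue
        alpha = 0.5 * np.log((1 - err) / err)
        w *= np.exp(-alpha * labels * preds)  # upweight mistakes
        w /= w.sum()
        learners.append(preds)
        alphas.append(alpha)
    # Weighted vote of the weak learners
    return np.sign(sum(a * p for a, p in zip(alphas, learners)))

texts = ["good movie", "bad plot", "good acting", "boring"]
labels = np.array([1, -1, 1, -1])
print(boost(texts, labels))
```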
arXiv Detail & Related papers (2023-06-25T02:39:19Z) - Commonsense Knowledge Transfer for Pre-trained Language Models [83.01121484432801]
We introduce commonsense knowledge transfer, a framework to transfer the commonsense knowledge stored in a neural commonsense knowledge model to a general-purpose pre-trained language model.
It first exploits general texts to form queries for extracting commonsense knowledge from the neural commonsense knowledge model.
It then refines the language model with two self-supervised objectives: commonsense mask infilling and commonsense relation prediction.
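A loose sketch of what the first of these objectives, commonsense mask infilling, could look like, assuming a BERT-style masked LM and a verbalised commonsense statement; the paper's exact objectives and knowledge source are not reproduced here.

```python
# Sketch of a commonsense mask infilling step: mask the tail concept of a
# commonsense statement and train a masked LM to recover it.
import torch
from transformers import AutoModelForMaskedLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
model = AutoModelForMaskedLM.from_pretrained("bert-base-uncased")

# Hypothetical commonsense statement with its tail concept masked out.
original = "A person goes to a restaurant because they want to eat food."
masked = "A person goes to a restaurant because they want to eat [MASK]."

inputs = tokenizer(masked, return_tensors="pt")
labels = tokenizer(original, return_tensors="pt")["input_ids"]
labels[inputs["input_ids"] != tokenizer.mask_token_id] = -100  # score only the mask

loss = model(**inputs, labels=labels).loss
loss.backward()  # one gradient step of the infilling objective would follow
print(float(loss))
```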
arXiv Detail & Related papers (2023-06-04T15:44:51Z) - The Web Can Be Your Oyster for Improving Large Language Models [98.72358969495835]
Large language models (LLMs) encode a large amount of world knowledge.
We consider augmenting LLMs with the large-scale web using a search engine.
We present a web-augmented LLM, UNIWEB, which is trained on 16 knowledge-intensive tasks in a unified text-to-text format.
arXiv Detail & Related papers (2023-05-18T14:20:32Z) - A Cohesive Distillation Architecture for Neural Language Models [0.0]
A recent trend in Natural Language Processing is the exponential growth in Language Model (LM) size.
This study investigates methods for Knowledge Distillation (KD) to provide efficient alternatives to large-scale models.
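As a generic reference point, the sketch below shows the classic temperature-scaled distillation loss; the paper's specific cohesive distillation architecture is not reproduced here.

```python
# Generic knowledge distillation loss: soft targets from a teacher plus the
# usual hard-label cross-entropy for the student.
import torch
import torch.nn.functional as F

def distillation_loss(student_logits, teacher_logits, labels, T=2.0, alpha=0.5):
    """Temperature-scaled KL loss (teacher -> student) mixed with cross-entropy."""
    soft = F.kl_div(
        F.log_softmax(student_logits / T, dim=-1),
        F.softmax(teacher_logits / T, dim=-1),
        reduction="batchmean",
    ) * (T * T)
    hard = F.cross_entropy(student_logits, labels)
    return alpha * soft + (1 - alpha) * hard

# Toy usage with random tensors standing in for model outputs.
student_logits = torch.randn(8, 30522, requires_grad=True)  # e.g. BERT vocab size
teacher_logits = torch.randn(8, 30522)
labels = torch.randint(0, 30522, (8,))
loss = distillation_loss(student_logits, teacher_logits, labels)
loss.backward()
```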
arXiv Detail & Related papers (2023-01-12T08:01:53Z) - Knowledge-in-Context: Towards Knowledgeable Semi-Parametric Language
Models [58.42146641102329]
We develop a novel semi-parametric language model architecture, Knowledge-in-Context (KiC).
KiC empowers a parametric text-to-text language model with a knowledge-rich external memory.
As a knowledge-rich semi-parametric language model, KiC needs only a much smaller parametric part to achieve superior zero-shot performance on unseen tasks.
arXiv Detail & Related papers (2022-10-28T23:18:43Z) - Language Model Pre-Training with Sparse Latent Typing [66.75786739499604]
We propose a new pre-training objective, Sparse Latent Typing, which enables the model to sparsely extract sentence-level keywords with diverse latent types.
Experimental results show that our model is able to learn interpretable latent type categories in a self-supervised manner without using any external knowledge.
arXiv Detail & Related papers (2022-10-23T00:37:08Z) - LM-CORE: Language Models with Contextually Relevant External Knowledge [13.451001884972033]
We argue that storing large amounts of knowledge in the model parameters is sub-optimal given the ever-growing amounts of knowledge and resource requirements.
We present LM-CORE -- a general framework to achieve this -- that allows decoupling of the language model training from the external knowledge source.
Experimental results show that LM-CORE, having access to external knowledge, significantly and robustly outperforms state-of-the-art knowledge-enhanced language models on knowledge probing tasks.
arXiv Detail & Related papers (2022-08-12T18:59:37Z) - Adaptive Activation Network For Low Resource Multilingual Speech
Recognition [30.460501537763736]
We introduce an adaptive activation network into the upper layers of the ASR model.
We also propose two approaches to train the model: (1) cross-lingual learning, which replaces the activation function of the source language with that of the target language, and (2) multilingual learning.
Our experiments on the IARPA Babel datasets demonstrate that our approaches outperform from-scratch training and traditional bottleneck-feature-based methods.
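A loose sketch of the cross-lingual variant of this idea, with an invented toy encoder and a placeholder adaptive activation module; the real ASR architecture and activation design will differ.

```python
# Sketch: freeze the lower layers of a trained encoder and swap in
# target-language adaptive activations in the upper layers.
import torch
import torch.nn as nn

class AdaptiveActivation(nn.Module):
    """Tiny learnable activation: a gated mix of ReLU and identity,
    trained per language (purely a placeholder design)."""
    def __init__(self):
        super().__init__()
        self.gate = nn.Parameter(torch.tensor(0.5))
    def forward(self, x):
        g = torch.sigmoid(self.gate)
        return g * torch.relu(x) + (1 - g) * x

def build_encoder(num_layers=6, dim=256):
    layers = []
    for _ in range(num_layers):
        layers += [nn.Linear(dim, dim), nn.ReLU()]
    return nn.Sequential(*layers)

encoder = build_encoder()

# "Cross-lingual learning": freeze the pretrained weights, then replace the
# activations in the upper half with trainable target-language activations.
for p in encoder.parameters():
    p.requires_grad = False
modules = list(encoder)
for i in range(len(modules) // 2, len(modules)):
    if isinstance(modules[i], nn.ReLU):
        encoder[i] = AdaptiveActivation()   # new params remain trainable

x = torch.randn(4, 256)
print(encoder(x).shape)  # torch.Size([4, 256])
```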
arXiv Detail & Related papers (2022-05-28T04:02:59Z) - Knowledge Based Multilingual Language Model [44.70205282863062]
We present a novel framework to pretrain knowledge-based multilingual language models (KMLMs).
We generate large amounts of code-switched synthetic sentences and reasoning-based multilingual training data using the Wikidata knowledge graphs.
Based on the intra- and inter-sentence structures of the generated data, we design pretraining tasks to facilitate knowledge learning.
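For flavour only, the toy sketch below produces code-switched sentences by substituting words through a small bilingual dictionary; the paper instead derives its synthetic data and reasoning tasks from Wikidata knowledge graphs.

```python
# Toy code-switched data generation via bilingual dictionary substitution.
import random

# Hypothetical English -> German dictionary entries.
en_de = {"cat": "Katze", "dog": "Hund", "house": "Haus", "water": "Wasser"}

def code_switch(sentence, dictionary, rate=0.5):
    """Replace each dictionary word with its translation with probability `rate`."""
    out = []
    for tok in sentence.split():
        key = tok.lower().strip(".,")
        if key in dictionary and random.random() < rate:
            out.append(dictionary[key])
        else:
            out.append(tok)
    return " ".join(out)

random.seed(0)
print(code_switch("The cat sleeps in the house near the water.", en_de))
# e.g. "The Katze sleeps in the Haus near the water."
```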
arXiv Detail & Related papers (2021-11-22T02:56:04Z) - Reinforced Iterative Knowledge Distillation for Cross-Lingual Named
Entity Recognition [54.92161571089808]
Cross-lingual NER transfers knowledge from rich-resource languages to low-resource languages.
Existing cross-lingual NER methods do not make good use of rich unlabeled data in target languages.
We develop a novel approach based on the ideas of semi-supervised learning and reinforcement learning.
arXiv Detail & Related papers (2021-06-01T05:46:22Z) - Exploring Benefits of Transfer Learning in Neural Machine Translation [3.7612918175471393]
We propose several transfer learning approaches to reuse a model pretrained on a high-resource language pair.
We show how our techniques address specific problems of low-resource languages and are suitable even in high-resource transfer learning.
arXiv Detail & Related papers (2020-01-06T15:11:59Z)