Knowledge Efficient Deep Learning for Natural Language Processing
- URL: http://arxiv.org/abs/2008.12878v1
- Date: Fri, 28 Aug 2020 23:32:33 GMT
- Title: Knowledge Efficient Deep Learning for Natural Language Processing
- Authors: Hai Wang
- Abstract summary: This thesis focuses on adapting classical methods to modern deep learning models and algorithms.
First, we propose a knowledge rich deep learning model (KRDL) as a unifying learning framework for incorporating prior knowledge into deep models.
Second, we apply a KRDL model to help machine reading models find the correct evidence sentences that support their decisions.
- Score: 2.2701338128113124
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Deep learning has become the workhorse for a wide range of natural language
processing applications. But much of the success of deep learning relies on
annotated examples. Annotation is time-consuming and expensive to produce at
scale. Here we are interested in methods for reducing the required quantity of
annotated data -- by making the learning methods more knowledge efficient so as
to make them more applicable in low annotation (low resource) settings. There
are various classical approaches to making models more knowledge efficient,
such as multi-task learning, transfer learning, and weakly supervised and
unsupervised learning. This thesis focuses on adapting such classical
methods to modern deep learning models and algorithms.
This thesis describes four works aimed at making machine learning models more
knowledge efficient. First, we propose a knowledge rich deep learning model
(KRDL) as a unifying learning framework for incorporating prior knowledge into
deep models. In particular, we apply KRDL built on Markov logic networks to
denoise weak supervision. Second, we apply a KRDL model to help machine reading
models find the correct evidence sentences that support their decisions. Third,
we investigate knowledge transfer techniques in the multilingual setting, where
we propose a method that improves pre-trained multilingual BERT using a
bilingual dictionary. Fourth, we present an episodic memory network for language
modelling, in which we encode large-scale external knowledge for a pre-trained
GPT model.
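As a concrete (and heavily simplified) illustration of the fourth idea, the sketch below retrieves a few external-knowledge snippets by TF-IDF similarity and simply prepends them to the context of a pre-trained GPT-2 before scoring the next token. The memory contents, retrieval scheme, and model choice are placeholder assumptions, not the architecture used in the thesis.

```python
# Minimal sketch: condition a pre-trained GPT-2 on retrieved external knowledge.
import torch
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity
from transformers import GPT2LMHeadModel, GPT2TokenizerFast

# Hypothetical external knowledge store; a real system would hold many more entries.
memory = [
    "Marie Curie won Nobel Prizes in both physics and chemistry.",
    "The Amazon is the largest rainforest on Earth.",
    "GPT models are trained with a left-to-right language modelling objective.",
]

vectorizer = TfidfVectorizer().fit(memory)
memory_vecs = vectorizer.transform(memory)

def retrieve(query, k=1):
    # Return the k memory entries most similar to the query (TF-IDF cosine).
    sims = cosine_similarity(vectorizer.transform([query]), memory_vecs)[0]
    top = sims.argsort()[::-1][:k]
    return [memory[i] for i in top]

tokenizer = GPT2TokenizerFast.from_pretrained("gpt2")
model = GPT2LMHeadModel.from_pretrained("gpt2").eval()

prompt = "Marie Curie was awarded the Nobel Prize in"
context = " ".join(retrieve(prompt)) + " " + prompt  # prepend retrieved knowledge
inputs = tokenizer(context, return_tensors="pt")

with torch.no_grad():
    logits = model(**inputs).logits          # shape: [1, seq_len, vocab_size]
next_token_id = logits[0, -1].argmax().item()
print(tokenizer.decode(next_token_id))       # e.g. " physics"
```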
Related papers
- Language models are weak learners [71.33837923104808]
We show that prompt-based large language models can operate effectively as weak learners.
We incorporate these models into a boosting approach, which can leverage the knowledge within the model to outperform traditional tree-based boosting.
Results illustrate the potential for prompt-based LLMs to function not just as few-shot learners themselves, but as components of larger machine learning pipelines.
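As a rough, illustrative sketch of how a prompted LLM could act as the weak learner inside a standard boosting loop; the prompting scheme and the `llm_classify` stub below are assumptions, not the paper's method.

```python
# Illustrative AdaBoost-style loop whose weak learner is a prompted LLM classifier.
import numpy as np

def llm_classify(texts, support_texts, support_labels):
    """Hypothetical prompt-based weak learner: show a few weighted examples
    in the prompt and ask the LLM to label each input (+1 / -1).
    Stubbed with a trivial keyword rule so the sketch runs offline."""
    return np.array([1 if "good" in t else -1 for t in texts])

def boost(texts, labels, rounds=5):
    n = len(texts)
    w = np.full(n, 1.0 / n)                  # example weights
    learners, alphas = [], []
    for _ in range(rounds):
        # Sample a few examples (by weight) to show the LLM as in-context demos.
        idx = np.random.choice(n, size=min(4, n), replace=False, p=w)
        preds = llm_classify(texts, [texts[i] for i in idx], [labels[i] for i in idx])
        err = np.sum(w * (preds != labels)) + 1e-10
        if err >= 0.5:                       # weaker than chance: skip this round
            continue
        alpha = 0.5 * np.log((1 - err) / err)
        w *= np.exp(-alpha * labels * preds)  # upweight mistakes
        w /= w.sum()
        learners.append(preds)
        alphas.append(alpha)
    # Weighted vote of the weak learners
    return np.sign(sum(a * p for a, p in zip(alphas, learners)))

texts = ["good movie", "bad plot", "good acting", "boring"]
labels = np.array([1, -1, 1, -1])
print(boost(texts, labels))
```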
arXiv Detail & Related papers (2023-06-25T02:39:19Z) - Commonsense Knowledge Transfer for Pre-trained Language Models [83.01121484432801]
We introduce commonsense knowledge transfer, a framework to transfer the commonsense knowledge stored in a neural commonsense knowledge model to a general-purpose pre-trained language model.
It first exploits general texts to form queries for extracting commonsense knowledge from the neural commonsense knowledge model.
It then refines the language model with two self-supervised objectives: commonsense mask infilling and commonsense relation prediction.
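A loose sketch of what the first of these objectives, commonsense mask infilling, could look like, assuming a BERT-style masked LM and a verbalised commonsense statement; the paper's exact objectives and knowledge source are not reproduced here.

```python
# Sketch of a commonsense mask infilling step: mask the tail concept of a
# commonsense statement and train a masked LM to recover it.
import torch
from transformers import AutoModelForMaskedLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
model = AutoModelForMaskedLM.from_pretrained("bert-base-uncased")

# Hypothetical commonsense statement with its tail concept masked out.
original = "A person goes to a restaurant because they want to eat food."
masked = "A person goes to a restaurant because they want to eat [MASK]."

inputs = tokenizer(masked, return_tensors="pt")
labels = tokenizer(original, return_tensors="pt")["input_ids"]
labels[inputs["input_ids"] != tokenizer.mask_token_id] = -100  # score only the mask

loss = model(**inputs, labels=labels).loss
loss.backward()  # one gradient step of the infilling objective would follow
print(float(loss))
```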
arXiv Detail & Related papers (2023-06-04T15:44:51Z) - The Web Can Be Your Oyster for Improving Large Language Models [98.72358969495835]
Large language models (LLMs) encode a large amount of world knowledge.
We consider augmenting LLMs with the large-scale web using a search engine.
We present a web-augmented LLM, UNIWEB, which is trained on 16 knowledge-intensive tasks in a unified text-to-text format.
arXiv Detail & Related papers (2023-05-18T14:20:32Z) - A Cohesive Distillation Architecture for Neural Language Models [0.0]
A recent trend in Natural Language Processing is the exponential growth in Language Model (LM) size.
This study investigates methods for Knowledge Distillation (KD) to provide efficient alternatives to large-scale models.
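As a generic reference point, the sketch below shows the classic temperature-scaled distillation loss; the paper's specific cohesive distillation architecture is not reproduced here.

```python
# Generic knowledge distillation loss: soft targets from a teacher plus the
# usual hard-label cross-entropy for the student.
import torch
import torch.nn.functional as F

def distillation_loss(student_logits, teacher_logits, labels, T=2.0, alpha=0.5):
    """Temperature-scaled KL loss (teacher -> student) mixed with cross-entropy."""
    soft = F.kl_div(
        F.log_softmax(student_logits / T, dim=-1),
        F.softmax(teacher_logits / T, dim=-1),
        reduction="batchmean",
    ) * (T * T)
    hard = F.cross_entropy(student_logits, labels)
    return alpha * soft + (1 - alpha) * hard

# Toy usage with random tensors standing in for model outputs.
student_logits = torch.randn(8, 30522, requires_grad=True)  # e.g. BERT vocab size
teacher_logits = torch.randn(8, 30522)
labels = torch.randint(0, 30522, (8,))
loss = distillation_loss(student_logits, teacher_logits, labels)
loss.backward()
```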
arXiv Detail & Related papers (2023-01-12T08:01:53Z) - Knowledge-in-Context: Towards Knowledgeable Semi-Parametric Language
Models [58.42146641102329]
We develop a novel semi-parametric language model architecture, Knowledge-in-Context (KiC).
KiC empowers a parametric text-to-text language model with a knowledge-rich external memory.
As a knowledge-rich semi-parametric language model, KiC needs only a much smaller parametric part to achieve superior zero-shot performance on unseen tasks.
arXiv Detail & Related papers (2022-10-28T23:18:43Z) - Language Model Pre-Training with Sparse Latent Typing [66.75786739499604]
We propose a new pre-training objective, Sparse Latent Typing, which enables the model to sparsely extract sentence-level keywords with diverse latent types.
Experimental results show that our model is able to learn interpretable latent type categories in a self-supervised manner without using any external knowledge.
arXiv Detail & Related papers (2022-10-23T00:37:08Z) - LM-CORE: Language Models with Contextually Relevant External Knowledge [13.451001884972033]
We argue that storing large amounts of knowledge in the model parameters is sub-optimal given the ever-growing amounts of knowledge and resource requirements.
We present LM-CORE -- a general framework to achieve this -- that allows decoupling of the language model training from the external knowledge source.
Experimental results show that LM-CORE, having access to external knowledge, significantly and robustly outperforms state-of-the-art knowledge-enhanced language models on knowledge probing tasks.
arXiv Detail & Related papers (2022-08-12T18:59:37Z) - Adaptive Activation Network For Low Resource Multilingual Speech
Recognition [30.460501537763736]
We introduce an adaptive activation network into the upper layers of the ASR model.
We also propose two approaches to train the model: (1) cross-lingual learning, which replaces the activation function of the source language with that of the target language, and (2) multilingual learning.
Our experiments on the IARPA Babel datasets demonstrate that our approaches outperform from-scratch training and traditional bottleneck-feature-based methods.
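A loose sketch of the cross-lingual variant of this idea, with an invented toy encoder and a placeholder adaptive activation module; the real ASR architecture and activation design will differ.

```python
# Sketch: freeze the lower layers of a trained encoder and swap in
# target-language adaptive activations in the upper layers.
import torch
import torch.nn as nn

class AdaptiveActivation(nn.Module):
    """Tiny learnable activation: a gated mix of ReLU and identity,
    trained per language (purely a placeholder design)."""
    def __init__(self):
        super().__init__()
        self.gate = nn.Parameter(torch.tensor(0.5))
    def forward(self, x):
        g = torch.sigmoid(self.gate)
        return g * torch.relu(x) + (1 - g) * x

def build_encoder(num_layers=6, dim=256):
    layers = []
    for _ in range(num_layers):
        layers += [nn.Linear(dim, dim), nn.ReLU()]
    return nn.Sequential(*layers)

encoder = build_encoder()

# "Cross-lingual learning": freeze the pretrained weights, then replace the
# activations in the upper half with trainable target-language activations.
for p in encoder.parameters():
    p.requires_grad = False
modules = list(encoder)
for i in range(len(modules) // 2, len(modules)):
    if isinstance(modules[i], nn.ReLU):
        encoder[i] = AdaptiveActivation()   # new params remain trainable

x = torch.randn(4, 256)
print(encoder(x).shape)  # torch.Size([4, 256])
```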
arXiv Detail & Related papers (2022-05-28T04:02:59Z) - Knowledge Based Multilingual Language Model [44.70205282863062]
We present a novel framework to pretrain knowledge-based multilingual language models (KMLMs).
We generate large amounts of code-switched synthetic sentences and reasoning-based multilingual training data using the Wikidata knowledge graphs.
Based on the intra- and inter-sentence structures of the generated data, we design pretraining tasks to facilitate knowledge learning.
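For flavour only, the toy sketch below produces code-switched sentences by substituting words through a small bilingual dictionary; the paper instead derives its synthetic data and reasoning tasks from Wikidata knowledge graphs.

```python
# Toy code-switched data generation via bilingual dictionary substitution.
import random

# Hypothetical English -> German dictionary entries.
en_de = {"cat": "Katze", "dog": "Hund", "house": "Haus", "water": "Wasser"}

def code_switch(sentence, dictionary, rate=0.5):
    """Replace each dictionary word with its translation with probability `rate`."""
    out = []
    for tok in sentence.split():
        key = tok.lower().strip(".,")
        if key in dictionary and random.random() < rate:
            out.append(dictionary[key])
        else:
            out.append(tok)
    return " ".join(out)

random.seed(0)
print(code_switch("The cat sleeps in the house near the water.", en_de))
# e.g. "The Katze sleeps in the Haus near the water."
```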
arXiv Detail & Related papers (2021-11-22T02:56:04Z) - Reinforced Iterative Knowledge Distillation for Cross-Lingual Named
Entity Recognition [54.92161571089808]
Cross-lingual NER transfers knowledge from rich-resource languages to low-resource languages.
Existing cross-lingual NER methods do not make good use of rich unlabeled data in target languages.
We develop a novel approach based on the ideas of semi-supervised learning and reinforcement learning.
arXiv Detail & Related papers (2021-06-01T05:46:22Z) - Exploring Benefits of Transfer Learning in Neural Machine Translation [3.7612918175471393]
We propose several transfer learning approaches to reuse a model pretrained on a high-resource language pair.
We show how our techniques address specific problems of low-resource languages and are suitable even in high-resource transfer learning.
arXiv Detail & Related papers (2020-01-06T15:11:59Z)