Related papers: K-XLNet: A General Method for Combining Explicit Knowledge with Language Model Pretraining

K-XLNet: A General Method for Combining Explicit Knowledge with Language Model Pretraining

URL: http://arxiv.org/abs/2104.10649v2
Date: Sat, 29 May 2021 10:08:32 GMT
Title: K-XLNet: A General Method for Combining Explicit Knowledge with Language Model Pretraining
Authors: Ruiqing Yan, Lanchang Sun, Fang Wang, Xiaoming Zhang
Abstract summary: We focus on improving model pretraining by leveraging explicit knowledge. To be specific, we first match knowledge facts from knowledge graph (KG) and then add a knowledge injunction layer to transformer directly. The experimental results show that solely by adding external knowledge to transformer can improve the learning performance on many NLP tasks.
Score: 5.178964604577459
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: Though pre-trained language models such as Bert and XLNet, have rapidly advanced the state-of-the-art on many NLP tasks, they implicit semantics only relying on surface information between words in corpus. Intuitively, background knowledge influences the efficacy of understanding. Inspired by this common sense, we focus on improving model pretraining by leveraging explicit knowledge. Different from recent research that optimize pretraining model by knowledge masking strategies, we propose a simple but general method to combine explicit knowledge with pretraining. To be specific, we first match knowledge facts from knowledge graph (KG) and then add a knowledge injunction layer to transformer directly without changing its architecture. The present study seeks to find the direct impact of explicit knowledge on transformer per-training. We conduct experiments on various datasets for different downstream tasks. The experimental results show that solely by adding external knowledge to transformer can improve the learning performance on many NLP tasks.

Related papers

Exploiting the Semantic Knowledge of Pre-trained Text-Encoders for Continual Learning [70.64617500380287]
Continual learning allows models to learn from new data while retaining previously learned knowledge. The semantic knowledge available in the label information of the images, offers important semantic information that can be related with previously acquired knowledge of semantic classes. We propose integrating semantic guidance within and across tasks by capturing semantic similarity using text embeddings.
arXiv Detail & Related papers (2024-08-02T07:51:44Z)
TRELM: Towards Robust and Efficient Pre-training for Knowledge-Enhanced Language Models [31.209774088374374]
This paper introduces TRELM, a Robust and Efficient Pre-training framework for Knowledge-Enhanced Language Models. We employ a robust approach to inject knowledge triples and employ a knowledge-augmented memory bank to capture valuable information. We show that TRELM reduces pre-training time by at least 50% and outperforms other KEPLMs in knowledge probing tasks and multiple knowledge-aware language understanding tasks.
arXiv Detail & Related papers (2024-03-17T13:04:35Z)
Knowledge Rumination for Pre-trained Language Models [77.55888291165462]
We propose a new paradigm dubbed Knowledge Rumination to help the pre-trained language model utilize related latent knowledge without retrieving it from the external corpus. We apply the proposed knowledge rumination to various language models, including RoBERTa, DeBERTa, and GPT-3.
arXiv Detail & Related papers (2023-05-15T15:47:09Z)
LM-CORE: Language Models with Contextually Relevant External Knowledge [13.451001884972033]
We argue that storing large amounts of knowledge in the model parameters is sub-optimal given the ever-growing amounts of knowledge and resource requirements. We present LM-CORE -- a general framework to achieve this -- that allows textitdecoupling of the language model training from the external knowledge source. Experimental results show that LM-CORE, having access to external knowledge, achieves significant and robust outperformance over state-of-the-art knowledge-enhanced language models on knowledge probing tasks.
arXiv Detail & Related papers (2022-08-12T18:59:37Z)
Ontology-enhanced Prompt-tuning for Few-shot Learning [41.51144427728086]
Few-shot Learning is aimed to make predictions based on a limited number of samples. Structured data such as knowledge graphs and ontology libraries has been leveraged to benefit the few-shot setting in various tasks.
arXiv Detail & Related papers (2022-01-27T05:41:36Z)
Kformer: Knowledge Injection in Transformer Feed-Forward Layers [107.71576133833148]
We propose a novel knowledge fusion model, namely Kformer, which incorporates external knowledge through the feed-forward layer in Transformer. We empirically find that simply injecting knowledge into FFN can facilitate the pre-trained language model's ability and facilitate current knowledge fusion methods.
arXiv Detail & Related papers (2022-01-15T03:00:27Z)
Towards a Universal Continuous Knowledge Base [49.95342223987143]
We propose a method for building a continuous knowledge base that can store knowledge imported from multiple neural networks. Experiments on text classification show promising results. We import the knowledge from multiple models to the knowledge base, from which the fused knowledge is exported back to a single model.
arXiv Detail & Related papers (2020-12-25T12:27:44Z)
CoLAKE: Contextualized Language and Knowledge Embedding [81.90416952762803]
We propose the Contextualized Language and Knowledge Embedding (CoLAKE) CoLAKE jointly learns contextualized representation for both language and knowledge with the extended objective. We conduct experiments on knowledge-driven tasks, knowledge probing tasks, and language understanding tasks.
arXiv Detail & Related papers (2020-10-01T11:39:32Z)
Knowledge-Aware Language Model Pretraining [29.56904859722379]
We incorporate knowledge-awareness in language model pretraining without changing the transformer architecture. We observe improved language modeling accuracy, factual correctness in LAMA knowledge probing tasks, and semantics in the hidden representations through edge probing. Our knowledge-aware language model (KALM) can serve as a drop-in replacement for GPT-2 models.
arXiv Detail & Related papers (2020-06-29T06:09:59Z)
Common Sense or World Knowledge? Investigating Adapter-Based Knowledge Injection into Pretrained Transformers [54.417299589288184]
We investigate models for complementing the distributional knowledge of BERT with conceptual knowledge from ConceptNet and its corresponding Open Mind Common Sense (OMCS) corpus. Our adapter-based models substantially outperform BERT on inference tasks that require the type of conceptual knowledge explicitly present in ConceptNet and OMCS.
arXiv Detail & Related papers (2020-05-24T15:49:57Z)

This list is automatically generated from the titles and abstracts of the papers in this site.