Unsupervised Pre-training with Structured Knowledge for Improving
Natural Language Inference
- URL: http://arxiv.org/abs/2109.03941v1
- Date: Wed, 8 Sep 2021 21:28:12 GMT
- Title: Unsupervised Pre-training with Structured Knowledge for Improving
Natural Language Inference
- Authors: Xiaoyu Yang, Xiaodan Zhu, Zhan Shi, Tianda Li
- Abstract summary: We propose models that leverage structured knowledge in different components of pre-trained models.
Our results show that the proposed models perform better than previous BERT-based state-of-the-art models.
- Score: 22.648536283569747
- License: http://creativecommons.org/licenses/by-nc-nd/4.0/
- Abstract: While recent research on natural language inference has considerably
benefited from large annotated datasets, the amount of inference-related
knowledge (including commonsense) provided in the annotated data is still
rather limited. There have been two lines of approaches that can be used to
further address the limitation: (1) unsupervised pretraining can leverage
knowledge in much larger unstructured text data; (2) structured (often
human-curated) knowledge has started to be considered in neural-network-based
models for NLI. An immediate question is whether these two approaches
complement each other, or how to develop models that can bring together their
advantages. In this paper, we propose models that leverage structured knowledge
in different components of pre-trained models. Our results show that the
proposed models perform better than previous BERT-based state-of-the-art
models. Although our models are proposed for NLI, they can be easily extended
to other sentence or sentence-pair classification problems.
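The abstract does not spell out where the structured knowledge enters the pre-trained model, so the following is only a rough, hypothetical sketch of one common fusion strategy for sentence-pair classification: relation features looked up from an external resource (the kb_relation_features helper below is a placeholder for, e.g., WordNet or ConceptNet lookups) are concatenated with the BERT [CLS] representation before the classifier. It is an illustration of the general idea, not the authors' model.

```python
# Hypothetical sketch: late fusion of structured-knowledge features with a BERT encoder for NLI.
# The knowledge lookup and feature dimensions are assumptions, not the paper's architecture.
import torch
import torch.nn as nn
from transformers import BertModel, BertTokenizer

NUM_RELATIONS = 16  # assumed number of knowledge-graph relation types (e.g., ConceptNet relations)
NUM_LABELS = 3      # entailment / neutral / contradiction


def kb_relation_features(premise: str, hypothesis: str) -> torch.Tensor:
    """Placeholder for a structured-knowledge lookup.

    A real implementation might count, for each relation type (Synonym, Antonym,
    IsA, ...), how many word pairs across premise and hypothesis are connected by
    that relation in WordNet/ConceptNet. Here we simply return zeros of the right shape.
    """
    return torch.zeros(1, NUM_RELATIONS)


class KnowledgeEnhancedNLI(nn.Module):
    def __init__(self, model_name: str = "bert-base-uncased"):
        super().__init__()
        self.encoder = BertModel.from_pretrained(model_name)
        hidden = self.encoder.config.hidden_size
        # The classifier sees the [CLS] vector concatenated with the relation features.
        self.classifier = nn.Linear(hidden + NUM_RELATIONS, NUM_LABELS)

    def forward(self, input_ids, attention_mask, token_type_ids, kb_feats):
        out = self.encoder(input_ids=input_ids,
                           attention_mask=attention_mask,
                           token_type_ids=token_type_ids)
        cls = out.last_hidden_state[:, 0]            # [CLS] representation
        fused = torch.cat([cls, kb_feats], dim=-1)   # fuse structured knowledge
        return self.classifier(fused)


if __name__ == "__main__":
    tokenizer = BertTokenizer.from_pretrained("bert-base-uncased")
    model = KnowledgeEnhancedNLI()
    premise, hypothesis = "A man is playing a guitar.", "A person is making music."
    enc = tokenizer(premise, hypothesis, return_tensors="pt")
    logits = model(enc["input_ids"], enc["attention_mask"],
                   enc["token_type_ids"], kb_relation_features(premise, hypothesis))
    print(logits.shape)  # torch.Size([1, 3])
```

Other fusion points are possible (for instance, injecting knowledge into the attention layers); the late-fusion variant above is simply the easiest to illustrate.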
Related papers
- Gradual Learning: Optimizing Fine-Tuning with Partially Mastered Knowledge in Large Language Models [51.20499954955646]
Large language models (LLMs) acquire vast amounts of knowledge from extensive text corpora during the pretraining phase.
In later stages such as fine-tuning and inference, the model may encounter knowledge not covered in the initial training.
We propose a two-stage fine-tuning strategy to improve the model's overall test accuracy and knowledge retention.
arXiv Detail & Related papers (2024-10-08T08:35:16Z)
- Forgetting before Learning: Utilizing Parametric Arithmetic for Knowledge Updating in Large Language Models [53.52344131257681]
We propose a new paradigm for fine-tuning called F-Learning, which employs parametric arithmetic to facilitate the forgetting of old knowledge and learning of new knowledge.
Experimental results on two publicly available datasets demonstrate that the proposed F-Learning noticeably improves the knowledge-updating performance of both full fine-tuning and LoRA fine-tuning.
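The summary above only names the mechanism, so here is a heavily hedged sketch of what parametric arithmetic for "forgetting before learning" might look like in PyTorch: compute the weight delta that encodes the old knowledge and subtract a scaled copy of it from the pre-trained weights before fine-tuning on the new knowledge. The scaling factor and the exact update rule are assumptions, not the F-Learning recipe.

```python
# Illustrative sketch of parametric arithmetic for forgetting old knowledge.
# Not the paper's exact F-Learning procedure; lambda_ and the update rule are assumptions.
import copy
import torch


def forget_then_learn(pretrained, finetuned_on_old, lambda_=1.0):
    """Return a copy of the pre-trained model with the old-knowledge delta subtracted.

    delta_old = theta_old - theta_pre encodes what was learned from the old data;
    theta_forget = theta_pre - lambda_ * delta_old pushes the weights away from it.
    The returned model can then be fine-tuned on the new knowledge as usual.
    """
    theta_pre = pretrained.state_dict()
    theta_old = finetuned_on_old.state_dict()
    theta_forget = {}
    for name, w_pre in theta_pre.items():
        if not torch.is_floating_point(w_pre):
            theta_forget[name] = w_pre  # leave integer buffers untouched
            continue
        delta_old = theta_old[name] - w_pre
        theta_forget[name] = w_pre - lambda_ * delta_old
    model = copy.deepcopy(pretrained)
    model.load_state_dict(theta_forget)
    return model
```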
arXiv Detail & Related papers (2023-11-14T09:12:40Z) - The KITMUS Test: Evaluating Knowledge Integration from Multiple Sources
in Natural Language Understanding Systems [87.3207729953778]
We evaluate state-of-the-art coreference resolution models on our dataset.
Several models struggle to reason on the fly over knowledge observed both at pretraining time and at inference time.
Still, even the best-performing models seem to have difficulty reliably integrating knowledge presented only at inference time.
arXiv Detail & Related papers (2022-12-15T23:26:54Z)
- Large Language Models with Controllable Working Memory [64.71038763708161]
Large language models (LLMs) have led to a series of breakthroughs in natural language processing (NLP).
What further sets these models apart is the massive amount of world knowledge they internalize during pretraining.
How a model's world knowledge interacts with the factual information presented in the context remains underexplored.
arXiv Detail & Related papers (2022-11-09T18:58:29Z)
- Schema-aware Reference as Prompt Improves Data-Efficient Knowledge Graph Construction [57.854498238624366]
We propose a retrieval-augmented approach, which retrieves schema-aware Reference As Prompt (RAP) for data-efficient knowledge graph construction.
RAP can dynamically leverage schema and knowledge inherited from human-annotated and weakly supervised data as a prompt for each sample.
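The two sentences above give only the high-level recipe, so the snippet below is a generic, assumption-laden sketch of schema-aware retrieval-augmented prompting: a toy retriever picks the most similar stored references, and the prompt prepends the relation schema and those references to the input sample. The helper names and prompt format are illustrative placeholders, not the RAP implementation.

```python
# Generic sketch of schema-aware, retrieval-augmented prompting for KG construction.
# The schema store, retriever, and prompt template are assumptions, not the RAP code.
from dataclasses import dataclass
from typing import List


@dataclass
class Reference:
    text: str            # an annotated or weakly supervised example sentence
    triples: List[str]   # its (head, relation, tail) triples rendered as strings


def retrieve_references(sample: str, references: List[Reference], k: int = 2) -> List[Reference]:
    """Toy retriever: rank stored references by word overlap with the input sample."""
    def overlap(ref: Reference) -> int:
        return len(set(sample.lower().split()) & set(ref.text.lower().split()))
    return sorted(references, key=overlap, reverse=True)[:k]


def build_prompt(sample: str, schema_relations: List[str], refs: List[Reference]) -> str:
    """Prepend the relation schema and retrieved references to the input sample."""
    lines = ["Relations: " + ", ".join(schema_relations)]
    for ref in refs:
        lines.append(f"Example: {ref.text} -> {'; '.join(ref.triples)}")
    lines.append(f"Input: {sample} ->")
    return "\n".join(lines)


if __name__ == "__main__":
    schema = ["founded_by", "located_in", "works_for"]
    memory = [
        Reference("Apple was founded by Steve Jobs.", ["(Apple, founded_by, Steve Jobs)"]),
        Reference("Berlin is located in Germany.", ["(Berlin, located_in, Germany)"]),
    ]
    sample = "SpaceX was founded by Elon Musk."
    print(build_prompt(sample, schema, retrieve_references(sample, memory)))
```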
arXiv Detail & Related papers (2022-10-19T16:40:28Z)
- A Survey of Knowledge Enhanced Pre-trained Models [28.160826399552462]
We refer to pre-trained language models with knowledge injection as knowledge-enhanced pre-trained language models (KEPLMs).
These models demonstrate deeper language understanding and logical reasoning, and introduce interpretability.
arXiv Detail & Related papers (2021-10-01T08:51:58Z)
- The NLP Cookbook: Modern Recipes for Transformer based Deep Learning Architectures [0.0]
Natural Language Processing models have achieved phenomenal success in linguistic and semantic tasks.
Recent NLP architectures have utilized concepts of transfer learning, pruning, quantization, and knowledge distillation to achieve moderate model sizes.
Knowledge retrievers have been built to extract relevant documents from large corpora and databases with greater efficiency and accuracy.
arXiv Detail & Related papers (2021-03-23T22:38:20Z)
- Self-Explaining Structures Improve NLP Models [25.292847674586614]
We propose a simple yet general and effective self-explaining framework for deep learning models in NLP.
We show that interpretability does not come at the cost of performance: a neural model with self-explaining features obtains better performance than its counterpart without them.
arXiv Detail & Related papers (2020-12-03T09:32:05Z)
- InfoBERT: Improving Robustness of Language Models from An Information Theoretic Perspective [84.78604733927887]
Large-scale language models such as BERT have achieved state-of-the-art performance across a wide range of NLP tasks.
Recent studies show that such BERT-based models are vulnerable to textual adversarial attacks.
We propose InfoBERT, a novel learning framework for robust fine-tuning of pre-trained language models.
arXiv Detail & Related papers (2020-10-05T20:49:26Z)
- On the comparability of Pre-trained Language Models [0.0]
Recent developments in unsupervised representation learning have successfully established the concept of transfer learning in NLP.
More elaborate architectures make better use of contextual information.
Larger corpora are used as resources for pre-training large language models in a self-supervised fashion.
Advances in parallel and cloud computing have made it possible to train models of growing capacity in the same or even shorter time than previously established models.
arXiv Detail & Related papers (2020-01-03T10:53:35Z)