Symbolic Knowledge Distillation: from General Language Models to
Commonsense Models
- URL: http://arxiv.org/abs/2110.07178v1
- Date: Thu, 14 Oct 2021 06:50:19 GMT
- Title: Symbolic Knowledge Distillation: from General Language Models to
Commonsense Models
- Authors: Peter West, Chandra Bhagavatula, Jack Hessel, Jena D. Hwang, Liwei
Jiang, Ronan Le Bras, Ximing Lu, Sean Welleck, Yejin Choi
- Abstract summary: General language models author knowledge graphs to train commonsense models.
We distill knowledge symbolically (as text) in addition to the neural model.
For the first time, a human-authored commonsense knowledge graph is surpassed by our automatically distilled variant.
- Score: 38.29726383331247
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: The common practice for training commonsense models has gone
from-human-to-corpus-to-machine: humans author commonsense knowledge graphs in
order to train commonsense models. In this work, we investigate an alternative,
from-machine-to-corpus-to-machine: general language models author these
commonsense knowledge graphs to train commonsense models. Our study leads to a
new framework, Symbolic Knowledge Distillation. As with prior art in Knowledge
Distillation (Hinton et al., 2015), our approach uses larger models to teach
smaller models. A key difference is that we distill knowledge symbolically (as
text) in addition to the neural model. We also distill only one aspect (the
commonsense) of a general language model teacher, allowing the student to be a
different type, a commonsense model. Altogether, we show that careful prompt
engineering and a separately trained critic model allow us to selectively
distill high-quality causal commonsense from GPT-3, a general language model.
Empirical results demonstrate that, for the first time, a human-authored
commonsense knowledge graph is surpassed by our automatically distilled variant
in all three criteria: quantity, quality, and diversity. In addition, it
results in a neural commonsense model that surpasses the teacher model's
commonsense capabilities despite its 100x smaller size. We apply this to the
ATOMIC resource, and share our new symbolic knowledge graph and commonsense
models.
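As a rough sketch of the pipeline the abstract describes: few-shot prompts elicit candidate ATOMIC-style triples from the large teacher model, a separately trained critic filters out low-quality generations, and the surviving symbolic corpus is used to fine-tune a much smaller student. The code below is a minimal, hypothetical illustration; the function names, prompt text, and dummy scores are assumptions, not the authors' released implementation.

```python
# Minimal sketch of the symbolic knowledge distillation loop described above.
# `generate_inferences` and `critic_score` are hypothetical stand-ins for
# few-shot GPT-3 prompting and the separately trained critic model.
from dataclasses import dataclass


@dataclass
class Triple:
    head: str       # event, e.g. "X pays Y a compliment"
    relation: str   # ATOMIC-style relation, e.g. "xEffect"
    tail: str       # generated inference, e.g. "X feels good about themselves"


FEW_SHOT_PROMPT = (
    "Event: X pays Y a compliment. As a result, X feels good about themselves.\n"
    "Event: X finds a new job. As a result, X celebrates with friends.\n"
    "Event: {event}. As a result,"
)


def generate_inferences(event: str, n: int = 10) -> list[Triple]:
    """Stand-in for sampling n completions from a large teacher LM (e.g. GPT-3)."""
    prompt = FEW_SHOT_PROMPT.format(event=event)
    # A real implementation would send `prompt` to the teacher LM here.
    completions = [f"placeholder inference {i} for '{event}'" for i in range(n)]
    return [Triple(event, "xEffect", c) for c in completions]


def critic_score(triple: Triple) -> float:
    """Stand-in for the critic that estimates plausibility in [0, 1]."""
    return 0.5  # a real critic would be a fine-tuned classifier


def distill(events: list[str], threshold: float = 0.8) -> list[Triple]:
    """Keep only generations the critic judges to be high quality."""
    corpus = [t for e in events for t in generate_inferences(e)]
    return [t for t in corpus if critic_score(t) >= threshold]


if __name__ == "__main__":
    # The filtered corpus (the "symbolic" knowledge graph) would then be used
    # to fine-tune a much smaller student, e.g. a COMET-style GPT-2 model.
    kept = distill(["X asks Y for help"], threshold=0.4)
    print(len(kept), "triples kept by the dummy critic")
```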
Related papers
- NovaCOMET: Open Commonsense Foundation Models with Symbolic Knowledge
Distillation [82.85412355714898]
We present NovaCOMET, an open commonsense knowledge model that combines the best aspects of knowledge models and general task models.
Compared to previous knowledge models, NovaCOMET allows open-format relations enabling direct application to reasoning tasks.
It explicitly centers knowledge, enabling superior performance for commonsense reasoning.
arXiv Detail & Related papers (2023-12-10T19:45:24Z)
- PHALM: Building a Knowledge Graph from Scratch by Prompting Humans and a Language Model [15.148567298728574]
We propose PHALM, a method for building a knowledge graph from scratch by prompting both humans and a language model.
We used this method to build a Japanese event knowledge graph and trained Japanese commonsense generation models.
arXiv Detail & Related papers (2023-10-11T03:39:46Z)
- Commonsense Knowledge Transfer for Pre-trained Language Models [83.01121484432801]
We introduce commonsense knowledge transfer, a framework to transfer the commonsense knowledge stored in a neural commonsense knowledge model to a general-purpose pre-trained language model.
It first exploits general texts to form queries for extracting commonsense knowledge from the neural commonsense knowledge model.
It then refines the language model with two self-supervised objectives: commonsense mask infilling and commonsense relation prediction.
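The two objectives are easiest to see at the data level. The sketch below, which assumes a simple (head, relation, tail) assertion format and a generic <mask> token, shows how training pairs for commonsense mask infilling and commonsense relation prediction could be constructed; it is an illustration, not the paper's exact recipe.

```python
# Illustrative construction of training examples for the two self-supervised
# objectives, under an assumed (head, relation, tail) assertion format.
MASK = "<mask>"


def mask_infilling_example(head: str, relation: str, tail: str) -> tuple[str, str]:
    """Commonsense mask infilling: hide the inference and ask the LM to restore it."""
    text = f"{head} {relation} {tail}"
    return text.replace(tail, MASK), tail


def relation_prediction_example(head: str, tail: str, relation: str) -> tuple[str, str]:
    """Commonsense relation prediction: predict the relation linking two spans."""
    return f"{head} {MASK} {tail}", relation


if __name__ == "__main__":
    h, r, t = "X drinks coffee in the morning", "because X wanted", "to wake up"
    print(mask_infilling_example(h, r, t))      # masked text, gold infill
    print(relation_prediction_example(h, t, r))  # masked text, gold relation
```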
arXiv Detail & Related papers (2023-06-04T15:44:51Z)
- Improving Neural Topic Models with Wasserstein Knowledge Distillation [0.8962460460173959]
We propose a knowledge distillation framework to compress a contextualized topic model without loss in topic quality.
Experiments show that the student trained with knowledge distillation achieves topic coherence much higher than that of the original student model.
arXiv Detail & Related papers (2023-03-27T16:07:44Z)
- I2D2: Inductive Knowledge Distillation with NeuroLogic and Self-Imitation [89.38161262164586]
We study generative models of commonsense knowledge, focusing on the task of generating generics, i.e., statements of commonsense facts about everyday concepts (e.g., birds can fly).
We introduce I2D2, a novel commonsense distillation framework that loosely follows the Symbolic Knowledge Distillation of West et al.
Our study leads to a new corpus of generics, Gen-A-tomic, that is the largest and highest quality available to date.
arXiv Detail & Related papers (2022-12-19T04:47:49Z)
- Distilling Knowledge from Self-Supervised Teacher by Embedding Graph Alignment [52.704331909850026]
We formulate a new knowledge distillation framework to transfer the knowledge from self-supervised pre-trained models to any other student network.
Inspired by the spirit of instance discrimination in self-supervised learning, we model the instance-instance relations by a graph formulation in the feature embedding space.
Our distillation scheme can be flexibly applied to transfer the self-supervised knowledge to enhance representation learning on various student networks.
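As a rough illustration of graph-based relational distillation, the sketch below builds cosine-similarity graphs over a batch of teacher and student embeddings and penalizes the mismatch between them; this is a generic formulation, not necessarily the paper's exact objective, and the batch size and feature dimensions are arbitrary assumptions.

```python
# Generic sketch: align instance-instance similarity graphs of teacher and
# student embeddings (simplified; not the paper's exact loss).
import numpy as np


def similarity_graph(features: np.ndarray) -> np.ndarray:
    """Cosine-similarity matrix over a batch of instance embeddings."""
    normed = features / np.linalg.norm(features, axis=1, keepdims=True)
    return normed @ normed.T


def graph_alignment_loss(teacher_feats: np.ndarray, student_feats: np.ndarray) -> float:
    """Penalize mismatch between teacher and student instance-instance graphs."""
    g_teacher = similarity_graph(teacher_feats)
    g_student = similarity_graph(student_feats)
    return float(np.mean((g_teacher - g_student) ** 2))


if __name__ == "__main__":
    # Example: a batch of 8 instances, teacher dim 512, student dim 128.
    rng = np.random.default_rng(0)
    print(graph_alignment_loss(rng.normal(size=(8, 512)), rng.normal(size=(8, 128))))
```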
arXiv Detail & Related papers (2022-11-23T19:27:48Z)
- Generated Knowledge Prompting for Commonsense Reasoning [53.88983683513114]
We propose generating knowledge statements directly from a language model with a generic prompt format.
This approach improves the performance of both off-the-shelf and finetuned language models on four commonsense reasoning tasks.
Notably, we find that a model's predictions can improve when using its own generated knowledge.
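A minimal sketch of the two-stage idea follows, assuming a generic knowledge-elicitation prompt; lm_generate and the word-overlap scoring below are hypothetical stand-ins for the language-model sampling and likelihood-based answer scoring used in practice.

```python
# Sketch: (1) generate knowledge statements with a generic prompt,
# (2) condition answer selection on the generated knowledge.
KNOWLEDGE_PROMPT = (
    "Generate some knowledge about the concepts in the input.\n"
    "Input: {question}\nKnowledge:"
)


def lm_generate(prompt: str, n: int = 5) -> list[str]:
    """Stand-in for sampling n continuations from a language model."""
    return [f"placeholder knowledge statement {i}" for i in range(n)]


def answer_with_knowledge(question: str, choices: list[str]) -> str:
    """Pick the choice best supported by any generated knowledge statement."""
    statements = lm_generate(KNOWLEDGE_PROMPT.format(question=question))

    def support(choice: str) -> float:
        # A real system would score `choice` by LM likelihood given
        # knowledge + question; word overlap is a dummy placeholder.
        return max(len(set(choice.split()) & set(s.split())) for s in statements)

    return max(choices, key=support)


if __name__ == "__main__":
    print(answer_with_knowledge("Where would you find a fox?", ["forest", "fridge"]))
```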
arXiv Detail & Related papers (2021-10-15T21:58:03Z)
- A Metamodel and Framework for Artificial General Intelligence From Theory to Practice [11.756425327193426]
This paper introduces a new metamodel-based knowledge representation that significantly improves autonomous learning and adaptation.
We have applied the metamodel to problems including time series analysis, computer vision, and natural language understanding.
One surprising consequence of the metamodel is that it enables a new level of autonomous learning and optimal functioning for machine intelligences.
arXiv Detail & Related papers (2021-02-11T16:45:58Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information presented and is not responsible for any consequences of its use.