NovaCOMET: Open Commonsense Foundation Models with Symbolic Knowledge
Distillation
- URL: http://arxiv.org/abs/2312.05979v1
- Date: Sun, 10 Dec 2023 19:45:24 GMT
- Title: NovaCOMET: Open Commonsense Foundation Models with Symbolic Knowledge
Distillation
- Authors: Peter West, Ronan Le Bras, Taylor Sorensen, Bill Yuchen Lin, Liwei
Jiang, Ximing Lu, Khyathi Chandu, Jack Hessel, Ashutosh Baheti, Chandra
Bhagavatula, Yejin Choi
- Abstract summary: We present NovaCOMET, an open commonsense knowledge model that combines the best aspects of knowledge and general task models.
Compared to previous knowledge models, NovaCOMET allows open-format relations, enabling direct application to reasoning tasks.
Compared to general task models, it explicitly centers knowledge, enabling superior performance for commonsense reasoning.
- Score: 82.85412355714898
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: We present NovaCOMET, an open commonsense knowledge model that combines the
best aspects of knowledge and general task models. Compared to previous
knowledge models, NovaCOMET allows open-format relations, enabling direct
application to reasoning tasks; compared to general task models like Flan-T5,
it explicitly centers knowledge, enabling superior performance for commonsense
reasoning.
NovaCOMET leverages the knowledge of opaque proprietary models to create an
open knowledge pipeline. First, knowledge is symbolically distilled into
NovATOMIC, a publicly released discrete knowledge graph that can be audited,
critiqued, and filtered. Next, we train NovaCOMET on NovATOMIC by fine-tuning
an open-source pretrained model. NovaCOMET uses an open-format training
objective that replaces the fixed relation sets of past knowledge models,
enabling arbitrary structures within the data to serve as inputs or outputs.
The resulting generation model, optionally augmented with human annotation,
matches or exceeds comparable open task models like Flan-T5 on a range of
commonsense generation tasks. NovaCOMET serves as a counterexample to the
contemporary focus on instruction tuning alone, demonstrating a distinct
advantage of explicitly modeling commonsense knowledge as well.
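As a rough illustration of the open-format objective described in the abstract, one can picture each NovATOMIC record as a small set of named fields, mask one field at random, and train a seq2seq model to generate it. The sketch below is an assumption about the general shape of such training, not NovaCOMET's actual data format: the field names, the <mask> token, and the t5-base checkpoint are all illustrative.

    import random
    from transformers import AutoTokenizer, AutoModelForSeq2SeqLM

    # Hypothetical NovATOMIC-style record; real field names and format may differ.
    record = {
        "context": "PersonX spills coffee on their laptop",
        "query": "What happens next?",
        "inference": "PersonX rushes to dry the keyboard",
    }

    tokenizer = AutoTokenizer.from_pretrained("t5-base")
    model = AutoModelForSeq2SeqLM.from_pretrained("t5-base")

    def make_open_format_example(rec):
        """Mask one randomly chosen field; the model must generate it."""
        target_field = random.choice(list(rec.keys()))
        parts = [
            f"{name}: <mask>" if name == target_field else f"{name}: {value}"
            for name, value in rec.items()
        ]
        return " | ".join(parts), rec[target_field]

    source, target = make_open_format_example(record)
    inputs = tokenizer(source, return_tensors="pt")
    labels = tokenizer(target, return_tensors="pt").input_ids

    # Standard seq2seq cross-entropy on the masked field.
    loss = model(**inputs, labels=labels).loss
    loss.backward()

Because any field can be masked, nothing in this objective fixes a relation vocabulary, which is what lets arbitrary structures in the data serve as inputs or outputs.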
Related papers
- On the Surprising Efficacy of Distillation as an Alternative to Pre-Training Small Models [7.062887337934677]
We propose that small models may not need to absorb the cost of pre-training to reap its benefits.
We observe that, when distilled on a task from a pre-trained model, a small model can achieve or surpass the performance it would achieve if it were pre-trained and then fine-tuned on that task.
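For context, a generic form of task-level distillation (the temperature, loss weights, and tensor shapes below are illustrative assumptions, not this paper's recipe) blends hard-label cross-entropy with a soft-label term from the teacher:

    import torch
    import torch.nn.functional as F

    def distillation_loss(student_logits, teacher_logits, labels,
                          temperature=2.0, alpha=0.5):
        """Blend hard-label cross-entropy with KL to the teacher's soft labels."""
        hard = F.cross_entropy(student_logits, labels)
        soft = F.kl_div(
            F.log_softmax(student_logits / temperature, dim=-1),
            F.softmax(teacher_logits / temperature, dim=-1),
            reduction="batchmean",
        ) * temperature ** 2  # conventional temperature-squared scaling
        return alpha * hard + (1 - alpha) * soft

    # Illustrative usage: the teacher is a frozen pre-trained (and fine-tuned)
    # model; the small student sees only task data plus the teacher's logits.
    student_logits = torch.randn(8, 10, requires_grad=True)
    teacher_logits = torch.randn(8, 10)
    labels = torch.randint(0, 10, (8,))
    distillation_loss(student_logits, teacher_logits, labels).backward()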
arXiv Detail & Related papers (2024-04-04T07:38:11Z)
- Class-relation Knowledge Distillation for Novel Class Discovery [16.461242381109276]
The key challenge lies in transferring the knowledge in the known-class data to the learning of novel classes.
We introduce a class relation representation for the novel classes based on the predicted class distribution of a model trained on known classes.
We propose a novel knowledge distillation framework, which utilizes our class-relation representation to regularize the learning of novel classes.
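One hedged reading of this idea (the paper's actual architecture and losses may differ): a frozen classifier trained on known classes assigns each novel-class sample a distribution over the known classes, and a KL term keeps the new model's known-class predictions consistent with that relation:

    import torch
    import torch.nn.functional as F

    def class_relation_kd(new_known_logits, frozen_known_logits, temperature=1.0):
        """Regularize novel-class learning with a known-class relation target.

        Both arguments are logits over the *known* classes for the same batch of
        novel-class samples; the frozen known-class model defines the target.
        """
        target = F.softmax(frozen_known_logits / temperature, dim=-1)
        log_pred = F.log_softmax(new_known_logits / temperature, dim=-1)
        return F.kl_div(log_pred, target, reduction="batchmean")

    # Illustrative shapes: 16 novel-class samples, 50 known classes.
    frozen_logits = torch.randn(16, 50)                   # frozen known-class model
    new_logits = torch.randn(16, 50, requires_grad=True)  # model being trained
    class_relation_kd(new_logits, frozen_logits).backward()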
arXiv Detail & Related papers (2023-07-18T11:35:57Z)
- Plug-and-Play Knowledge Injection for Pre-trained Language Models [116.37916535076478]
Injecting external knowledge can improve the performance of pre-trained language models (PLMs) on various downstream NLP tasks.
However, massive retraining is required to deploy new knowledge injection methods or knowledge bases for downstream tasks.
We study how to improve the flexibility and efficiency of knowledge injection by reusing existing downstream models.
arXiv Detail & Related papers (2023-05-28T10:58:00Z)
- Large Language Models with Controllable Working Memory [64.71038763708161]
Large language models (LLMs) have led to a series of breakthroughs in natural language processing (NLP).
What further sets these models apart is the massive amounts of world knowledge they internalize during pretraining.
How the model's world knowledge interacts with the factual information presented in the context remains underexplored.
arXiv Detail & Related papers (2022-11-09T18:58:29Z)
- Kformer: Knowledge Injection in Transformer Feed-Forward Layers [107.71576133833148]
We propose a novel knowledge fusion model, namely Kformer, which incorporates external knowledge through the feed-forward layer in Transformer.
We empirically find that simply injecting knowledge into the FFN can enhance the pre-trained language model's ability and facilitate current knowledge fusion methods.
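A sketch of what feed-forward knowledge injection can look like, with retrieved knowledge embeddings acting as extra keys and values alongside the FFN weights; the dimensions, projections, and retrieval step here are illustrative assumptions rather than the exact Kformer design:

    import torch
    import torch.nn as nn
    import torch.nn.functional as F

    class KnowledgeFFN(nn.Module):
        """Feed-forward block that mixes in retrieved knowledge embeddings."""

        def __init__(self, d_model=768, d_ff=3072, d_know=768):
            super().__init__()
            self.w1 = nn.Linear(d_model, d_ff)
            self.w2 = nn.Linear(d_ff, d_model)
            # Project retrieved knowledge into key/value roles (illustrative).
            self.know_k = nn.Linear(d_know, d_model)
            self.know_v = nn.Linear(d_know, d_model)

        def forward(self, x, knowledge):
            # x: (batch, seq, d_model); knowledge: (batch, n_facts, d_know)
            ffn_out = self.w2(F.gelu(self.w1(x)))
            k = self.know_k(knowledge)
            v = self.know_v(knowledge)
            scores = F.gelu(torch.einsum("bsd,bnd->bsn", x, k))  # token-to-fact scores
            return ffn_out + torch.einsum("bsn,bnd->bsd", scores, v)

    layer = KnowledgeFFN()
    out = layer(torch.randn(2, 16, 768), torch.randn(2, 4, 768))  # -> (2, 16, 768)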
arXiv Detail & Related papers (2022-01-15T03:00:27Z)
- Towards a Universal Continuous Knowledge Base [49.95342223987143]
We propose a method for building a continuous knowledge base that can store knowledge imported from multiple neural networks.
We import the knowledge from multiple models to the knowledge base, from which the fused knowledge is exported back to a single model.
Experiments on text classification show promising results.
arXiv Detail & Related papers (2020-12-25T12:27:44Z)
- Generative Adversarial Zero-Shot Relational Learning for Knowledge Graphs [96.73259297063619]
We consider a novel formulation, zero-shot learning, to avoid the cumbersome curation of training instances for new relations.
For newly-added relations, we attempt to learn their semantic features from their text descriptions.
We leverage Generative Adversarial Networks (GANs) to establish the connection between the text and knowledge graph domains.
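In outline (an assumption about the general setup rather than the paper's exact networks), a generator maps a relation's text-description embedding plus noise to a relation embedding, and a discriminator learns to tell generated embeddings from those learned on seen relations:

    import torch
    import torch.nn as nn

    class RelationGenerator(nn.Module):
        """Maps a text-description embedding plus noise to a relation embedding."""

        def __init__(self, d_text=768, d_noise=64, d_rel=200):
            super().__init__()
            self.net = nn.Sequential(
                nn.Linear(d_text + d_noise, 512), nn.ReLU(),
                nn.Linear(512, d_rel),
            )

        def forward(self, text_emb, noise):
            return self.net(torch.cat([text_emb, noise], dim=-1))

    class RelationDiscriminator(nn.Module):
        """Scores whether a relation embedding looks like one from seen relations."""

        def __init__(self, d_rel=200):
            super().__init__()
            self.net = nn.Sequential(
                nn.Linear(d_rel, 256), nn.ReLU(),
                nn.Linear(256, 1),
            )

        def forward(self, rel_emb):
            return self.net(rel_emb)

    # For an unseen relation, the trained generator produces an embedding that can
    # then score candidate (head, tail) entity pairs in the knowledge graph.
    fake_rel = RelationGenerator()(torch.randn(1, 768), torch.randn(1, 64))  # (1, 200)
    score = RelationDiscriminator()(fake_rel)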
arXiv Detail & Related papers (2020-01-08T01:19:08Z)