CoCoLM: COmplex COmmonsense Enhanced Language Model
- URL: http://arxiv.org/abs/2012.15643v1
- Date: Thu, 31 Dec 2020 15:05:36 GMT
- Title: CoCoLM: COmplex COmmonsense Enhanced Language Model
- Authors: Changlong Yu, Hongming Zhang, Yangqiu Song and Wilfred Ng
- Abstract summary: We propose to help pre-trained language models better incorporate complex commonsense knowledge.
Unlike existing fine-tuning approaches, we do not focus on a specific task; instead, we propose a general language model named CoCoLM.
We successfully teach pre-trained language models rich complex commonsense knowledge about eventualities.
- Score: 45.396629052897524
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Large-scale pre-trained language models have demonstrated strong knowledge
representation ability. However, recent studies suggest that even though these
giant models contain rich simple commonsense knowledge (e.g., birds can fly and
fish can swim), they often struggle with complex commonsense knowledge that
involves multiple eventualities (verb-centric phrases, e.g., identifying the
relationship between "Jim yells at Bob" and "Bob is upset"). To address this
problem, in this paper we propose to help pre-trained language models better
incorporate complex commonsense knowledge. Unlike existing fine-tuning
approaches, we do not focus on a specific task; instead, we propose a general
language model named CoCoLM. Through careful training over ASER, a large-scale
eventuality knowledge graph, we teach pre-trained language models (i.e., BERT
and RoBERTa) rich complex commonsense knowledge about eventualities.
Experiments on multiple downstream commonsense tasks that require a correct
understanding of eventualities demonstrate the effectiveness of CoCoLM.
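To make the idea concrete, here is a minimal, hypothetical sketch (not the authors' released code) of how one might fine-tune a pre-trained encoder such as BERT to classify the relation between two eventualities, using sentence pairs of the kind that could be extracted from a graph like ASER. The toy `eventuality_pairs` data, the two-way label set, and the use of the Hugging Face `transformers` library are illustrative assumptions.

```python
# Hypothetical sketch: fine-tune BERT to predict the relation between two
# eventualities (verb-centric phrases), in the spirit of training over an
# eventuality knowledge graph such as ASER. Not the authors' implementation.
import torch
from transformers import AutoTokenizer, AutoModelForSequenceClassification

# Toy eventuality pairs with assumed relation labels (0 = Result, 1 = no relation).
eventuality_pairs = [
    ("Jim yells at Bob", "Bob is upset", 0),
    ("The fish swims", "Jim buys a car", 1),
]

labels = torch.tensor([y for _, _, y in eventuality_pairs])
tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
model = AutoModelForSequenceClassification.from_pretrained(
    "bert-base-uncased", num_labels=2
)

# Encode each (head eventuality, tail eventuality) pair as a sentence pair.
inputs = tokenizer(
    [h for h, _, _ in eventuality_pairs],
    [t for _, t, _ in eventuality_pairs],
    padding=True,
    truncation=True,
    return_tensors="pt",
)

optimizer = torch.optim.AdamW(model.parameters(), lr=2e-5)
model.train()
for _ in range(3):  # a few passes over the toy data
    optimizer.zero_grad()
    outputs = model(**inputs, labels=labels)
    outputs.loss.backward()
    optimizer.step()
    print(f"loss: {outputs.loss.item():.4f}")
```

In CoCoLM itself the training signals and relation inventory come from ASER; this sketch only illustrates the general sentence-pair fine-tuning pattern.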
Related papers
- Co-occurrence is not Factual Association in Language Models [19.708303468664088]
We show that language models are biased to learn word co-occurrence statistics instead of true factual associations.
We propose two strategies to improve the learning of factual associations in language models.
arXiv Detail & Related papers (2024-09-21T08:13:16Z)
- Chain-of-Knowledge: Integrating Knowledge Reasoning into Large Language Models by Learning from Knowledge Graphs [55.317267269115845]
Chain-of-Knowledge (CoK) is a comprehensive framework for knowledge reasoning.
CoK includes methodologies for both dataset construction and model learning.
We conduct extensive experiments with KnowReason.
arXiv Detail & Related papers (2024-06-30T10:49:32Z)
- Commonsense Knowledge Transfer for Pre-trained Language Models [83.01121484432801]
We introduce commonsense knowledge transfer, a framework to transfer the commonsense knowledge stored in a neural commonsense knowledge model to a general-purpose pre-trained language model.
It first exploits general texts to form queries for extracting commonsense knowledge from the neural commonsense knowledge model.
It then refines the language model with two self-supervised objectives: commonsense mask infilling and commonsense relation prediction (a minimal sketch of the mask-infilling idea is given after this list).
arXiv Detail & Related papers (2023-06-04T15:44:51Z)
- Knowledge Rumination for Pre-trained Language Models [77.55888291165462]
We propose a new paradigm dubbed Knowledge Rumination to help the pre-trained language model utilize related latent knowledge without retrieving it from the external corpus.
We apply the proposed knowledge rumination to various language models, including RoBERTa, DeBERTa, and GPT-3.
arXiv Detail & Related papers (2023-05-15T15:47:09Z)
- ALERT: Adapting Language Models to Reasoning Tasks [43.8679673685468]
ALERT is a benchmark and suite of analyses for assessing language models' reasoning ability.
ALERT provides a test bed to assess any language model on fine-grained reasoning skills.
We find that language models learn more reasoning skills during the fine-tuning stage than during pretraining.
arXiv Detail & Related papers (2022-12-16T05:15:41Z)
- Incorporating Commonsense Knowledge Graph in Pretrained Models for Social Commonsense Tasks [6.335245542129822]
External commonsense knowledge graphs (KGs) provide rich information about words and their relationships.
We propose two approaches to implicitly and explicitly infuse such KGs into pretrained language models.
We demonstrate that our proposed methods perform well on SocialIQA, a social commonsense reasoning task, in both limited and full training data settings.
arXiv Detail & Related papers (2021-05-12T06:45:26Z)
- Probing Across Time: What Does RoBERTa Know and When? [70.20775905353794]
We show that linguistic knowledge is acquired quickly, stably, and robustly across domains, whereas facts and commonsense knowledge are acquired more slowly and are more domain-sensitive.
We believe that probing-across-time analyses can help researchers understand the complex, intermingled learning that these models undergo and guide us toward more efficient approaches that accomplish necessary learning faster.
arXiv Detail & Related papers (2021-04-16T04:26:39Z)
- Leap-Of-Thought: Teaching Pre-Trained Models to Systematically Reason Over Implicit Knowledge [96.92252296244233]
Large pre-trained language models (LMs) acquire some reasoning capacity, but this ability is difficult to control.
We show that LMs can be trained to reliably perform systematic reasoning combining both implicit, pre-trained knowledge and explicit natural language statements.
Our work paves a path towards open-domain systems that constantly improve by interacting with users who can instantly correct a model by adding simple natural language statements.
arXiv Detail & Related papers (2020-06-11T17:02:20Z)
- Teaching Pretrained Models with Commonsense Reasoning: A Preliminary KB-Based Approach [24.954288132238293]
We propose a method to teach pretrained models with commonsense reasoning by leveraging the structured knowledge in ConceptNet.
Experimental results demonstrate that, when refined on these training examples, the pretrained models consistently improve their performance on tasks that require commonsense reasoning.
arXiv Detail & Related papers (2019-09-20T23:58:11Z)
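Referring back to the Commonsense Knowledge Transfer entry above, the following is a minimal, hypothetical sketch of a commonsense mask-infilling step: one token of a commonsense statement is masked and a masked language model is trained to recover it. The example statement, the choice of RoBERTa, and the token-selection logic are assumptions for illustration, not that paper's pipeline.

```python
# Hypothetical sketch of a commonsense mask-infilling step: mask one content
# token in a commonsense statement and train a masked LM to recover it.
import torch
from transformers import AutoTokenizer, AutoModelForMaskedLM

tokenizer = AutoTokenizer.from_pretrained("roberta-base")
model = AutoModelForMaskedLM.from_pretrained("roberta-base")

statement = "Birds can fly and fish can swim."
inputs = tokenizer(statement, return_tensors="pt")
input_ids = inputs["input_ids"].clone()

# Find the position of the token "fly" (assumed to be a single subword here).
mask_position = next(
    i for i, tid in enumerate(input_ids[0].tolist())
    if tokenizer.decode([tid]).strip().lower() == "fly"
)

# Labels: compute the loss only on the masked position (-100 is ignored).
labels = torch.full_like(input_ids, -100)
labels[0, mask_position] = input_ids[0, mask_position]
input_ids[0, mask_position] = tokenizer.mask_token_id

outputs = model(
    input_ids=input_ids,
    attention_mask=inputs["attention_mask"],
    labels=labels,
)
outputs.loss.backward()  # one self-supervised infilling step
print(f"infilling loss: {outputs.loss.item():.4f}")
```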
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences arising from its use.