Pre-training Text-to-Text Transformers for Concept-centric Common Sense
- URL: http://arxiv.org/abs/2011.07956v2
- Date: Wed, 25 Nov 2020 04:53:38 GMT
- Title: Pre-training Text-to-Text Transformers for Concept-centric Common Sense
- Authors: Wangchunshu Zhou, Dong-Ho Lee, Ravi Kiran Selvam, Seyeon Lee, Bill
Yuchen Lin, Xiang Ren
- Abstract summary: We propose a concept-aware language model (CALM) to augment pre-trained language models with concept-centric commonsense knowledge.
We show that CALM can pack more commonsense knowledge into the parameters of a pre-trained text-to-text transformer without relying on external knowledge graphs.
- Score: 48.11844351407072
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Pre-trained language models (PTLM) have achieved impressive results in a
range of natural language understanding (NLU) and generation (NLG) tasks.
However, current pre-training objectives such as masked token prediction (for
BERT-style PTLMs) and masked span infilling (for T5-style PTLMs) do not
explicitly model the relational commonsense knowledge about everyday concepts,
which is crucial to many downstream tasks that need common sense to understand
or generate. To augment PTLMs with concept-centric commonsense knowledge, in
this paper, we propose both generative and contrastive objectives for learning
common sense from the text, and use them as intermediate self-supervised
learning tasks for incrementally pre-training PTLMs (before task-specific
fine-tuning on downstream datasets). Furthermore, we develop a joint
pre-training framework to unify generative and contrastive objectives so that
they can mutually reinforce each other. Extensive experimental results show
that our method, concept-aware language model (CALM), can pack more commonsense
knowledge into the parameters of a pre-trained text-to-text transformer without
relying on external knowledge graphs, yielding better performance on both NLU
and NLG tasks. We show that while only incrementally pre-trained on a
relatively small corpus for a few steps, CALM outperforms baseline methods by a
consistent margin and is even comparable to some larger PTLMs, which suggests
that CALM can serve as a general, plug-and-play method for improving the
commonsense reasoning ability of a PTLM.
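The abstract describes the generative and contrastive objectives only at a high level; the snippet below is a minimal sketch of how such a joint objective could be used for incremental pre-training of a T5-style model, assuming PyTorch and Hugging Face Transformers. The concept-style prompt, the word-swap corruption used to create negatives, and the loss weight alpha are illustrative assumptions, not CALM's published recipe.

```python
import random

import torch
import torch.nn.functional as F
from transformers import AutoTokenizer, T5ForConditionalGeneration

model = T5ForConditionalGeneration.from_pretrained("t5-small")
tokenizer = AutoTokenizer.from_pretrained("t5-small")


def corrupt(sentence: str) -> str:
    """Build a 'distorted' negative by swapping two words (illustrative heuristic)."""
    words = sentence.split()
    if len(words) < 4:
        return sentence
    i, j = random.sample(range(len(words)), 2)
    words[i], words[j] = words[j], words[i]
    return " ".join(words)


def joint_loss(sentence: str, alpha: float = 0.5) -> torch.Tensor:
    # Generative objective: reconstruct the sentence from a concept-style prompt
    # (here simply the sentence's longer words standing in for "concepts").
    concepts = " ".join(w for w in sentence.split() if len(w) > 3)
    enc = tokenizer("generate a sentence with: " + concepts, return_tensors="pt")
    pos_labels = tokenizer(sentence, return_tensors="pt").input_ids
    gen_loss = model(**enc, labels=pos_labels).loss  # NLL of the true sentence

    # Contrastive objective: the true sentence should be more likely than a
    # distorted one under the same prompt (a simple margin on the two NLLs).
    neg_labels = tokenizer(corrupt(sentence), return_tensors="pt").input_ids
    neg_nll = model(**enc, labels=neg_labels).loss
    contrastive_loss = F.relu(1.0 + gen_loss - neg_nll)

    return alpha * gen_loss + (1.0 - alpha) * contrastive_loss


# One illustrative incremental pre-training step before downstream fine-tuning.
optimizer = torch.optim.AdamW(model.parameters(), lr=1e-4)
loss = joint_loss("the dog buried a bone in the garden")
loss.backward()
optimizer.step()
```

In practice the two losses would be computed over batches drawn from the incremental pre-training corpus, with task-specific fine-tuning on the downstream dataset following as usual.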
Related papers
- Boosting LLM Translation Skills without General Ability Loss via Rationale Distillation [31.733890798723085]
Large Language Models (LLMs) have achieved impressive results across numerous NLP tasks but still encounter difficulties in machine translation.
We propose a novel approach called RaDis (Rationale Distillation) to overcome this issue.
RaDis harnesses the strong generative capabilities of LLMs to create rationales for training data, which are then "replayed" to prevent forgetting.
arXiv Detail & Related papers (2024-10-17T18:09:43Z)
- The Strong Pull of Prior Knowledge in Large Language Models and Its Impact on Emotion Recognition [74.04775677110179]
In-context Learning (ICL) has emerged as a powerful paradigm for performing natural language tasks with Large Language Models (LLMs).
We show that LLMs have strong yet inconsistent priors in emotion recognition that ossify their predictions.
Our results suggest that caution is needed when using ICL with larger LLMs for affect-centered tasks outside their pre-training domain.
arXiv Detail & Related papers (2024-03-25T19:07:32Z)
- Improving Language Models Meaning Understanding and Consistency by Learning Conceptual Roles from Dictionary [65.268245109828]
The non-human-like behaviour of contemporary pre-trained language models (PLMs) is a leading factor undermining their trustworthiness.
A striking phenomenon is the generation of inconsistent predictions, which yield contradictory results.
We propose a practical approach that alleviates this inconsistent behaviour by improving the PLMs' awareness of conceptual roles.
arXiv Detail & Related papers (2023-10-24T06:15:15Z)
- Unified Multimodal Pre-training and Prompt-based Tuning for Vision-Language Understanding and Generation [86.26522210882699]
We propose a unified multimodal pre-training framework for both vision-language understanding and generation.
The proposed UniVL is capable of handling both understanding tasks and generative tasks.
Our experiments show that there is a trade-off between understanding tasks and generation tasks while using the same model.
arXiv Detail & Related papers (2021-12-10T14:59:06Z)
- A Primer on Contrastive Pretraining in Language Processing: Methods, Lessons Learned and Perspectives [22.933794444266596]
We describe recent self-supervised and supervised contrastive NLP pretraining methods.
We introduce key contrastive learning concepts with lessons learned from prior research and structure works by applications.
We point to open challenges and future directions for contrastive NLP to encourage bringing contrastive NLP pretraining closer to recent successes in image representation pretraining.
arXiv Detail & Related papers (2021-02-25T16:35:07Z)
- Task-specific Objectives of Pre-trained Language Models for Dialogue Adaptation [79.0866650271659]
The common recipe for utilizing PrLMs is to first pre-train on large-scale general corpora with task-independent LM objectives and then fine-tune on task datasets with task-specific objectives.
We introduce task-specific pre-training on in-domain, task-related corpora with task-specific objectives.
This procedure is inserted between the original two stages to enhance the model's understanding of specific tasks; a minimal sketch of the resulting three-stage pipeline appears below, after this list.
arXiv Detail & Related papers (2020-09-10T16:46:46Z)
- Pre-training Text Representations as Meta Learning [113.3361289756749]
We introduce a learning algorithm that directly optimizes the model's ability to learn text representations for effective learning of downstream tasks.
We show that there is an intrinsic connection between multi-task pre-training and model-agnostic meta-learning with a sequence of meta-train steps.
arXiv Detail & Related papers (2020-04-12T09:05:47Z)
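For the intermediate task-specific pre-training recipe referenced above, the following is a minimal sketch of the three-stage pipeline, assuming PyTorch, Hugging Face Transformers, a BERT-style checkpoint, and placeholder data; the masked-LM objective here merely stands in for the cited paper's dialogue-specific objectives.

```python
import torch
from transformers import (AutoModelForMaskedLM, AutoModelForSequenceClassification,
                          AutoTokenizer, DataCollatorForLanguageModeling)

checkpoint = "bert-base-uncased"  # stage 1: general pre-training, reused as released
tokenizer = AutoTokenizer.from_pretrained(checkpoint)

# Stage 2: intermediate pre-training with a masked-LM objective on in-domain text.
in_domain_texts = ["example in-domain dialogue turn ..."]  # placeholder corpus
mlm_model = AutoModelForMaskedLM.from_pretrained(checkpoint)
collator = DataCollatorForLanguageModeling(tokenizer, mlm_probability=0.15)
optimizer = torch.optim.AdamW(mlm_model.parameters(), lr=5e-5)
batch = collator([tokenizer(text) for text in in_domain_texts])
loss = mlm_model(**batch).loss
loss.backward()
optimizer.step()
mlm_model.save_pretrained("adapted-checkpoint")

# Stage 3: ordinary task-specific fine-tuning starts from the adapted weights.
classifier = AutoModelForSequenceClassification.from_pretrained("adapted-checkpoint",
                                                                num_labels=2)
enc = tokenizer("example labelled task input", return_tensors="pt")
loss = classifier(**enc, labels=torch.tensor([1])).loss
loss.backward()
```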
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the information presented and is not responsible for any consequences of its use.