Zero-shot Commonsense Question Answering with Cloze Translation and
Consistency Optimization
- URL: http://arxiv.org/abs/2201.00136v1
- Date: Sat, 1 Jan 2022 07:12:49 GMT
- Title: Zero-shot Commonsense Question Answering with Cloze Translation and
Consistency Optimization
- Authors: Zi-Yi Dou, Nanyun Peng
- Abstract summary: We investigate four translation methods that can translate natural questions into cloze-style sentences.
We show that our methods are complementary to a knowledge-base-improved model, and combining them can lead to state-of-the-art zero-shot performance.
- Score: 20.14487209460865
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Commonsense question answering (CQA) aims to test if models can answer
questions regarding commonsense knowledge that everyone knows. Prior works that
incorporate external knowledge bases have shown promising results, but
knowledge bases are expensive to construct and are often limited to a fixed set
of relations. In this paper, we instead focus on better utilizing the
implicit knowledge stored in pre-trained language models. While
researchers have found that the knowledge embedded in pre-trained language
models can be extracted by having them fill in the blanks of carefully designed
prompts for relation extraction and text classification, it remains unclear if
we can adopt this paradigm in CQA where the inputs and outputs take much more
flexible forms. To this end, we investigate four translation methods that can
translate natural questions into cloze-style sentences to better solicit
commonsense knowledge from language models, including a syntactic-based model,
an unsupervised neural model, and two supervised neural models. In addition, to
combine the different translation methods, we propose to encourage consistency
among model predictions on different translated questions with unlabeled data.
We demonstrate the effectiveness of our methods on three CQA datasets in
zero-shot settings. We show that our methods are complementary to a
knowledge-base-improved model, and combining them can lead to state-of-the-art zero-shot
performance. Analyses also reveal distinct characteristics of the different
cloze translation methods and provide insights on why combining them can lead
to great improvements.
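A minimal, hypothetical sketch of the two ingredients described above, assuming a generic pre-trained masked LM ("roberta-large" here), a hand-written cloze template, and a mean-based consistency term; the paper's actual translation models, prompts, and objective are not reproduced:
```python
# Hypothetical sketch, not the authors' released code: (1) answer a commonsense
# question by filling a cloze translation with each candidate and scoring it with a
# pre-trained masked LM; (2) a consistency term that pushes the prediction
# distributions obtained from different cloze translations to agree.
import torch
import torch.nn.functional as F
from transformers import AutoModelForMaskedLM, AutoTokenizer

MODEL = "roberta-large"  # assumption: any pre-trained masked LM could be used
tok = AutoTokenizer.from_pretrained(MODEL)
mlm = AutoModelForMaskedLM.from_pretrained(MODEL).eval()

@torch.no_grad()
def pseudo_log_likelihood(sentence: str) -> float:
    """Score a filled-in cloze sentence by masking one token at a time and summing
    the masked LM's log-probability of the original token."""
    ids = tok(sentence, return_tensors="pt")["input_ids"][0]
    total = 0.0
    for i in range(1, len(ids) - 1):                 # skip the special tokens
        masked = ids.clone()
        masked[i] = tok.mask_token_id
        logits = mlm(masked.unsqueeze(0)).logits[0, i]
        total += torch.log_softmax(logits, dim=-1)[ids[i]].item()
    return total

def candidate_scores(cloze_template: str, candidates: list[str]) -> torch.Tensor:
    """One score per answer candidate for a single cloze translation."""
    return torch.tensor([pseudo_log_likelihood(cloze_template.format(c)) for c in candidates])

def consistency_loss(scores_per_translation: list[torch.Tensor]) -> torch.Tensor:
    """KL of each translation's candidate distribution from their mean; minimizing this
    on unlabeled questions encourages the different cloze translations to agree."""
    probs = [torch.softmax(s, dim=-1) for s in scores_per_translation]
    mean = torch.stack(probs).mean(dim=0)
    return sum(F.kl_div(mean.log(), p, reduction="sum") for p in probs) / len(probs)

# e.g. the natural question "What do people use to cut paper?" could be translated
# (by syntactic rules or a learned translator) into the cloze templates below.
candidates = ["scissors", "a hammer", "a spoon"]
scores = [candidate_scores(t, candidates) for t in
          ["People use {} to cut paper.", "To cut paper, people use {}."]]
print(candidates[int(torch.stack(scores).mean(dim=0).argmax())],
      "consistency:", float(consistency_loss(scores)))
```
Scoring by pseudo log-likelihood handles multi-token candidates, and the consistency term can be minimized on unlabeled questions so that the different cloze translations of the same question agree.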
Related papers
- In-Context Language Learning: Architectures and Algorithms [73.93205821154605]
We study in-context learning (ICL) through the lens of a new family of model problems we term in-context language learning (ICLL).
We evaluate a diverse set of neural sequence models on regular ICLL tasks.
arXiv Detail & Related papers (2024-01-23T18:59:21Z)
- Commonsense Knowledge Transfer for Pre-trained Language Models [83.01121484432801]
We introduce commonsense knowledge transfer, a framework to transfer the commonsense knowledge stored in a neural commonsense knowledge model to a general-purpose pre-trained language model.
It first exploits general texts to form queries for extracting commonsense knowledge from the neural commonsense knowledge model.
It then refines the language model with two self-supervised objectives: commonsense mask infilling and commonsense relation prediction.
arXiv Detail & Related papers (2023-06-04T15:44:51Z)
- Is neural language acquisition similar to natural? A chronological probing study [0.0515648410037406]
We present a chronological probing study of transformer English models such as MultiBERT and T5.
We compare the information about language that the models learn at different stages of training on their corpora.
The results show that 1) linguistic information is acquired in the early stages of training, and 2) both language models demonstrate the capability to capture various features from different levels of language.
arXiv Detail & Related papers (2022-07-01T17:24:11Z)
- An Empirical Investigation of Commonsense Self-Supervision with Knowledge Graphs [67.23285413610243]
Self-supervision based on the information extracted from large knowledge graphs has been shown to improve the generalization of language models.
We study the effect of knowledge sampling strategies and sizes that can be used to generate synthetic data for adapting language models.
arXiv Detail & Related papers (2022-05-21T19:49:04Z)
- Syntax-informed Question Answering with Heterogeneous Graph Transformer [2.139714421848487]
We present a linguistics-informed question answering approach that extends and fine-tunes a pre-trained neural language model.
We illustrate the approach by adding syntactic information in the form of dependency and constituency graph structures connecting tokens and virtual tokens.
arXiv Detail & Related papers (2022-04-01T07:48:03Z)
- elBERto: Self-supervised Commonsense Learning for Question Answering [131.51059870970616]
We propose a Self-supervised Bidirectional Representation Learning of Commonsense framework, which is compatible with off-the-shelf QA model architectures.
The framework comprises five self-supervised tasks to force the model to fully exploit the additional training signals from contexts containing rich commonsense.
elBERto achieves substantial improvements on out-of-paragraph and no-effect questions where simple lexical similarity comparison does not help.
arXiv Detail & Related papers (2022-03-17T16:23:45Z)
- Knowledge-driven Data Construction for Zero-shot Evaluation in Commonsense Question Answering [80.60605604261416]
We propose a novel neuro-symbolic framework for zero-shot question answering across commonsense tasks.
We vary the set of language models, training regimes, knowledge sources, and data generation strategies, and measure their impact across tasks.
We show that, while an individual knowledge graph is better suited for specific tasks, a global knowledge graph brings consistent gains across different tasks.
arXiv Detail & Related papers (2020-11-07T22:52:21Z)
- Unsupervised Commonsense Question Answering with Self-Talk [71.63983121558843]
We propose an unsupervised framework based on self-talk as a novel alternative approach to commonsense tasks.
Inspired by inquiry-based discovery learning, our approach queries language models with a number of information-seeking questions.
Empirical results demonstrate that the self-talk procedure substantially improves the performance of zero-shot language model baselines (an illustrative sketch of the querying loop appears after this list).
arXiv Detail & Related papers (2020-04-11T20:43:37Z)
- How Much Knowledge Can You Pack Into the Parameters of a Language Model? [44.81324633069311]
It has been observed that neural language models trained on unstructured text can implicitly store and retrieve knowledge using natural language queries.
We measure the practical utility of this approach by fine-tuning pre-trained models to answer questions without access to any external context or knowledge.
arXiv Detail & Related papers (2020-02-10T18:55:58Z)
- REALM: Retrieval-Augmented Language Model Pre-Training [37.3178586179607]
We augment language model pre-training with a latent knowledge retriever, which allows the model to retrieve and attend over documents from a large corpus such as Wikipedia.
For the first time, we show how to pre-train such a knowledge retriever in an unsupervised manner.
We demonstrate the effectiveness of Retrieval-Augmented Language Model pre-training (REALM) by fine-tuning on the challenging task of Open-domain Question Answering (Open-QA).
arXiv Detail & Related papers (2020-02-10T18:40:59Z)
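As flagged in the Self-Talk entry above, here is an illustrative, hypothetical sketch of that querying loop, assuming a small causal LM ("gpt2") and two made-up question prefixes; it is not the paper's released implementation:
```python
# Hypothetical self-talk-style loop, not the paper's released code: the LM completes
# information-seeking questions about the context, answers them, and the answers are
# appended as extra context before scoring the answer candidates with the same LM.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

MODEL = "gpt2"                      # assumption: any causal LM can play both roles
tok = AutoTokenizer.from_pretrained(MODEL)
lm = AutoModelForCausalLM.from_pretrained(MODEL).eval()
PREFIXES = ["What is the definition of", "What is the purpose of"]   # assumed prompts

@torch.no_grad()
def generate(prompt: str, max_new: int, stop: str) -> str:
    """Sample a short continuation of `prompt` and truncate it at `stop`."""
    ids = tok(prompt, return_tensors="pt").input_ids
    out = lm.generate(ids, max_new_tokens=max_new, do_sample=True, top_p=0.9,
                      pad_token_id=tok.eos_token_id)
    return tok.decode(out[0][ids.shape[1]:], skip_special_tokens=True).split(stop)[0] + stop

@torch.no_grad()
def log_prob(text: str) -> float:
    """Log-probability of `text` under the LM, used to score candidate answers."""
    ids = tok(text, return_tensors="pt").input_ids
    logp = torch.log_softmax(lm(ids).logits[0, :-1], dim=-1)
    return logp.gather(1, ids[0, 1:].unsqueeze(1)).sum().item()

def self_talk_answer(context: str, question: str, candidates: list[str]) -> str:
    """Pick the candidate most likely given the context plus self-generated clarifications."""
    clarifications = []
    for prefix in PREFIXES:
        q = prefix + generate(f"{context} {prefix}", 8, "?")     # finish the question
        a = generate(f"{context} {q}", 15, ".")                  # answer the question
        clarifications.append(f"{q} {a}")
    background = f"{context} {' '.join(clarifications)}"
    scores = [log_prob(f"{background} {question} {c}") for c in candidates]
    return candidates[int(torch.tensor(scores).argmax())]

# Illustrative usage on a made-up example:
print(self_talk_answer("Karen needed to cut a piece of paper in half.",
                       "What should she use?", ["scissors", "a spoon", "a pillow"]))
```
The sketch keeps the questioner and the answerer as the same model for brevity; the paper's actual prompts and models differ.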