elBERto: Self-supervised Commonsense Learning for Question Answering
- URL: http://arxiv.org/abs/2203.09424v1
- Date: Thu, 17 Mar 2022 16:23:45 GMT
- Title: elBERto: Self-supervised Commonsense Learning for Question Answering
- Authors: Xunlin Zhan, Yuan Li, Xiao Dong, Xiaodan Liang, Zhiting Hu, and
Lawrence Carin
- Abstract summary: We propose a Self-supervised Bidirectional Representation Learning of Commonsense framework, which is compatible with off-the-shelf QA model architectures.
The framework comprises five self-supervised tasks to force the model to fully exploit the additional training signals from contexts containing rich commonsense.
elBERto achieves substantial improvements on out-of-paragraph and no-effect questions where simple lexical similarity comparison does not help.
- Score: 131.51059870970616
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Commonsense question answering requires reasoning about everyday situations
and causes and effects implicit in context. Typically, existing approaches
first retrieve external evidence and then perform commonsense reasoning over
this evidence. In this paper, we propose a Self-supervised Bidirectional
Encoder Representation Learning of Commonsense (elBERto) framework, which is
compatible with off-the-shelf QA model architectures. The framework comprises
five self-supervised tasks to force the model to fully exploit the additional
training signals from contexts containing rich commonsense. The tasks include a
novel Contrastive Relation Learning task to encourage the model to distinguish
between logically contrastive contexts, a new Jigsaw Puzzle task that requires
the model to infer logical chains in long contexts, and three classic SSL tasks
to maintain the pre-trained model's language encoding ability. On the representative
WIQA, CosmosQA, and ReClor datasets, elBERto outperforms all other methods,
including those utilizing explicit graph reasoning and external knowledge
retrieval. Moreover, elBERto achieves substantial improvements on
out-of-paragraph and no-effect questions where simple lexical similarity
comparison does not help, indicating that it successfully learns commonsense
and is able to leverage it when given dynamic context.
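The abstract does not give the Contrastive Relation Learning task in code. As a rough illustration only, a task of this kind is often implemented as an InfoNCE-style objective that pulls a context representation toward a logically consistent context and pushes it away from logically contrastive ones; the sketch below is a minimal, hypothetical version over plain embedding vectors (the function names, temperature value, and toy vectors are assumptions, not the paper's actual implementation):

```python
import math

def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

def contrastive_relation_loss(anchor, positive, negatives, temperature=0.1):
    """InfoNCE-style loss: the anchor context should score higher
    against the logically consistent context (positive) than against
    logically contrastive contexts (negatives)."""
    sims = [cosine(anchor, positive)] + [cosine(anchor, n) for n in negatives]
    logits = [s / temperature for s in sims]
    # Numerically stable softmax cross-entropy with the positive at index 0.
    m = max(logits)
    exps = [math.exp(l - m) for l in logits]
    return -math.log(exps[0] / sum(exps))
```

Under this sketch, the loss is small when the anchor embedding is close to the consistent context and far from the contrastive ones, which is the behavior the abstract attributes to the task.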
Related papers
- Elevating Legal LLM Responses: Harnessing Trainable Logical Structures and Semantic Knowledge with Legal Reasoning [19.477062052536887]
We propose the Logical-Semantic Integration Model (LSIM), a supervised framework that bridges semantic and logical coherence.
LSIM comprises three components: reinforcement learning predicts a structured fact-rule chain for each question, a trainable Deep Structured Semantic Model (DSSM) retrieves the most relevant candidate questions and in-answer learning generates the final answer.
Our experiments on a real-world legal dataset QA-validated through both automated metrics and human evaluation-demonstrate that LSIM significantly enhances accuracy and reliability compared to existing methods.
arXiv Detail & Related papers (2025-02-11T19:33:07Z) - LatentQA: Teaching LLMs to Decode Activations Into Natural Language [72.87064562349742]
We introduce LatentQA, the task of answering open-ended questions about model activations in natural language.
We propose Latent Interpretation Tuning (LIT), which finetunes a decoder LLM on a dataset of activations and associated question-answer pairs.
Our decoder also specifies a differentiable loss that we use to control models, such as debiasing models on stereotyped sentences and controlling the sentiment of generations.
arXiv Detail & Related papers (2024-12-11T18:59:33Z) - Multi-hop Commonsense Knowledge Injection Framework for Zero-Shot
Commonsense Question Answering [6.086719709100659]
We propose a novel multi-hop commonsense knowledge injection framework.
Our framework achieves state-of-the-art performance on five commonsense question answering benchmarks.
arXiv Detail & Related papers (2023-05-10T07:13:47Z) - Large Language Models with Controllable Working Memory [64.71038763708161]
Large language models (LLMs) have led to a series of breakthroughs in natural language processing (NLP).
What further sets these models apart is the massive amounts of world knowledge they internalize during pretraining.
How the model's world knowledge interacts with the factual information presented in the context remains underexplored.
arXiv Detail & Related papers (2022-11-09T18:58:29Z) - Utterance Rewriting with Contrastive Learning in Multi-turn Dialogue [22.103162555263143]
We introduce contrastive learning and multi-task learning to jointly model the problem.
Our proposed model achieves state-of-the-art performance on several public datasets.
arXiv Detail & Related papers (2022-03-22T10:13:27Z) - GreaseLM: Graph REASoning Enhanced Language Models for Question
Answering [159.9645181522436]
GreaseLM is a new model that fuses encoded representations from pretrained LMs and graph neural networks over multiple layers of modality interaction operations.
We show that GreaseLM can more reliably answer questions that require reasoning over both situational constraints and structured knowledge, even outperforming models 8x larger.
arXiv Detail & Related papers (2022-01-21T19:00:05Z) - Zero-shot Commonsense Question Answering with Cloze Translation and
Consistency Optimization [20.14487209460865]
We investigate four translation methods that can translate natural questions into cloze-style sentences.
We show that our methods are complementary to a knowledge-base-improved model, and that combining them can lead to state-of-the-art zero-shot performance.
arXiv Detail & Related papers (2022-01-01T07:12:49Z) - Question Answering over Knowledge Bases by Leveraging Semantic Parsing
and Neuro-Symbolic Reasoning [73.00049753292316]
We propose a semantic parsing and reasoning-based Neuro-Symbolic Question Answering(NSQA) system.
NSQA achieves state-of-the-art performance on QALD-9 and LC-QuAD 1.0.
arXiv Detail & Related papers (2020-12-03T05:17:55Z) - Knowledge-driven Data Construction for Zero-shot Evaluation in
Commonsense Question Answering [80.60605604261416]
We propose a novel neuro-symbolic framework for zero-shot question answering across commonsense tasks.
We vary the set of language models, training regimes, knowledge sources, and data generation strategies, and measure their impact across tasks.
We show that, while an individual knowledge graph is better suited for specific tasks, a global knowledge graph brings consistent gains across different tasks.
arXiv Detail & Related papers (2020-11-07T22:52:21Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences of its use.