Logic-Guided Data Augmentation and Regularization for Consistent Question Answering
- URL: http://arxiv.org/abs/2004.10157v2
- Date: Mon, 25 May 2020 17:53:40 GMT
- Title: Logic-Guided Data Augmentation and Regularization for Consistent Question Answering
- Authors: Akari Asai, Hannaneh Hajishirzi
- Abstract summary: This paper addresses the problem of improving the accuracy and consistency of responses to comparison questions.
Our method leverages logical and linguistic knowledge to augment labeled training data and then uses a consistency-based regularizer to train the model.
- Score: 55.05667583529711
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Many natural language questions require qualitative, quantitative or logical
comparisons between two entities or events. This paper addresses the problem of
improving the accuracy and consistency of responses to comparison questions by
integrating logic rules and neural models. Our method leverages logical and
linguistic knowledge to augment labeled training data and then uses a
consistency-based regularizer to train the model. Improving the global
consistency of predictions, our approach achieves large improvements over
previous methods in a variety of question answering (QA) tasks including
multiple-choice qualitative reasoning, cause-effect reasoning, and extractive
machine reading comprehension. In particular, our method significantly improves
the performance of RoBERTa-based models by 1-5% across datasets. We advance the
state of the art by around 5-8% on WIQA and QuaRel and reduce consistency
violations by 58% on HotpotQA. We further demonstrate that our approach can
learn effectively from limited data.
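To make the consistency objective concrete, below is a minimal PyTorch sketch of a symmetric consistency regularizer for two-way comparison questions: an entity-swapped question should receive the flipped answer distribution. The symmetrized-KL penalty and the lambda_cons weighting are illustrative assumptions, not the paper's exact formulation.

```python
import torch
import torch.nn.functional as F

def symmetric_consistency_loss(logits_orig: torch.Tensor,
                               logits_mirror: torch.Tensor) -> torch.Tensor:
    """Consistency term for a 2-way comparison question and its mirror.

    If q compares (A, B) and q' swaps them to (B, A), a globally consistent
    model's answer distribution should flip between q and q'.
    """
    p = F.softmax(logits_orig, dim=-1)
    # Flip the mirrored distribution so that, ideally, it matches the original.
    p_mirror = F.softmax(logits_mirror, dim=-1).flip(dims=[1])
    # Symmetrized KL divergence as one plausible choice of penalty.
    return 0.5 * (F.kl_div(p_mirror.log(), p, reduction="batchmean")
                  + F.kl_div(p.log(), p_mirror, reduction="batchmean"))

# Logic-guided augmentation would supply the entity-swapped questions; the
# term is added to the usual cross-entropy loss with a weight lambda_cons.
logits_q = torch.randn(4, 2)
logits_q_swapped = torch.randn(4, 2)
print(symmetric_consistency_loss(logits_q, logits_q_swapped))
```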
Related papers
- UniOQA: A Unified Framework for Knowledge Graph Question Answering with Large Language Models [4.627548680442906]
OwnThink is the most extensive Chinese open-domain knowledge graph introduced in recent years.
We introduce UniOQA, a unified framework that integrates two parallel approaches to question answering.
UniOQA notably advances SpCQL Logical Accuracy to 21.2% and Execution Accuracy to 54.9%, achieving new state-of-the-art results on this benchmark.
arXiv Detail & Related papers (2024-06-04T08:36:39Z)
- Masked Thought: Simply Masking Partial Reasoning Steps Can Improve Mathematical Reasoning Learning of Language Models [102.72940700598055]
In reasoning tasks, even a minor error can cascade into inaccurate results.
We develop a method that avoids introducing external resources, relying instead on perturbations to the input.
Our training approach randomly masks certain tokens within the chain of thought, a technique we found to be particularly effective for reasoning tasks.
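As a rough illustration of that masking step, the sketch below randomly replaces tokens inside a chain-of-thought string with a mask symbol; the mask token, rate, and whitespace tokenization are illustrative assumptions rather than the paper's exact recipe.

```python
import random

MASK_TOKEN = "<mask>"  # hypothetical mask symbol

def mask_reasoning_steps(cot_tokens, mask_prob=0.15, seed=None):
    """Randomly mask a fraction of chain-of-thought tokens so the model
    must learn to bridge missing intermediate reasoning during training."""
    rng = random.Random(seed)
    return [MASK_TOKEN if rng.random() < mask_prob else tok
            for tok in cot_tokens]

cot = "3 + 4 = 7 , then 7 * 2 = 14".split()
print(" ".join(mask_reasoning_steps(cot, mask_prob=0.3, seed=0)))
```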
arXiv Detail & Related papers (2024-03-04T16:21:54Z)
- Are LLMs Capable of Data-based Statistical and Causal Reasoning? Benchmarking Advanced Quantitative Reasoning with Data [89.2410799619405]
We introduce the Quantitative Reasoning with Data benchmark to evaluate Large Language Models' capability in statistical and causal reasoning with real-world data.
The benchmark comprises a dataset of 411 questions accompanied by data sheets from textbooks, online learning materials, and academic papers.
To compare models' quantitative reasoning abilities on data and text, we enrich the benchmark with an auxiliary set of 290 text-only questions, namely QRText.
arXiv Detail & Related papers (2024-02-27T16:15:03Z)
- Enhancing Textbook Question Answering Task with Large Language Models and Retrieval Augmented Generation [3.948068081583197]
This paper proposes a methodology that handles the out-of-domain scenario in Textbook Question Answering (TQA).
Through supervised fine-tuning of the Llama-2 LLM and the incorporation of RAG, our architecture outperforms the baseline, achieving a 4.12% accuracy improvement on the validation set and 9.84% on the test set for non-diagram multiple-choice questions.
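For context, here is a minimal retrieval-augmented prompting sketch; the toy word-overlap retriever and function names are illustrative assumptions, not the paper's actual pipeline.

```python
from typing import List

def retrieve(question: str, corpus: List[str], k: int = 3) -> List[str]:
    """Toy lexical retriever: rank textbook passages by word overlap."""
    q_words = set(question.lower().split())
    ranked = sorted(corpus,
                    key=lambda p: len(q_words & set(p.lower().split())),
                    reverse=True)
    return ranked[:k]

def build_prompt(question: str, passages: List[str]) -> str:
    """Prepend the retrieved passages to the question before generation."""
    context = "\n".join(passages)
    return f"Context:\n{context}\n\nQuestion: {question}\nAnswer:"

corpus = ["Photosynthesis converts light energy into chemical energy.",
          "Mitochondria are the powerhouse of the cell."]
question = "What does photosynthesis convert?"
prompt = build_prompt(question, retrieve(question, corpus, k=1))
print(prompt)  # this prompt would be fed to the fine-tuned Llama-2 model
```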
arXiv Detail & Related papers (2024-02-05T11:58:56Z)
- Information Association for Language Model Updating by Mitigating LM-Logical Discrepancy [68.31760483418901]
Large Language Models (LLMs) struggle to provide current information because of outdated pre-training data.
Existing methods for updating LLMs, such as knowledge editing and continual fine-tuning, have significant drawbacks in generalizing new information.
We identify the core challenge behind these drawbacks: the LM-logical discrepancy featuring the difference between language modeling probabilities and logical probabilities.
arXiv Detail & Related papers (2023-05-29T19:48:37Z)
- A quantitative study of NLP approaches to question difficulty estimation [0.30458514384586394]
This work quantitatively analyzes several approaches proposed in previous research and compares their performance on datasets from different educational domains.
We find that Transformer-based models perform best across the different educational domains, with DistilBERT performing almost as well as BERT.
As for the other models, hybrid ones often outperform those based on a single type of feature; models based on linguistic features perform well on reading comprehension questions, while frequency-based features (TF-IDF) and word embeddings (word2vec) perform better in domain knowledge assessment.
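As a small illustration of the frequency-based baseline, the sketch below fits TF-IDF features to difficulty scores with a linear regressor; the example questions and labels are placeholders, not data from the study.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import Ridge
from sklearn.pipeline import make_pipeline

questions = [
    "What is the capital of France?",
    "Derive the worst-case time complexity of quicksort.",
    "Name the largest planet in the solar system.",
    "Prove that the halting problem is undecidable.",
]
difficulty = [0.2, 0.8, 0.3, 0.9]  # hypothetical calibrated scores in [0, 1]

# TF-IDF features feeding a ridge regressor that predicts difficulty.
model = make_pipeline(TfidfVectorizer(), Ridge())
model.fit(questions, difficulty)
print(model.predict(["What is the capital of Japan?"]))
```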
arXiv Detail & Related papers (2023-05-17T14:26:00Z)
- T-SciQ: Teaching Multimodal Chain-of-Thought Reasoning via Mixed Large Language Model Signals for Science Question Answering [59.63860993280275]
Large Language Models (LLMs) have demonstrated exceptional performance in various Natural Language Processing (NLP) tasks.
We propose a novel method termed T-SciQ that aims at teaching science question answering with LLM signals.
Our approach achieves a new state-of-the-art performance on the ScienceQA benchmark, with an accuracy of 96.18%.
arXiv Detail & Related papers (2023-05-05T11:56:30Z)
- Zero-shot Commonsense Question Answering with Cloze Translation and Consistency Optimization [20.14487209460865]
We investigate four methods for translating natural questions into cloze-style sentences.
We show that our methods are complementary to a knowledge-base-improved model, and combining them can lead to state-of-the-art zero-shot performance.
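As a toy illustration of one such translation, the heuristic below rewrites a wh-question into a masked cloze statement; this single regex rule is an assumption for illustration, not one of the paper's four methods verbatim.

```python
import re

WH_PATTERN = re.compile(r"^(what|who|where|when|which)\b", re.IGNORECASE)

def question_to_cloze(question: str, mask: str = "[MASK]") -> str:
    """Replace a leading wh-word with a mask token and drop the '?'."""
    stem = question.rstrip("?").strip()
    return WH_PATTERN.sub(mask, stem, count=1) + "."

print(question_to_cloze("What is used to write on a blackboard?"))
# -> "[MASK] is used to write on a blackboard."
```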
arXiv Detail & Related papers (2022-01-01T07:12:49Z)
- Learning to Perturb Word Embeddings for Out-of-distribution QA [55.103586220757464]
We propose a simple yet effective data augmentation (DA) method based on a noise generator, which learns to perturb the word embeddings of the input questions and context without changing their semantics.
We validate the performance of QA models trained with our word embedding perturbation on a single source dataset, evaluating on five different target domains.
Notably, the model trained with ours outperforms the model trained with more than 240K artificially generated QA pairs.
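To sketch the idea, the module below adds learned, input-dependent Gaussian noise to token embeddings; the architecture and its (omitted) training objective are assumptions, not the paper's exact noise generator.

```python
import torch
import torch.nn as nn

class EmbeddingPerturber(nn.Module):
    """Sample input-dependent Gaussian noise and add it to word embeddings,
    producing a semantics-preserving augmented view of the input."""

    def __init__(self, dim: int):
        super().__init__()
        self.mu = nn.Linear(dim, dim)
        self.log_sigma = nn.Linear(dim, dim)

    def forward(self, emb: torch.Tensor) -> torch.Tensor:
        # emb: (batch, seq_len, dim) embeddings of question + context tokens
        eps = torch.randn_like(emb)
        noise = self.mu(emb) + eps * self.log_sigma(emb).exp()
        return emb + noise  # same shape, fed to the QA model as a new view

perturber = EmbeddingPerturber(dim=768)
augmented = perturber(torch.randn(2, 16, 768))
print(augmented.shape)  # torch.Size([2, 16, 768])
```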
arXiv Detail & Related papers (2021-05-06T14:12:26Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
The site does not guarantee the quality of the information presented and is not responsible for any consequences arising from its use.