APOLLO: A Simple Approach for Adaptive Pretraining of Language Models
for Logical Reasoning
- URL: http://arxiv.org/abs/2212.09282v2
- Date: Mon, 5 Jun 2023 00:56:11 GMT
- Title: APOLLO: A Simple Approach for Adaptive Pretraining of Language Models
for Logical Reasoning
- Authors: Soumya Sanyal, Yichong Xu, Shuohang Wang, Ziyi Yang, Reid Pryzant,
Wenhao Yu, Chenguang Zhu, Xiang Ren
- Abstract summary: We propose APOLLO, an adaptively pretrained language model that has improved logical reasoning abilities.
APOLLO performs comparably on ReClor and outperforms baselines on LogiQA.
- Score: 73.3035118224719
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Logical reasoning of text is an important ability that requires understanding
the information present in the text, their interconnections, and then reasoning
through them to infer new conclusions. Prior works on improving the logical
reasoning ability of language models require complex processing of training
data (e.g., aligning symbolic knowledge to text), yielding task-specific data
augmentation solutions that restrict the learning of general logical reasoning
skills. In this work, we propose APOLLO, an adaptively pretrained language
model that has improved logical reasoning abilities. We select a subset of
Wikipedia, based on a set of logical inference keywords, for continued
pretraining of a language model. We use two self-supervised loss functions: a
modified masked language modeling loss where only specific parts-of-speech
words, that would likely require more reasoning than basic language
understanding, are masked, and a sentence-level classification loss that
teaches the model to distinguish between entailment and contradiction types of
sentences. The proposed training paradigm is both simple and independent of
task formats. We demonstrate the effectiveness of APOLLO by comparing it with
prior baselines on two logical reasoning datasets. APOLLO performs comparably
on ReClor and outperforms baselines on LogiQA. The code base has been made
publicly available.
Related papers
- Thought-Path Contrastive Learning via Premise-Oriented Data Augmentation for Logical Reading Comprehension [9.67774998354062]
Previous research has primarily focused on enhancing logical reasoning capabilities through Chain-of-Thought (CoT) or data augmentation.
We propose a Premise-Oriented Data Augmentation (PODA) framework to generate CoT rationales including analyses for both correct and incorrect options.
We also introduce a novel thought-path contrastive learning method that compares reasoning paths between the original and counterfactual samples.
arXiv Detail & Related papers (2024-09-22T15:44:43Z) - How Truncating Weights Improves Reasoning in Language Models [49.80959223722325]
We study how certain global associations tend to be stored in specific weight components or Transformer blocks.
We analyze how this arises during training, both empirically and theoretically.
arXiv Detail & Related papers (2024-06-05T08:51:08Z) - Language Models can be Logical Solvers [99.40649402395725]
We introduce LoGiPT, a novel language model that directly emulates the reasoning processes of logical solvers.
LoGiPT is fine-tuned on a newly constructed instruction-tuning dataset derived from revealing and refining the invisible reasoning process of deductive solvers.
arXiv Detail & Related papers (2023-11-10T16:23:50Z) - Empower Nested Boolean Logic via Self-Supervised Curriculum Learning [67.46052028752327]
We find that any pre-trained language models even including large language models only behave like a random selector in the face of multi-nested logic.
To empower language models with this fundamental capability, this paper proposes a new self-supervised learning method textitCurriculum Logical Reasoning (textscClr)
arXiv Detail & Related papers (2023-10-09T06:54:02Z) - Planning with Logical Graph-based Language Model for Instruction Generation [9.70880913062245]
We propose a graph-based language model, Logical-GLM, to infuse logic into language models.
We generate logical skeletons to guide language model training, infusing domain knowledge into language models.
Our approach can generate instructional texts with more correct logic owing to the internalized domain knowledge.
arXiv Detail & Related papers (2023-08-26T06:28:14Z) - Deep Manifold Learning for Reading Comprehension and Logical Reasoning
Tasks with Polytuplet Loss [0.0]
The current trend in developing machine learning models for reading comprehension and logical reasoning tasks is focused on improving the models' abilities to understand and utilize logical rules.
This work focuses on providing a novel loss function and accompanying model architecture that has more interpretable components than some other models.
Our strategy involves emphasizing relative accuracy over absolute accuracy and can theoretically produce the correct answer with incomplete knowledge.
arXiv Detail & Related papers (2023-04-03T14:48:34Z) - MURMUR: Modular Multi-Step Reasoning for Semi-Structured Data-to-Text
Generation [102.20036684996248]
We propose MURMUR, a neuro-symbolic modular approach to text generation from semi-structured data with multi-step reasoning.
We conduct experiments on two data-to-text generation tasks like WebNLG and LogicNLG.
arXiv Detail & Related papers (2022-12-16T17:36:23Z) - ALERT: Adapting Language Models to Reasoning Tasks [43.8679673685468]
ALERT is a benchmark and suite of analyses for assessing language models' reasoning ability.
ALERT provides a test bed to asses any language model on fine-grained reasoning skills.
We find that language models learn more reasoning skills during finetuning stage compared to pretraining state.
arXiv Detail & Related papers (2022-12-16T05:15:41Z) - Logic-Driven Context Extension and Data Augmentation for Logical
Reasoning of Text [65.24325614642223]
We propose to understand logical symbols and expressions in the text to arrive at the answer.
Based on such logical information, we put forward a context extension framework and a data augmentation algorithm.
Our method achieves the state-of-the-art performance, and both logic-driven context extension framework and data augmentation algorithm can help improve the accuracy.
arXiv Detail & Related papers (2021-05-08T10:09:36Z) - Improving Commonsense Causal Reasoning by Adversarial Training and Data
Augmentation [14.92157586545743]
This paper presents a number of techniques for making models more robust in the domain of causal reasoning.
We show a statistically significant improvement on performance and on both datasets, even with only a small number of additionally generated data points.
arXiv Detail & Related papers (2021-01-13T09:55:29Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.