Related papers: APOLLO: A Simple Approach for Adaptive Pretraining of Language Models for Logical Reasoning

URL: http://arxiv.org/abs/2212.09282v2
Date: Mon, 5 Jun 2023 00:56:11 GMT
Title: APOLLO: A Simple Approach for Adaptive Pretraining of Language Models for Logical Reasoning
Authors: Soumya Sanyal, Yichong Xu, Shuohang Wang, Ziyi Yang, Reid Pryzant, Wenhao Yu, Chenguang Zhu, Xiang Ren
Abstract summary: We propose APOLLO, an adaptively pretrained language model that has improved logical reasoning abilities. APOLLO performs comparably on ReClor and outperforms baselines on LogiQA.
License: http://creativecommons.org/licenses/by/4.0/
Abstract: Logical reasoning of text is an important ability that requires understanding the information present in the text and its interconnections, and then reasoning through them to infer new conclusions. Prior works on improving the logical reasoning ability of language models require complex processing of training data (e.g., aligning symbolic knowledge to text), yielding task-specific data augmentation solutions that restrict the learning of general logical reasoning skills. In this work, we propose APOLLO, an adaptively pretrained language model that has improved logical reasoning abilities. We select a subset of Wikipedia, based on a set of logical inference keywords, for continued pretraining of a language model. We use two self-supervised loss functions: a modified masked language modeling loss in which only words of specific parts of speech, which likely require more reasoning than basic language understanding, are masked; and a sentence-level classification loss that teaches the model to distinguish between entailment and contradiction types of sentences. The proposed training paradigm is both simple and independent of task formats. We demonstrate the effectiveness of APOLLO by comparing it with prior baselines on two logical reasoning datasets. APOLLO performs comparably on ReClor and outperforms baselines on LogiQA. The code base has been made publicly available.
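The abstract names three ingredients: keyword-based selection of Wikipedia text, masking restricted to certain parts of speech, and a sentence-level entailment-vs-contradiction label. The Python sketch below strings them together under loud assumptions: the keyword lists, the toy part-of-speech lexicon (a stand-in for a real tagger), the choice of NOUN/VERB/ADJ as the maskable classes, and deriving the sentence label from which keyword group a sentence contains are all hypothetical and are not taken from the APOLLO code base.

```python
import random
import re

# HYPOTHETICAL logical-inference keywords, grouped by the sentence type they
# suggest. The abstract does not list APOLLO's actual keywords.
LOGIC_KEYWORDS = {
    "entailment": {"therefore", "thus", "hence", "consequently", "so"},
    "contradiction": {"but", "however", "although", "despite", "unless"},
}
ALL_KEYWORDS = set().union(*LOGIC_KEYWORDS.values())

# HYPOTHETICAL toy POS lexicon standing in for a real part-of-speech tagger.
TOY_POS = {
    "rain": "NOUN", "ground": "NOUN", "game": "NOUN",
    "fell": "VERB", "cancelled": "VERB",
    "is": "AUX", "was": "AUX",
    "wet": "ADJ",
}
# HYPOTHETICAL choice of which POS classes are eligible for masking.
MASK_POS = {"NOUN", "VERB", "ADJ"}


def tokenize(sentence):
    """Lowercase and keep alphabetic runs only (a deliberately crude tokenizer)."""
    return re.findall(r"[a-z]+", sentence.lower())


def select_sentences(corpus):
    """Keep only sentences that contain at least one logical keyword."""
    return [s for s in corpus if ALL_KEYWORDS & set(tokenize(s))]


def sentence_label(sentence):
    """Keyword-derived label: 1 = entailment-type, 0 = contradiction-type,
    None = neither. Contradiction keywords take precedence (an assumption)."""
    words = set(tokenize(sentence))
    if words & LOGIC_KEYWORDS["contradiction"]:
        return 0
    if words & LOGIC_KEYWORDS["entailment"]:
        return 1
    return None


def selective_mask(tokens, mask_prob=0.5, rng=None, mask_token="[MASK]"):
    """Mask only tokens whose (toy) POS tag is in MASK_POS.

    Returns the masked sequence and per-position targets (None = not masked).
    """
    if rng is None:
        rng = random.Random(0)
    masked, targets = [], []
    for tok in tokens:
        if TOY_POS.get(tok) in MASK_POS and rng.random() < mask_prob:
            masked.append(mask_token)
            targets.append(tok)
        else:
            masked.append(tok)
            targets.append(None)
    return masked, targets


corpus = [
    "Rain fell, therefore the ground is wet.",
    "The sky is blue.",
    "Rain fell, but the game was not cancelled.",
]
# mask_prob=1.0 makes the demo deterministic: every eligible token is masked.
for sent in select_sentences(corpus):
    masked, _ = selective_mask(tokenize(sent), mask_prob=1.0)
    print(sentence_label(sent), " ".join(masked))
```

Function words and the keywords themselves stay visible, so the model must recover the content words from the logical structure around them; a real pipeline would swap the toy lexicon for an actual tagger.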

Related papers

Training Neural Networks as Recognizers of Formal Languages
We train and evaluate neural networks directly as binary classifiers of strings. We provide results on a variety of languages across the Chomsky hierarchy for three neural architectures. Our contributions will facilitate theoretically sound empirical testing of language recognition claims in future work.
arXiv (2024-11-11T16:33:25Z)
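As a rough illustration of the data that training networks "directly as binary classifiers of strings" needs, the sketch below builds labeled examples for one context-free language, Dyck-1 (balanced parentheses). The language choice, the sampler, and the one-symbol-flip negatives are assumptions for illustration, not the paper's data pipeline; every label is computed by an exact membership test rather than assumed.

```python
import random


def is_dyck1(s):
    """Exact membership test for Dyck-1: balanced strings over '(' and ')'."""
    depth = 0
    for ch in s:
        if ch == "(":
            depth += 1
        elif ch == ")":
            depth -= 1
            if depth < 0:  # closed a bracket that was never opened
                return False
        else:
            return False  # symbol outside the alphabet
    return depth == 0


def sample_dyck1(n_pairs, rng):
    """Sample a balanced string with n_pairs bracket pairs.

    Not uniform over Dyck words; it simply opens or closes at random while
    keeping the prefix valid.
    """
    out, opens_left, depth = [], n_pairs, 0
    while opens_left > 0 or depth > 0:
        if opens_left > 0 and (depth == 0 or rng.random() < 0.5):
            out.append("(")
            opens_left -= 1
            depth += 1
        else:
            out.append(")")
            depth -= 1
    return "".join(out)


def make_dataset(n, max_pairs=5, seed=0):
    """Return 2n (string, label) pairs: n positives and n near-miss negatives."""
    rng = random.Random(seed)
    data = []
    for _ in range(n):
        pos = sample_dyck1(rng.randint(1, max_pairs), rng)
        i = rng.randrange(len(pos))
        flipped = "(" if pos[i] == ")" else ")"
        neg = pos[:i] + flipped + pos[i + 1:]
        data.append((pos, int(is_dyck1(pos))))
        data.append((neg, int(is_dyck1(neg))))
    return data


for string, label in make_dataset(3, seed=1):
    print(label, string)
```

Flipping one symbol of a balanced string changes the difference between the counts of `(` and `)` by two, so every flipped string is a negative that differs from a positive in a single position, which keeps the classifier from relying on length alone.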
Thought-Path Contrastive Learning via Premise-Oriented Data Augmentation for Logical Reading Comprehension
Previous research has primarily focused on enhancing logical reasoning capabilities through Chain-of-Thought (CoT) or data augmentation. We propose a Premise-Oriented Data Augmentation (PODA) framework to generate CoT rationales including analyses for both correct and incorrect options. We also introduce a novel thought-path contrastive learning method that compares reasoning paths between the original and counterfactual samples.
arXiv (2024-09-22T15:44:43Z)
How Truncating Weights Improves Reasoning in Language Models
We study how certain global associations tend to be stored in specific weight components or Transformer blocks. We analyze how this arises during training, both empirically and theoretically.
arXiv (2024-06-05T08:51:08Z)
Language Models can be Logical Solvers
We introduce LoGiPT, a novel language model that directly emulates the reasoning processes of logical solvers. LoGiPT is fine-tuned on a newly constructed instruction-tuning dataset derived from revealing and refining the invisible reasoning process of deductive solvers.
arXiv (2023-11-10T16:23:50Z)
Empower Nested Boolean Logic via Self-Supervised Curriculum Learning
We find that any pre-trained language model, even a large language model, behaves only like a random selector when faced with multi-nested logic. To empower language models with this fundamental capability, this paper proposes a new self-supervised learning method, Curriculum Logical Reasoning (CLR).
arXiv (2023-10-09T06:54:02Z)
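The summary gives no details of CLR itself, so the sketch below only illustrates the general idea of a curriculum over nesting depth: it generates Boolean expressions with a known truth value and groups them into stages from shallow to deeply nested. The operator set, the generator, and the stage layout are assumptions for illustration, not the paper's method.

```python
import random


def gen_expr(depth, rng):
    """Return (expression string, truth value) with operator nesting exactly `depth`.

    Every operand is parenthesized, so the maximum parenthesis nesting of the
    string equals `depth`.
    """
    if depth == 0:
        value = rng.choice([True, False])
        return str(value), value
    op = rng.choice(["and", "or", "not"])
    if op == "not":
        inner, value = gen_expr(depth - 1, rng)
        return f"not ({inner})", not value
    # The left operand carries the full remaining depth; the right one is
    # drawn at a random depth in [0, depth - 1].
    left, lv = gen_expr(depth - 1, rng)
    right, rv = gen_expr(rng.randrange(depth), rng)
    value = (lv and rv) if op == "and" else (lv or rv)
    return f"({left}) {op} ({right})", value


def build_curriculum(max_depth, per_stage, seed=0):
    """Stages of (expression, value) pairs ordered from depth 1 to max_depth."""
    rng = random.Random(seed)
    return [
        [gen_expr(d, rng) for _ in range(per_stage)]
        for d in range(1, max_depth + 1)
    ]


for depth, stage in enumerate(build_curriculum(3, 2, seed=3), start=1):
    for expr, value in stage:
        print(depth, value, expr)
```

The generator computes each truth value itself; Python's `eval` is used only as an independent check of that ground truth, never as part of the data pipeline.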
Planning with Logical Graph-based Language Model for Instruction Generation
We propose a graph-based language model, Logical-GLM, to infuse logic into language models. We generate logical skeletons to guide language model training, infusing domain knowledge into language models. Our approach can generate instructional texts with more correct logic owing to the internalized domain knowledge.
arXiv (2023-08-26T06:28:14Z)
Deep Manifold Learning for Reading Comprehension and Logical Reasoning Tasks with Polytuplet Loss
The current trend in developing machine learning models for reading comprehension and logical reasoning tasks is focused on improving the models' abilities to understand and utilize logical rules. This work focuses on providing a novel loss function and accompanying model architecture that has more interpretable components than some other models. Our strategy involves emphasizing relative accuracy over absolute accuracy and can theoretically produce the correct answer with incomplete knowledge.
arXiv (2023-04-03T14:48:34Z)
MURMUR: Modular Multi-Step Reasoning for Semi-Structured Data-to-Text Generation
We propose MURMUR, a neuro-symbolic modular approach to text generation from semi-structured data with multi-step reasoning. We conduct experiments on two data-to-text generation tasks, WebNLG and LogicNLG.
arXiv (2022-12-16T17:36:23Z)
ALERT: Adapting Language Models to Reasoning Tasks
ALERT is a benchmark and suite of analyses for assessing language models' reasoning ability. ALERT provides a test bed to assess any language model on fine-grained reasoning skills. We find that language models learn more reasoning skills during the finetuning stage than during the pretraining stage.
arXiv (2022-12-16T05:15:41Z)
Logic-Driven Context Extension and Data Augmentation for Logical Reasoning of Text
We propose to understand logical symbols and expressions in the text to arrive at the answer. Based on such logical information, we put forward a context extension framework and a data augmentation algorithm. Our method achieves state-of-the-art performance, and both the logic-driven context extension framework and the data augmentation algorithm help improve accuracy.
arXiv (2021-05-08T10:09:36Z)
Improving Commonsense Causal Reasoning by Adversarial Training and Data Augmentation
This paper presents a number of techniques for making models more robust in the domain of causal reasoning. We show a statistically significant improvement in performance on both datasets, even with only a small number of additionally generated data points.
arXiv (2021-01-13T09:55:29Z)

This list is automatically generated from the titles and abstracts of the papers on this site.