Critical Thinking for Language Models
- URL: http://arxiv.org/abs/2009.07185v2
- Date: Thu, 17 Dec 2020 14:42:42 GMT
- Title: Critical Thinking for Language Models
- Authors: Gregor Betz and Christian Voigt and Kyle Richardson
- Abstract summary: This paper takes a first step towards a critical thinking curriculum for neural auto-regressive language models.
We generate artificial argumentative texts to train and evaluate GPT-2.
We obtain consistent and promising results for NLU benchmarks.
- Score: 6.963299759354333
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: This paper takes a first step towards a critical thinking curriculum for
neural auto-regressive language models. We introduce a synthetic corpus of
deductively valid arguments, and generate artificial argumentative texts to
train and evaluate GPT-2. Significant transfer learning effects can be
observed: Training a model on three simple core schemes allows it to accurately
complete conclusions of different, and more complex types of arguments, too.
The language models generalize the core argument schemes in a correct way.
Moreover, we obtain consistent and promising results for NLU benchmarks. In
particular, pre-training on the argument schemes raises zero-shot accuracy on
the GLUE diagnostics by up to 15 percentage points. The findings suggest that
intermediary pre-training on texts that exemplify basic reasoning abilities
(such as typically covered in critical thinking textbooks) might help language
models to acquire a broad range of reasoning skills. The synthetic
argumentative texts presented in this paper are a promising starting point for
building such a "critical thinking curriculum for language models."
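As a rough illustration of what "generating artificial argumentative texts" from an argument scheme could look like, the sketch below fills one deductively valid scheme with random names and predicates, then splits the result into a prompt and a conclusion to be completed. Everything in it is an assumption made for illustration: the scheme (plain modus ponens), the templates, the word lists, and the helper names `make_argument` and `as_completion_example` are hypothetical and are not taken from the paper.

```python
import random

# Hypothetical template for one deductively valid scheme (modus ponens).
# The paper's actual schemes and templates are not given in this listing.
SCHEME = {
    "premises": [
        "If someone is a {F}, then they are a {G}.",
        "{a} is a {F}.",
    ],
    "conclusion": "{a} is a {G}.",
}

# Illustrative vocabulary (assumed, not from the paper).
NAMES = ["Ann", "Bob", "Cleo"]
PREDICATES = ["teacher", "cyclist", "poet", "gardener"]


def make_argument(rng):
    """Instantiate the scheme with a random name and two distinct predicates."""
    a = rng.choice(NAMES)
    F, G = rng.sample(PREDICATES, 2)  # sample() guarantees F != G
    fill = {"a": a, "F": F, "G": G}
    premises = [p.format(**fill) for p in SCHEME["premises"]]
    conclusion = SCHEME["conclusion"].format(**fill)
    return premises, conclusion


def as_completion_example(premises, conclusion):
    """Split an argument into a prompt and the conclusion a model should complete."""
    prompt = " ".join(premises) + " Therefore,"
    return prompt, " " + conclusion


rng = random.Random(0)  # fixed seed so the sketch is reproducible
premises, conclusion = make_argument(rng)
prompt, target = as_completion_example(premises, conclusion)
print(prompt + target)
```

In a setup like this, the full texts would serve as training data, while the prompt and target split would serve to check whether a model completes the conclusion correctly.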
Related papers
- Reasoning Elicitation in Language Models via Counterfactual Feedback [17.908819732623716]
We derive novel metrics that balance accuracy in factual and counterfactual questions.
We propose several fine-tuning approaches that aim to elicit better reasoning mechanisms.
We evaluate the performance of the fine-tuned language models in a variety of realistic scenarios.
arXiv Detail & Related papers (2024-10-02T15:33:30Z)
- Lean-STaR: Learning to Interleave Thinking and Proving [53.923617816215774]
We present Lean-STaR, a framework for training language models to produce informal thoughts prior to each step of a proof.
Lean-STaR achieves state-of-the-art results on the miniF2F-test benchmark within the Lean theorem proving environment.
arXiv Detail & Related papers (2024-07-14T01:43:07Z) - Conceptual and Unbiased Reasoning in Language Models [98.90677711523645]
We propose a novel conceptualization framework that forces models to perform conceptual reasoning on abstract questions.
We show that existing large language models fall short on conceptual reasoning, dropping 9% to 28% on various benchmarks.
We then discuss how models can improve since high-level abstract reasoning is key to unbiased and generalizable decision-making.
arXiv Detail & Related papers (2024-03-30T00:53:53Z) - "You Are An Expert Linguistic Annotator": Limits of LLMs as Analyzers of
Abstract Meaning Representation [60.863629647985526]
We examine the successes and limitations of the GPT-3, ChatGPT, and GPT-4 models in analysis of sentence meaning structure.
We find that models can reliably reproduce the basic format of AMR, and can often capture core event, argument, and modifier structure.
Overall, our findings indicate that these models out-of-the-box can capture aspects of semantic structure, but there remain key limitations in their ability to support fully accurate semantic analyses or parses.
arXiv Detail & Related papers (2023-10-26T21:47:59Z) - Investigating the Efficacy of Large Language Models in Reflective
Assessment Methods through Chain of Thoughts Prompting [0.2552922646705803]
The Chain of Thought (CoT) prompting method has been proposed as a means to enhance LLMs' proficiency in complex reasoning tasks.
The primary aim of this research is to assess how well four language models can grade reflective essays of third-year medical students.
arXiv Detail & Related papers (2023-09-30T06:25:27Z) - Disco-Bench: A Discourse-Aware Evaluation Benchmark for Language
Modelling [70.23876429382969]
We propose a benchmark that can evaluate intra-sentence discourse properties across a diverse set of NLP tasks.
Disco-Bench consists of 9 document-level testsets in the literature domain, which contain rich discourse phenomena.
For linguistic analysis, we also design a diagnostic test suite that can examine whether the target models learn discourse knowledge.
arXiv Detail & Related papers (2023-07-16T15:18:25Z) - Language Models as Inductive Reasoners [125.99461874008703]
We propose a new paradigm (task) for inductive reasoning, which is to induce natural language rules from natural language facts.
We create a dataset termed DEER containing 1.2k rule-fact pairs for the task, where rules and facts are written in natural language.
We provide the first comprehensive analysis of how well pretrained language models can induce natural language rules from natural language facts.
arXiv Detail & Related papers (2022-12-21T11:12:14Z) - ALERT: Adapting Language Models to Reasoning Tasks [43.8679673685468]
ALERT is a benchmark and suite of analyses for assessing language models' reasoning ability.
ALERT provides a test bed to assess any language model on fine-grained reasoning skills.
We find that language models learn more reasoning skills during the finetuning stage than during the pretraining stage.
arXiv Detail & Related papers (2022-12-16T05:15:41Z) - Chain of Thought Prompting Elicits Reasoning in Large Language Models [56.811278668446825]
This paper explores the ability of language models to generate a coherent chain of thought.
Experiments show that inducing a chain of thought via prompting can enable sufficiently large language models to better perform reasoning tasks.
arXiv Detail & Related papers (2022-01-28T02:33:07Z) - AMPERSAND: Argument Mining for PERSuAsive oNline Discussions [41.06165177604387]
We propose a computational model for argument mining in online persuasive discussion forums.
Our approach relies on identifying relations between components of arguments in a discussion thread.
Our models obtain significant improvements compared to recent state-of-the-art approaches.
arXiv Detail & Related papers (2020-04-30T10:33:40Z)