Categorical Syllogisms Revisited: A Review of the Logical Reasoning Abilities of LLMs for Analyzing Categorical Syllogism
- URL: http://arxiv.org/abs/2406.18762v1
- Date: Wed, 26 Jun 2024 21:17:20 GMT
- Title: Categorical Syllogisms Revisited: A Review of the Logical Reasoning Abilities of LLMs for Analyzing Categorical Syllogism
- Authors: Shi Zong, Jimmy Lin
- Abstract summary: This paper provides a systematic overview of prior works on the logical reasoning ability of large language models for analyzing categorical syllogisms.
We first investigate all possible variations of categorical syllogisms from a purely logical perspective.
We then examine the underlying configurations (i.e., mood and figure) tested by the existing datasets.
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: A huge number of benchmarks have been proposed to evaluate how large language models (LLMs) behave on logic inference tasks. However, how to properly evaluate this ability remains an open question. In this paper, we provide a systematic overview of prior works on the logical reasoning ability of LLMs for analyzing categorical syllogisms. We first investigate all possible variations of categorical syllogisms from a purely logical perspective and then examine the underlying configurations (i.e., mood and figure) tested by the existing datasets. Our results indicate that, compared to template-based synthetic datasets, crowdsourcing approaches normally sacrifice the coverage of configurations (i.e., mood and figure) of categorical syllogisms for more language variations, thus making it challenging to fully test LLMs under different situations. We then summarize the findings and observations from the current literature on the performance of LLMs in inferring the validity of syllogisms. The error rate breakdown analyses suggest that the interpretation of the quantifiers seems to be the current bottleneck limiting the performance of LLMs and is thus worth more attention. Finally, we discuss several points that might be worth considering when researchers plan future releases of categorical syllogism datasets. We hope our work will not only provide a timely review of the current literature on categorical syllogisms, but also motivate more interdisciplinary research between communities, specifically between computational linguists and logicians.
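To make the configuration space concrete: a categorical syllogism is fixed by its mood (A/E/I/O for each of the two premises and the conclusion) and its figure (the placement of the middle term), giving 4 × 4 × 4 × 4 = 256 configurations. The sketch below (ours, not from the paper) enumerates all 256 and decides each one's validity by brute force over Venn-region occupancy patterns, under the modern (Boolean) interpretation of the quantifiers; helper names such as `holds` and `is_valid` are our own.

```python
from itertools import product

# Every model of three terms (S, M, P) is summarized by which of the 8 Venn
# regions are inhabited, so 2**8 = 256 occupancy patterns cover all cases.
REGIONS = list(product([False, True], repeat=3))  # (in_S, in_M, in_P)

def holds(mood, x, y, inhabited):
    """Truth of one categorical statement about term indices x and y."""
    if mood == 'A':   # All x are y
        return all(not r[x] or r[y] for r in inhabited)
    if mood == 'E':   # No x is y
        return not any(r[x] and r[y] for r in inhabited)
    if mood == 'I':   # Some x is y
        return any(r[x] and r[y] for r in inhabited)
    return any(r[x] and not r[y] for r in inhabited)  # 'O': Some x is not y

S, M, P = 0, 1, 2
# The four figures fix where the middle term M appears in the two premises.
FIGURES = {1: ((M, P), (S, M)), 2: ((P, M), (S, M)),
           3: ((M, P), (M, S)), 4: ((P, M), (M, S))}

def is_valid(m1, m2, mc, figure):
    """Valid iff every model of both premises also models the conclusion."""
    (a1, b1), (a2, b2) = FIGURES[figure]
    for bits in product([False, True], repeat=8):
        inhabited = [r for r, b in zip(REGIONS, bits) if b]
        if (holds(m1, a1, b1, inhabited) and holds(m2, a2, b2, inhabited)
                and not holds(mc, S, P, inhabited)):
            return False
    return True

valid_forms = [(m1 + m2 + mc, f)
               for m1, m2, mc in product('AEIO', repeat=3)
               for f in FIGURES
               if is_valid(m1, m2, mc, f)]
print(len(valid_forms))  # 15 valid forms under the Boolean interpretation
```

Under the traditional reading with existential import (all terms nonempty), the same check yields 24 valid forms; which interpretation is in play is exactly the kind of quantifier subtlety the abstract identifies as a bottleneck for LLMs.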
Related papers
- A Systematic Analysis of Large Language Models as Soft Reasoners: The Case of Syllogistic Inferences [5.141416267381492]
We consider the case of syllogistic reasoning, an area of deductive reasoning studied extensively in logic and cognitive psychology.
We investigate the effects of chain-of-thought reasoning, in-context learning, and supervised fine-tuning on syllogistic reasoning.
Our results suggest that the behavior of pre-trained LLMs can be explained by findings from cognitive science.
arXiv Detail & Related papers (2024-06-17T08:59:04Z) - LogicBench: Towards Systematic Evaluation of Logical Reasoning Ability of Large Language Models [52.03659714625452]
Recently developed large language models (LLMs) have been shown to perform remarkably well on a wide range of language understanding tasks.
But, can they really "reason" over the natural language?
This question has been receiving significant research attention, and many reasoning skills, such as commonsense, numerical, and qualitative reasoning, have been studied.
arXiv Detail & Related papers (2024-04-23T21:08:49Z) - Can Large Language Models Identify Authorship? [16.35265384114857]
Large Language Models (LLMs) have demonstrated an exceptional capacity for reasoning and problem-solving.
This work seeks to address three research questions: (1) Can LLMs perform zero-shot, end-to-end authorship verification effectively?
(2) Are LLMs capable of accurately attributing authorship among multiple candidate authors (e.g., 10 and 20)?
arXiv Detail & Related papers (2024-03-13T03:22:02Z) - LogicAsker: Evaluating and Improving the Logical Reasoning Ability of Large Language Models [63.14196038655506]
We introduce LogicAsker, a novel approach for evaluating and enhancing the logical reasoning capabilities of large language models (LLMs).
Our methodology reveals significant gaps in LLMs' learning of logical rules, with identified reasoning failures ranging from 29% to 90% across different models.
We leverage these findings to construct targeted demonstration examples and fine-tuning data, notably enhancing logical reasoning in models like GPT-4o by up to 5%.
arXiv Detail & Related papers (2024-01-01T13:53:53Z) - CLOMO: Counterfactual Logical Modification with Large Language Models [109.60793869938534]
We introduce a novel task, Counterfactual Logical Modification (CLOMO), and a high-quality human-annotated benchmark.
In this task, LLMs must adeptly alter a given argumentative text to uphold a predetermined logical relationship.
We propose an innovative evaluation metric, the Self-Evaluation Score (SES), to directly evaluate the natural language output of LLMs.
arXiv Detail & Related papers (2023-11-29T08:29:54Z) - Sentiment Analysis through LLM Negotiations [58.67939611291001]
A standard paradigm for sentiment analysis is to rely on a single LLM and make the decision in a single round.
This paper introduces a multi-LLM negotiation framework for sentiment analysis.
arXiv Detail & Related papers (2023-11-03T12:35:29Z) - Sentiment Analysis in the Era of Large Language Models: A Reality Check [69.97942065617664]
This paper investigates the capabilities of large language models (LLMs) in performing various sentiment analysis tasks.
We evaluate performance across 13 tasks on 26 datasets and compare the results against small language models (SLMs) trained on domain-specific datasets.
arXiv Detail & Related papers (2023-05-24T10:45:25Z) - Can Large Language Models emulate an inductive Thematic Analysis of semi-structured interviews? An exploration and provocation on the limits of the approach and the model [0.0]
The paper presents results and reflections from an experiment using GPT-3.5-Turbo to emulate some aspects of an inductive Thematic Analysis.
The objective is not to replace human analysts in qualitative analysis but to learn whether some elements of LLM data manipulation can, to an extent, support qualitative research.
arXiv Detail & Related papers (2023-05-22T13:16:07Z)