Natural Language Satisfiability: Exploring the Problem Distribution and Evaluating Transformer-based Language Models
- URL: http://arxiv.org/abs/2508.17153v1
- Date: Sat, 23 Aug 2025 22:19:16 GMT
- Title: Natural Language Satisfiability: Exploring the Problem Distribution and Evaluating Transformer-based Language Models
- Authors: Tharindu Madusanka, Ian Pratt-Hartmann, Riza Batista-Navarro
- Abstract summary: We show that satisfiability problems vary along various dimensions, which may affect TLMs' ability to learn how to solve them. We investigate how problem instances from varying computational complexity classes and having different grammatical constructs impact TLMs' ability to learn rules of inference.
- Score: 15.8088859393756
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Efforts to apply transformer-based language models (TLMs) to the problem of reasoning in natural language have enjoyed ever-increasing success in recent years. The most fundamental task in this area to which nearly all others can be reduced is that of determining satisfiability. However, from a logical point of view, satisfiability problems vary along various dimensions, which may affect TLMs' ability to learn how to solve them. The problem instances of satisfiability in natural language can belong to different computational complexity classes depending on the language fragment in which they are expressed. Although prior research has explored the problem of natural language satisfiability, the above-mentioned point has not been discussed adequately. Hence, we investigate how problem instances from varying computational complexity classes and having different grammatical constructs impact TLMs' ability to learn rules of inference. Furthermore, to faithfully evaluate TLMs, we conduct an empirical study to explore the distribution of satisfiability problems.
Related papers
- Beyond Translation: Evaluating Mathematical Reasoning Capabilities of LLMs in Sinhala and Tamil [1.0499611180329804]
Large language models (LLMs) demonstrate strong mathematical reasoning in English. But whether these capabilities reflect genuine multilingual reasoning or reliance on translation-based processing in low-resource languages like Sinhala and Tamil remains unclear. We evaluate four prominent large language models using a taxonomy of six math problem types.
arXiv Detail & Related papers (2026-02-16T07:08:37Z)
- From Logic to Language: A Trust Index for Problem Solving with LLMs [0.0]
This paper introduces a unified framework to understand and contrast Large Language Models (LLMs). We define and delineate the problem spaces addressable by formal languages versus natural language. We therefore introduce a vector-valued trust index Q, which reflects solution quality and distinguishes the binary correctness of formal solutions from the continuous adequacy spectrum characteristic of natural language solutions.
arXiv Detail & Related papers (2025-07-21T19:50:45Z)
- A Knapsack by Any Other Name: Presentation impacts LLM performance on NP-hard problems [64.05451567422342]
We introduce the dataset of Everyday Hard Optimization Problems (EHOP), a collection of NP-hard problems expressed in natural language. EHOP includes problem formulations that could be found in computer science textbooks (e.g., graph coloring), versions that are dressed up as problems that could arise in real life. We find that state-of-the-art LLMs, across multiple prompting strategies, solve textbook problems more accurately than their real-life and inverted counterparts.
arXiv Detail & Related papers (2025-02-19T14:39:59Z)
- Evaluating LLMs' Mathematical Reasoning in Financial Document Question Answering [53.56653281752486]
This study explores Large Language Models' mathematical reasoning on four financial question-answering datasets.
We focus on sensitivity to table complexity and performance variations with an increasing number of arithmetic reasoning steps.
We introduce a novel prompting technique tailored to semi-structured documents, matching or outperforming other baselines in performance.
arXiv Detail & Related papers (2024-02-17T05:10:18Z)
- Do Language Models Exhibit the Same Cognitive Biases in Problem Solving as Human Learners? [140.9751389452011]
We study the biases of large language models (LLMs) in relation to those known in children when solving arithmetic word problems.
We generate a novel set of word problems for each of these tests, using a neuro-symbolic approach that enables fine-grained control over the problem features.
arXiv Detail & Related papers (2024-01-31T18:48:20Z)
- How Proficient Are Large Language Models in Formal Languages? An In-Depth Insight for Knowledge Base Question Answering [52.86931192259096]
Knowledge Base Question Answering (KBQA) aims to answer natural language questions based on facts in knowledge bases.
Recent works leverage the capabilities of large language models (LLMs) for logical form generation to improve performance.
arXiv Detail & Related papers (2024-01-11T09:27:50Z)
- How to Handle Different Types of Out-of-Distribution Scenarios in Computational Argumentation? A Comprehensive and Fine-Grained Field Study [59.13867562744973]
This work systematically assesses LMs' capabilities for out-of-distribution (OOD) scenarios.
We find that the efficacy of such learning paradigms varies with the type of OOD.
Specifically, while ICL excels for domain shifts, prompt-based fine-tuning is superior for topic shifts.
arXiv Detail & Related papers (2023-09-15T11:15:47Z)
- Evaluating Transformer's Ability to Learn Mildly Context-Sensitive Languages [6.227678387562755]
Recent studies suggest that self-attention is theoretically limited in learning even some regular and context-free languages.
We test the Transformer's ability to learn mildly context-sensitive languages of varying complexities.
Our analyses show that the learned self-attention patterns and representations modeled dependency relations and demonstrated counting behavior.
arXiv Detail & Related papers (2023-09-02T08:17:29Z)
- ChatABL: Abductive Learning via Natural Language Interaction with ChatGPT [72.83383437501577]
Large language models (LLMs) have recently demonstrated significant potential in mathematical abilities.
LLMs currently have difficulty in bridging perception, language understanding and reasoning capabilities.
This paper presents a novel method for integrating LLMs into the abductive learning framework.
arXiv Detail & Related papers (2023-04-21T16:23:47Z)
- Can Transformers Reason in Fragments of Natural Language? [2.1485350418225244]
State-of-the-art deep-learning-based approaches to Natural Language Processing (NLP) are credited with various capabilities that involve reasoning with natural language texts.
We study the detection of formally valid inferences in controlled fragments of natural language for which the satisfiability problem becomes increasingly complex.
arXiv Detail & Related papers (2022-11-10T08:46:53Z)
- Pushing the Limits of Rule Reasoning in Transformers through Natural Language Satisfiability [30.01308882849197]
We propose a new methodology for creating challenging algorithmic reasoning datasets.
The key idea is to draw insights from empirical sampling of hard propositional SAT problems and from complexity-theoretic studies of language.
We find that current transformers, given sufficient training data, are surprisingly robust at solving the resulting NLSat problems.
arXiv Detail & Related papers (2021-12-16T17:47:20Z)
- SML: a new Semantic Embedding Alignment Transformer for efficient cross-lingual Natural Language Inference [71.57324258813674]
The ability of Transformers to perform a variety of tasks with precision, such as question answering, Natural Language Inference (NLI), or summarising, has enabled them to be ranked as one of the best paradigms for addressing such tasks at present.
NLI is one of the best scenarios to test these architectures, due to the knowledge required to understand complex sentences and establish a relation between a hypothesis and a premise.
In this paper, we propose a new architecture, siamese multilingual transformer, to efficiently align multilingual embeddings for Natural Language Inference.
arXiv Detail & Related papers (2021-03-17T13:23:53Z)