From Logic to Language: A Trust Index for Problem Solving with LLMs
- URL: http://arxiv.org/abs/2507.16028v1
- Date: Mon, 21 Jul 2025 19:50:45 GMT
- Title: From Logic to Language: A Trust Index for Problem Solving with LLMs
- Authors: Tehseen Rug, Felix Böhmer, Tessa Pfattheicher
- Abstract summary: This paper introduces a unified framework for understanding and contrasting formal and natural-language problem solving with Large Language Models (LLMs). We define and delineate the problem spaces addressable by formal languages versus natural language. We therefore introduce a vector-valued trust index Q, which reflects solution quality and distinguishes the binary correctness of formal solutions from the continuous adequacy spectrum characteristic of natural language solutions.
- License: http://creativecommons.org/licenses/by-nc-nd/4.0/
- Abstract: Classical computation, grounded in formal, logical systems, has been the engine of technological progress for decades, excelling at problems that can be described with unambiguous rules. This paradigm, however, leaves a vast ocean of human problems -- those characterized by ambiguity, dynamic environments, and subjective context -- largely untouched. The advent of Large Language Models (LLMs) represents a fundamental shift, enabling computational systems to engage with this previously inaccessible domain using natural language. This paper introduces a unified framework to understand and contrast these problem-solving paradigms. We define and delineate the problem spaces addressable by formal languages versus natural language. While solutions to the former problem class can be evaluated using binary quality measures, the latter requires a much more nuanced definition of an approximate solution space that takes into account the vagueness, subjectivity, and ambiguity inherent to natural language. We therefore introduce a vector-valued trust index Q, which reflects solution quality and distinguishes the binary correctness of formal solutions from the continuous adequacy spectrum characteristic of natural language solutions. Within this framework, we propose two statistical quality dimensions. Normalized bi-semantic entropy measures the robustness and conceptual diversity of LLM answers under semantic variation in problem formulations. Emotional valence maps the subjective valuation of a solution to a quantifiable metric that can be maximized by invoking statistical measures. The concepts introduced in this work will provide a more rigorous understanding of the capabilities, limitations, and inherent nature of problem-solving in the age of LLMs.
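The abstract specifies only the shape of Q, not its formulas, so the following Python sketch is purely illustrative: the two components (a robustness score derived from the entropy of answers under paraphrasing, and a mean emotional-valence score) and every function name are assumptions, not the paper's definitions.

```python
# Illustrative only: a toy vector-valued trust index with two dimensions,
# (robustness under paraphrasing, mean emotional valence). The clustering
# of answers and the valence scores are assumed to come from external models.
from collections import Counter
import math

def normalized_entropy(cluster_labels):
    """Shannon entropy of semantic-cluster frequencies, normalized to [0, 1]."""
    counts = Counter(cluster_labels)
    n = sum(counts.values())
    if len(counts) <= 1:
        return 0.0
    h = -sum((c / n) * math.log(c / n) for c in counts.values())
    return h / math.log(len(counts))  # divide by the maximum attainable entropy

def trust_index(answer_clusters, valence_scores):
    """Toy Q = (robustness, valence): answer_clusters holds the semantic-cluster
    label of the answer given for each paraphrase of the same problem."""
    robustness = 1.0 - normalized_entropy(answer_clusters)
    valence = sum(valence_scores) / len(valence_scores)
    return (robustness, valence)

# Mostly the same conceptual answer across paraphrases -> moderate robustness.
print(trust_index(["A", "A", "A", "B"], [0.6, 0.7, 0.5, 0.1]))
```

Keeping Q vector-valued, rather than collapsing it to a scalar, preserves the distinction the paper draws between statistical robustness and subjective adequacy.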
Related papers
- Learning to Reason via Mixture-of-Thought for Logical Reasoning [56.24256916896427]
Mixture-of-Thought (MoT) is a framework that enables LLMs to reason across three complementary modalities: natural language, code, and truth-table. MoT adopts a two-phase design: (1) self-evolving MoT training, which jointly learns from filtered, self-generated rationales across modalities; and (2) MoT inference, which fully leverages the synergy of the three modalities to produce better predictions; a toy sketch of the inference-time vote follows below.
arXiv Detail & Related papers (2025-05-21T17:59:54Z)
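As a rough illustration of how inference-time synergy across modalities might look, here is a minimal majority-vote sketch; the per-modality solvers are placeholder lambdas, not the paper's trained models.

```python
# Hedged sketch of MoT-style inference: solve the same problem in three
# modalities and majority-vote the final answers. The solver functions
# are placeholders, not the paper's actual implementation.
from collections import Counter

def mot_infer(problem, solvers):
    """solvers: dict mapping modality name -> callable returning an answer."""
    answers = {m: solve(problem) for m, solve in solvers.items()}
    winner, _ = Counter(answers.values()).most_common(1)[0]
    return winner, answers

solvers = {
    "natural_language": lambda p: "valid",   # chain-of-thought in prose
    "code":             lambda p: "valid",   # execute generated code
    "truth_table":      lambda p: "invalid", # enumerate assignments
}
print(mot_infer("p -> q, p |- q ?", solvers))  # majority answer: "valid"
```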
- Why Reasoning Matters? A Survey of Advancements in Multimodal Reasoning (v1) [66.51642638034822]
Reasoning is central to human intelligence, enabling structured problem-solving across diverse tasks. Recent advances in large language models (LLMs) have greatly enhanced their reasoning abilities in arithmetic, commonsense, and symbolic domains. This paper offers a concise yet insightful overview of reasoning techniques in both textual and multimodal LLMs.
arXiv Detail & Related papers (2025-04-04T04:04:56Z)
- RM-PoT: Reformulating Mathematical Problems and Solving via Program of Thoughts [13.07180561863778]
We propose a three-stage framework that integrates problem reformulation (RM), code-aided reasoning (PoT), and domain-aware few-shot learning. Our approach first reformulates the input problem into diverse surface forms to reduce structural bias, then retrieves five semantically aligned examples to provide contextual guidance; a hedged sketch of the pipeline follows below.
arXiv Detail & Related papers (2025-02-18T06:54:32Z)
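A hedged sketch of the three stages, assuming hypothetical `llm` and `embed` callables and an `example_bank` list of worked examples; the prompts and the dot-product retrieval are illustrative choices, not the paper's.

```python
# Illustrative RM-PoT-style pipeline; `llm`, `embed`, and `example_bank`
# are hypothetical stand-ins, not the paper's components.
import numpy as np

def rm_pot(problem, llm, embed, example_bank, k=5):
    # Stage 1 (RM): reformulate the problem to reduce surface-form bias.
    canonical = llm(f"Restate this math problem in plain, neutral wording:\n{problem}")
    # Stage 2: retrieve the k most semantically aligned few-shot examples.
    q = embed(canonical)
    shots = sorted(example_bank, key=lambda ex: -float(np.dot(q, embed(ex))))[:k]
    # Stage 3 (PoT): ask for executable code rather than free-form reasoning,
    # then run it so the final answer comes from the interpreter, not the LLM.
    prompt = "\n\n".join(shots) + (
        f"\n\nWrite Python that assigns the final answer to a variable `answer`:\n{canonical}"
    )
    scope = {}
    exec(llm(prompt), scope)  # in practice, execute inside a sandbox
    return scope.get("answer")
```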
- IOLBENCH: Benchmarking LLMs on Linguistic Reasoning [8.20398036986024]
We introduce IOLBENCH, a novel benchmark derived from International Linguistics Olympiad (IOL) problems. This dataset encompasses diverse problems testing syntax, morphology, phonology, and semantics. We find that even the most advanced models struggle to handle the intricacies of linguistic complexity.
arXiv Detail & Related papers (2025-01-08T03:15:10Z)
- VC Search: Bridging the Gap Between Well-Defined and Ill-Defined Problems in Mathematical Reasoning [46.25056744404318]
We develop a benchmark called Problems with Missing and Contradictory conditions (PMC), containing over 5,000 validated ill-defined mathematical problems. The proposed VCSEARCH method improves the accuracy of identifying unsolvable problems by at least 12% across different large language models.
arXiv Detail & Related papers (2024-06-07T16:24:12Z)
- Safe Multi-agent Reinforcement Learning with Natural Language Constraints [49.01100552946231]
The role of natural language constraints in Safe Multi-agent Reinforcement Learning (MARL) is crucial, yet often overlooked.
We propose a novel approach named Safe Multi-agent Reinforcement Learning with Natural Language constraints (SMALL).
Our method leverages fine-tuned language models to interpret and process free-form textual constraints, converting them into semantic embeddings.
These embeddings are then integrated into the multi-agent policy learning process, enabling agents to learn policies that minimize constraint violations while optimizing rewards; a toy sketch of this reward-shaping idea follows below.
arXiv Detail & Related papers (2024-05-30T12:57:35Z)
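A toy reward-shaping sketch of the idea, under the crude assumption that the degree of violation can be approximated by cosine similarity between the behaviour's textual description and the constraint embedding; `encode` stands in for the paper's fine-tuned language model, and the penalty form is an illustrative choice, not the paper's architecture.

```python
# Toy reward shaping in the spirit of SMALL: embed the free-form constraint
# once, then penalize behaviour whose description is semantically close to
# violating it (a deliberately crude similarity heuristic).
import numpy as np

def cosine(a, b):
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

def shaped_reward(env_reward, behaviour_text, constraint_embedding, encode, lam=1.0):
    """Environment reward minus a penalty that grows with the estimated
    degree of constraint violation."""
    violation = max(0.0, cosine(encode(behaviour_text), constraint_embedding))
    return env_reward - lam * violation
```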
- Kernel Language Entropy: Fine-grained Uncertainty Quantification for LLMs from Semantic Similarities [79.9629927171974]
Quantifying uncertainty in Large Language Models (LLMs) is crucial for applications where safety and reliability are important.
We propose Kernel Language Entropy (KLE), a novel method for uncertainty estimation in white- and black-box LLMs; a minimal sketch of the entropy computation follows below.
arXiv Detail & Related papers (2024-05-30T12:42:05Z)
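A minimal sketch of the entropy computation, substituting a generic pairwise similarity function for the paper's semantic-graph kernels: normalize the similarity matrix over sampled answers to unit trace, then take its von Neumann entropy.

```python
# Hedged KLE-style sketch: sample several answers, build a unit-trace
# positive semi-definite kernel from pairwise semantic similarities, and
# report its von Neumann entropy. The similarity function is a placeholder.
import numpy as np

def von_neumann_entropy(K):
    """-Tr(K log K), computed from the eigenvalues of a unit-trace PSD kernel."""
    eig = np.linalg.eigvalsh(K)
    eig = eig[eig > 1e-12]
    return float(-np.sum(eig * np.log(eig)))

def kernel_language_entropy(answers, similarity):
    S = np.array([[similarity(a, b) for b in answers] for a in answers])
    K = S / np.trace(S)  # normalize so the eigenvalues sum to 1
    return von_neumann_entropy(K)

# Degenerate toy similarity: 1 if answers match exactly, else 0.
sim = lambda a, b: 1.0 if a == b else 0.0
print(kernel_language_entropy(["Paris", "Paris", "Lyon"], sim))  # ~0.64 nats
```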
- Eliciting Problem Specifications via Large Language Models [4.055489363682198]
Large language models (LLMs) can be utilized to map a problem class into a semi-formal specification.
A cognitive system can then use the problem-space specification to solve multiple instances of problems from the problem class.
arXiv Detail & Related papers (2024-05-20T16:19:02Z)
- Do Language Models Exhibit the Same Cognitive Biases in Problem Solving as Human Learners? [140.9751389452011]
We study the biases of large language models (LLMs) in relation to those known in children when solving arithmetic word problems.
We generate a novel set of word problems for each of these tests, using a neuro-symbolic approach that enables fine-grained control over the problem features.
arXiv Detail & Related papers (2024-01-31T18:48:20Z)
- ChatABL: Abductive Learning via Natural Language Interaction with ChatGPT [72.83383437501577]
Large language models (LLMs) have recently demonstrated significant potential in mathematical reasoning. However, LLMs currently have difficulty bridging perception, language understanding, and reasoning capabilities.
This paper presents a novel method for integrating LLMs into the abductive learning framework.
arXiv Detail & Related papers (2023-04-21T16:23:47Z)
- Perceptual reasoning based solution methodology for linguistic optimization problems [13.548237279353408]
Linguistic optimization problems (LOPs) are of two types: single-objective linguistic optimization problems (SOLOPs) and multi-objective linguistic optimization problems (MOLOPs). The use of linguistic information inevitably calls for computing with words (CWW), and therefore 2-tuple linguistic model based solution methodologies were proposed for LOPs. We found that these methodologies represent the semantics of the linguistic information using a combination of type-1 fuzzy sets and ordinal term sets; a sketch of the 2-tuple representation itself follows below.
arXiv Detail & Related papers (2020-04-30T16:35:01Z)
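For context, the classical 2-tuple linguistic model these methodologies build on represents a numeric aggregation result beta in [0, g] as the nearest ordinal term s_i plus a symbolic offset alpha = beta - i in [-0.5, 0.5). A minimal sketch with illustrative term names:

```python
# Sketch of the classical 2-tuple linguistic representation underlying
# these solution methodologies; the term set below is illustrative.
TERMS = ["very_low", "low", "medium", "high", "very_high"]  # s_0 .. s_4

def to_two_tuple(beta):
    """Delta(beta) = (s_i, alpha) with i = round(beta), alpha = beta - i."""
    i = int(round(beta))
    return TERMS[i], beta - i

def from_two_tuple(term, alpha):
    """Inverse Delta: recover the numeric value beta."""
    return TERMS.index(term) + alpha

print(to_two_tuple(2.7))  # ('high', -0.3): nearest term s_3 with offset -0.3
```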