Related papers: A Modular Dataset to Demonstrate LLM Abstraction Capability

Related papers

Reasoning with Confidence: Efficient Verification of LLM Reasoning Steps via Uncertainty Heads [104.9566359759396]
We propose a lightweight alternative for step-level reasoning verification based on data-driven uncertainty scores. Our findings suggest that the internal states of LLMs encode their uncertainty and can serve as reliable signals for reasoning verification.
arXiv Detail & Related papers (2025-11-09T03:38:29Z)
Framework for Machine Evaluation of Reasoning Completeness in Large Language Models For Classification Tasks [0.0]
This paper introduces RACE (Reasoning Alignment for Completeness of Explanations). We analyze four widely used text classification datasets: WIKI ONTOLOGY, AG NEWS, IMDB, and GOEMOTIONS. We show that correct predictions exhibit higher coverage of supporting features, while incorrect predictions are associated with elevated coverage of contradicting features.
arXiv Detail & Related papers (2025-10-23T20:22:22Z)
WakenLLM: Evaluating Reasoning Potential and Stability in LLMs via Fine-Grained Benchmarking [14.76224690767612]
Large Language Models (LLMs) frequently output the label Unknown in reasoning tasks. We introduce WakenLLM, a framework that quantifies the portion of Unknown output attributable to model incapacity.
arXiv Detail & Related papers (2025-07-22T03:21:48Z)
Do LLMs Dream of Discrete Algorithms? [0.7646713951724011]
Large Language Models (LLMs) have rapidly transformed the landscape of artificial intelligence. Their reliance on probabilistic inference limits their effectiveness in domains requiring strict logical reasoning. This paper proposes a neurosymbolic approach that augments LLMs with logic-based reasoning modules.
arXiv Detail & Related papers (2025-06-29T22:03:01Z)
Reasoning-Aligned Perception Decoupling for Scalable Multi-modal Reasoning [95.44766931218896]
Multi-modal large language models (MLLMs) still lag behind text-based reasoning. We introduce Perception-Reasoning Decoupling, which modularizes the MLLM's reasoning component and makes it easily replaceable. We propose a novel reinforcement learning algorithm called Visual Perception Optimization (VPO) to align the MLLM's perceptual output with the final reasoning task.
arXiv Detail & Related papers (2025-06-05T02:28:07Z)
Misaligning Reasoning with Answers -- A Framework for Assessing LLM CoT Robustness [3.9930400744726273]
We design a novel evaluation framework, MATCHA, to investigate the relationship between answer and reasoning. In domains like education and healthcare, reasoning is key for model trustworthiness. Our results show that LLMs exhibit greater vulnerability to input perturbations for multi-step and commonsense tasks than for logical tasks.
arXiv Detail & Related papers (2025-05-23T02:42:16Z)
Guiding Reasoning in Small Language Models with LLM Assistance [23.3038074903744]
The reasoning limitations of Small Language Models (SLMs) cast doubt on their suitability for tasks demanding deep, multi-step logical deduction. This paper introduces a framework called Small Reasons, Large Hints, which selectively augments SLM reasoning with targeted guidance from large language models. Our experiments on mathematical reasoning datasets demonstrate that targeted external scaffolding significantly improves performance.
arXiv Detail & Related papers (2025-04-14T06:32:45Z)
Enhancing Large Language Models through Neuro-Symbolic Integration and Ontological Reasoning [0.0]
Large Language Models (LLMs) demonstrate impressive capabilities in natural language processing but suffer from inaccuracies and logical inconsistencies known as hallucinations. We propose a neuro-symbolic approach integrating symbolic ontological reasoning and machine learning methods to enhance the consistency and reliability of LLM outputs.
arXiv Detail & Related papers (2025-04-10T10:39:24Z)
Provably Transformers Harness Multi-Concept Word Semantics for Efficient In-Context Learning [53.685764040547625]
Transformer-based large language models (LLMs) have displayed remarkable creative prowess and emergence capabilities. This work provides a fine mathematical analysis to show how transformers leverage the multi-concept semantics of words to enable powerful ICL and excellent out-of-distribution ICL abilities.
arXiv Detail & Related papers (2024-11-04T15:54:32Z)
Logic-Enhanced Language Model Agents for Trustworthy Social Simulations [3.5083201638203154]
This study focuses on decision-making in game-theoretic scenarios as a model of human interaction. We introduce the Logic-Enhanced Language Model Agents (LELMA) framework, a novel approach to enhance the trustworthiness of social simulations.
arXiv Detail & Related papers (2024-08-28T18:25:35Z)
Cognitive LLMs: Towards Integrating Cognitive Architectures and Large Language Models for Manufacturing Decision-making [51.737762570776006]
LLM-ACTR is a novel neuro-symbolic architecture that provides human-aligned and versatile decision-making. Our framework extracts and embeds knowledge of ACT-R's internal decision-making process as latent neural representations. Our experiments on novel Design for Manufacturing tasks show both improved task performance and improved grounded decision-making capability.
arXiv Detail & Related papers (2024-08-17T11:49:53Z)
Misinforming LLMs: vulnerabilities, challenges and opportunities [4.54019093815234]
Large Language Models (LLMs) have made significant advances in natural language processing, but their underlying mechanisms are often misunderstood. This paper argues that current LLM architectures are inherently untrustworthy due to their reliance on correlations of sequential patterns of word embedding vectors. Research into combining generative transformer-based models with fact bases and logic programming languages may lead to the development of trustworthy LLMs.
arXiv Detail & Related papers (2024-08-02T10:35:49Z)
Potential and Limitations of LLMs in Capturing Structured Semantics: A Case Study on SRL [78.80673954827773]
Large Language Models (LLMs) play a crucial role in capturing structured semantics to enhance language understanding, improve interpretability, and reduce bias. We propose using Semantic Role Labeling (SRL) as a fundamental task to explore LLMs' ability to extract structured semantics. We find interesting potential: LLMs can indeed capture semantic structures, although scaling up does not always yield corresponding gains. We are surprised to discover a significant overlap between the errors made by LLMs and those made by untrained humans, accounting for almost 30% of all errors.
arXiv Detail & Related papers (2024-05-10T11:44:05Z)
Characterizing Truthfulness in Large Language Model Generations with Local Intrinsic Dimension [63.330262740414646]
We study how to characterize and predict the truthfulness of texts generated from large language models (LLMs). We suggest investigating internal activations and quantifying an LLM's truthfulness using the local intrinsic dimension (LID) of model activations.
arXiv Detail & Related papers (2024-02-28T04:56:21Z)
DiLA: Enhancing LLM Tool Learning with Differential Logic Layer [11.810200077863172]
We propose a novel differential logic layer-aided language modeling (DiLA) approach, where logical constraints are integrated into the forward and backward passes of a network layer. We evaluate the performance of DiLA on two classic reasoning problems and empirically demonstrate its consistent outperformance against existing prompt-based and solver-aided approaches.
arXiv Detail & Related papers (2024-02-19T07:38:57Z)
A Closer Look at the Self-Verification Abilities of Large Language Models in Logical Reasoning [73.77088902676306]
We take a closer look at the self-verification abilities of large language models (LLMs) in the context of logical reasoning. Our main findings suggest that existing LLMs could struggle to identify fallacious reasoning steps accurately and may fall short of guaranteeing the validity of self-verification methods.
arXiv Detail & Related papers (2023-11-14T07:13:10Z)
Are LLMs Rigorous Logical Reasoner? Empowering Natural Language Proof Generation with Contrastive Stepwise Decoding [10.421832675327712]
We introduce contrastive decoding to stepwise proof generation, making use of negative reasoning paths to strengthen the model's capacity for logical deduction. Experiments on EntailmentBank underscore the success of our method in augmenting the proof planning abilities of language models.
arXiv Detail & Related papers (2023-11-12T05:12:49Z)
Towards LogiGLUE: A Brief Survey and A Benchmark for Analyzing Logical Reasoning Capabilities of Language Models [56.34029644009297]
Large language models (LLMs) have demonstrated the ability to overcome various limitations of formal Knowledge Representation (KR) systems. LLMs excel most in abductive reasoning, followed by deductive reasoning, while they are least effective at inductive reasoning. We study single-task training, multi-task training, and "chain-of-thought" knowledge distillation fine-tuning techniques to assess model performance.
arXiv Detail & Related papers (2023-10-02T01:00:50Z)

This list is automatically generated from the titles and abstracts of the papers in this site.