Why Cannot Large Language Models Ever Make True Correct Reasoning?
- URL: http://arxiv.org/abs/2508.10265v2
- Date: Sat, 16 Aug 2025 23:38:49 GMT
- Title: Why Cannot Large Language Models Ever Make True Correct Reasoning?
- Authors: Jingde Cheng
- Abstract summary: The author considers that the so-called "reasoning ability" of LLMs is just an illusion held by people with vague concepts. This paper intends to explain that, because of the essential limitations of their working principle, LLMs can never have the ability of true correct reasoning.
- Score: 0.0
- License: http://creativecommons.org/licenses/by-nc-sa/4.0/
- Abstract: Recently, with the application progress of AIGC tools based on large language models (LLMs), led by ChatGPT, many AI experts and even more non-professionals have been trumpeting the "reasoning ability" of LLMs. The present author considers that the so-called "reasoning ability" of LLMs is just an illusion held by people with vague concepts. In fact, LLMs can never have true reasoning ability. This paper intends to explain that, because of the essential limitations of their working principle, LLMs can never have the ability of true correct reasoning.
Related papers
- Mixing Expert Knowledge: Bring Human Thoughts Back To the Game of Go [74.28228642327726]
Large language models (LLMs) have demonstrated exceptional performance in reasoning tasks such as mathematics and coding. LoGos is a powerful LLM that not only maintains outstanding general reasoning abilities but also conducts Go gameplay in natural language. LoGos achieves performance comparable to human professional players, substantially surpassing all existing LLMs.
arXiv Detail & Related papers (2026-01-23T05:00:49Z) - Do Large Language Models Know What They Are Capable Of? [1.4254497466846006]
We investigate whether large language models (LLMs) can predict whether they will succeed on a given task. We also investigate whether LLMs can learn from in-context experiences to make better decisions about whether to pursue a task in scenarios where failure is costly.
arXiv Detail & Related papers (2025-12-31T06:14:46Z) - Bayesian Teaching Enables Probabilistic Reasoning in Large Language Models [50.16340812031201]
We show that large language models (LLMs) do not update their beliefs as expected from the Bayesian framework. We teach the LLMs to reason in a Bayesian manner by training them to mimic the predictions of an optimal Bayesian model.
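As a minimal illustration of the Bayesian belief updating that entry refers to (a generic sketch, not code from the paper), an optimal Bayesian model revises the probability of a binary hypothesis via Bayes' rule:

```python
def bayes_update(prior, likelihood_h, likelihood_not_h):
    """Return P(H | E) given P(H), P(E | H), and P(E | not H).

    This is the textbook posterior an optimal Bayesian model would
    produce; the cited paper tests whether LLMs match such updates.
    """
    # Total probability of the evidence under both hypotheses.
    evidence = likelihood_h * prior + likelihood_not_h * (1.0 - prior)
    return likelihood_h * prior / evidence

# Example: prior 0.5, evidence twice as likely if H is true.
posterior = bayes_update(0.5, 0.8, 0.4)  # -> 2/3
```

The benchmark question is whether an LLM's stated confidence, after seeing the same evidence, tracks this posterior rather than under- or over-updating.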
arXiv Detail & Related papers (2025-03-21T20:13:04Z) - LLM+AL: Bridging Large Language Models and Action Languages for Complex Reasoning about Actions [7.575628120822444]
"LLM+AL" is a method that bridges the natural language understanding capabilities of LLMs with the symbolic reasoning strengths of action languages.<n>We compare "LLM+AL" against state-of-the-art LLMs, including ChatGPT-4, Claude 3 Opus, Gemini Ultra 1.0, and o1-preview.<n>Our findings indicate that, although all methods exhibit errors, LLM+AL, with relatively minimal human corrections, consistently leads to correct answers.
arXiv Detail & Related papers (2025-01-01T13:20:01Z) - LLMs' Understanding of Natural Language Revealed [0.0]
Large language models (LLMs) are the result of a massive experiment in bottom-up, data-driven reverse engineering of language at scale.
We will focus on testing LLMs for their language understanding capabilities, their supposed forte.
arXiv Detail & Related papers (2024-07-29T01:21:11Z) - LogicBench: Towards Systematic Evaluation of Logical Reasoning Ability of Large Language Models [52.03659714625452]
Recently developed large language models (LLMs) have been shown to perform remarkably well on a wide range of language understanding tasks.
But, can they really "reason" over the natural language?
This question has been receiving significant research attention and many reasoning skills such as commonsense, numerical, and qualitative have been studied.
arXiv Detail & Related papers (2024-04-23T21:08:49Z) - Caveat Lector: Large Language Models in Legal Practice [0.0]
The fascination with large language models derives from the fact that many users lack the expertise to evaluate the quality of the generated text.
The dangerous combination of fluency and superficial plausibility leads to the temptation to trust the generated text and creates the risk of overreliance.
arXiv Detail & Related papers (2024-03-14T08:19:41Z) - GTBench: Uncovering the Strategic Reasoning Limitations of LLMs via Game-Theoretic Evaluations [87.99872683336395]
Large Language Models (LLMs) are integrated into critical real-world applications.
This paper evaluates LLMs' reasoning abilities in competitive environments.
We first propose GTBench, a language-driven environment composing 10 widely recognized tasks.
arXiv Detail & Related papers (2024-02-19T18:23:36Z) - When Do LLMs Need Retrieval Augmentation? Mitigating LLMs' Overconfidence Helps Retrieval Augmentation [66.01754585188739]
Large Language Models (LLMs) have been found to have difficulty knowing they do not possess certain knowledge.
Retrieval Augmentation (RA) has been extensively studied to mitigate LLMs' hallucinations.
We propose several methods to enhance LLMs' perception of knowledge boundaries and show that they are effective in reducing overconfidence.
arXiv Detail & Related papers (2024-02-18T04:57:19Z) - LogicAsker: Evaluating and Improving the Logical Reasoning Ability of Large Language Models [63.14196038655506]
We introduce LogicAsker, a novel approach for evaluating and enhancing the logical reasoning capabilities of large language models (LLMs).
Our methodology reveals significant gaps in LLMs' learning of logical rules, with identified reasoning failures ranging from 29% to 90% across different models.
We leverage these findings to construct targeted demonstration examples and fine-tune data, notably enhancing logical reasoning in models like GPT-4o by up to 5%.
arXiv Detail & Related papers (2024-01-01T13:53:53Z) - LLatrieval: LLM-Verified Retrieval for Verifiable Generation [67.93134176912477]
Verifiable generation aims to let the large language model (LLM) generate text with supporting documents.
We propose LLatrieval (Large Language Model Verified Retrieval), where the LLM updates the retrieval result until it verifies that the retrieved documents can sufficiently support answering the question.
Experiments show that LLatrieval significantly outperforms extensive baselines and achieves state-of-the-art results.
arXiv Detail & Related papers (2023-11-14T01:38:02Z) - Large Language Models: The Need for Nuance in Current Debates and a Pragmatic Perspective on Understanding [1.3654846342364308]
Large Language Models (LLMs) are unparalleled in their ability to generate grammatically correct, fluent text.
This position paper critically assesses three points recurring in critiques of LLM capacities.
We outline a pragmatic perspective on the issue of 'real' understanding and intentionality in LLMs.
arXiv Detail & Related papers (2023-10-30T15:51:04Z) - Democratizing Reasoning Ability: Tailored Learning from Large Language Model [97.4921006089966]
We propose a tailored learning approach to distill such reasoning ability to smaller LMs.
We exploit the potential of LLM as a reasoning teacher by building an interactive multi-round learning paradigm.
To exploit the reasoning potential of the smaller LM, we propose self-reflection learning to motivate the student to learn from self-made mistakes.
arXiv Detail & Related papers (2023-10-20T07:50:10Z) - Large Language Models Help Humans Verify Truthfulness -- Except When They Are Convincingly Wrong [35.64962031447787]
Large Language Models (LLMs) are increasingly used for accessing information on the web.
Our experiments with 80 crowdworkers compare language models with search engines (information retrieval systems) at facilitating fact-checking.
Users reading LLM explanations are significantly more efficient than those using search engines while achieving similar accuracy.
arXiv Detail & Related papers (2023-10-19T08:09:58Z) - Large Language Models are In-Context Semantic Reasoners rather than Symbolic Reasoners [75.85554779782048]
Large Language Models (LLMs) have excited the natural language and machine learning community over recent years.
Despite numerous successful applications, the underlying mechanism of such in-context capabilities still remains unclear.
In this work, we hypothesize that the learned semantics of language tokens do the most heavy lifting during the reasoning process.
arXiv Detail & Related papers (2023-05-24T07:33:34Z) - Event knowledge in large language models: the gap between the impossible and the unlikely [46.540380831486125]
We show that pre-trained large language models (LLMs) possess substantial event knowledge.
They almost always assign higher likelihood to possible vs. impossible events.
However, they show less consistent preferences for likely vs. unlikely events.
arXiv Detail & Related papers (2022-12-02T23:43:18Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information presented and is not responsible for any consequences of its use.