Humans Perceive Wrong Narratives from AI Reasoning Texts
- URL: http://arxiv.org/abs/2508.16599v2
- Date: Thu, 28 Aug 2025 11:53:23 GMT
- Title: Humans Perceive Wrong Narratives from AI Reasoning Texts
- Authors: Mosh Levy, Zohar Elyoseph, Yoav Goldberg
- Abstract summary: A new generation of AI models generates step-by-step reasoning text before producing an answer. It is unclear whether human understanding of this text matches the model's actual computational process.
- Score: 26.472074065985
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: A new generation of AI models generates step-by-step reasoning text before producing an answer. This text appears to offer a human-readable window into their computation process, and is increasingly relied upon for transparency and interpretability. However, it is unclear whether human understanding of this text matches the model's actual computational process. In this paper, we investigate a necessary condition for correspondence: the ability of humans to identify which steps in a reasoning text causally influence later steps. We evaluated humans on this ability by composing questions based on counterfactual measurements and found a significant discrepancy: participant accuracy was only 29%, barely above chance (25%), and remained low (42%) even when evaluating the majority vote on questions with high agreement. Our results reveal a fundamental gap between how humans interpret reasoning texts and how models use them, challenging their utility as a simple interpretability tool. We argue that reasoning texts should be treated as an artifact to be investigated, not taken at face value, and that understanding the non-human ways these models use language is a critical research direction.
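The counterfactual measurements described in the abstract can be illustrated with a minimal sketch: ablate one reasoning step, regenerate the continuation, and score how much the downstream text changes. Everything below (`generate_continuation`, the string-overlap divergence) is a hypothetical stand-in for illustration, not the authors' actual code or metric.

```python
# Minimal sketch of counterfactual influence probing over reasoning steps.
# `generate_continuation` is a placeholder for a real reasoning-model call.
from difflib import SequenceMatcher

def generate_continuation(prompt: str, steps: list[str]) -> str:
    # Placeholder: a real probe would ask the model to continue the
    # (possibly edited) chain of thought from this prefix.
    return " ".join([prompt] + steps)

def divergence(a: str, b: str) -> float:
    """Crude textual divergence between two continuations (0 = identical)."""
    return 1.0 - SequenceMatcher(None, a, b).ratio()

def causal_influence(prompt: str, steps: list[str], i: int) -> float:
    """How much does step i matter? Regenerate the continuation with and
    without step i and compare the two outcomes."""
    baseline = generate_continuation(prompt, steps)
    ablated = generate_continuation(prompt, steps[:i] + steps[i + 1:])
    return divergence(baseline, ablated)

# A question in the paper's style shows candidate earlier steps and asks
# which one influenced a given later step; the model-side answer is the
# argmax of causal_influence, while chance for humans on four options is 25%.
```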
Related papers
- Computational Turing Test Reveals Systematic Differences Between Human and AI Language [0.0]
Large language models (LLMs) are increasingly used in the social sciences to simulate human behavior. Existing validation efforts rely heavily on human-judgment-based evaluations. This paper introduces a computational Turing test to assess how closely LLMs approximate human language.
arXiv Detail & Related papers (2025-11-06T08:56:37Z)
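One natural reading of a "computational Turing test" is a trained discriminator: if a simple classifier reliably separates human from LLM text, the two distributions differ systematically. The sketch below uses TF-IDF features and logistic regression as an assumed, minimal instantiation; the paper's actual metric may differ.

```python
# Sketch: distinguishability of human vs. LLM text as a proxy Turing test.
# Assumes scikit-learn; the example texts are toy inputs.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

human_texts = ["I guess it depends, honestly.", "We grabbed coffee after."]
model_texts = ["Certainly! Here are three key considerations.",
               "In conclusion, both perspectives have merit."]

texts = human_texts + model_texts
labels = [0] * len(human_texts) + [1] * len(model_texts)

X = TfidfVectorizer(ngram_range=(1, 2)).fit_transform(texts)
clf = LogisticRegression(max_iter=1000)

# Accuracy well above 0.5 means the classifier separates the two sources,
# i.e., the model fails this computational Turing test.
scores = cross_val_score(clf, X, labels, cv=2)
print("distinguishability:", scores.mean())
```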
- Hallucination vs interpretation: rethinking accuracy and precision in AI-assisted data extraction for knowledge synthesis [0.9898534984111934]
We developed an extraction platform using large language models (LLMs) to automate data extraction. We compared AI to human responses across 187 publications and 17 extraction questions from a published scoping review. Findings suggest AI variability depends more on interpretation than hallucination.
arXiv Detail & Related papers (2025-08-13T03:33:30Z)
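Comparing AI and human answers over a 187-publication by 17-question grid reduces to computing per-question agreement. A minimal sketch, assuming answers are stored as nested dicts keyed by publication and question (the data layout is my assumption, not the paper's):

```python
# Sketch: per-question agreement between AI and human extraction answers.
# The dict-of-dicts layout is assumed for illustration.
from collections import defaultdict

def per_question_agreement(ai: dict, human: dict) -> dict:
    """ai[pub][q] and human[pub][q] hold extracted answers; return the
    fraction of publications where they match, per question."""
    hits, totals = defaultdict(int), defaultdict(int)
    for pub, answers in human.items():
        for q, gold in answers.items():
            totals[q] += 1
            if ai.get(pub, {}).get(q) == gold:
                hits[q] += 1
    return {q: hits[q] / totals[q] for q in totals}

# Low agreement on a question may reflect ambiguous wording (an
# interpretation problem) rather than fabricated content (hallucination).
```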
- Scaling up the think-aloud method [63.91056664423141]
We develop methods to automate the transcription and annotation of verbal reports of reasoning using natural language processing tools. In our study, 640 participants thought aloud while playing the Game of 24, a mathematical reasoning task. Our work demonstrates the value of think-aloud data at scale and serves as a proof of concept for the automated analysis of verbal reports.
arXiv Detail & Related papers (2025-05-29T18:26:23Z)
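For context, the Game of 24 asks players to combine four given numbers with arithmetic to reach 24, so annotating think-aloud transcripts involves, among other things, checking candidate expressions. A small self-contained checker (the validation rules are the standard game rules, not code from the paper):

```python
# Sketch: validate a Game of 24 expression extracted from a transcript.
import ast
import math

def uses_exactly(expr: str, numbers: list[int]) -> bool:
    """True if the expression uses each of the given numbers exactly once."""
    literals = [node.value for node in ast.walk(ast.parse(expr, mode="eval"))
                if isinstance(node, ast.Constant)]
    return sorted(literals) == sorted(numbers)

def is_solution(expr: str, numbers: list[int]) -> bool:
    """True if expr evaluates to 24 using each given number exactly once."""
    try:
        tree = ast.parse(expr, mode="eval")
        value = eval(compile(tree, "<expr>", "eval"))  # trusted input only
    except (SyntaxError, ZeroDivisionError):
        return False
    return uses_exactly(expr, numbers) and math.isclose(value, 24)

print(is_solution("(10 - 4) * (13 - 9)", [4, 9, 10, 13]))  # True
```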
- Is Human-Like Text Liked by Humans? Multilingual Human Detection and Preference Against AI [95.81924314159943]
We find that major gaps between human and machine text lie in concreteness, cultural nuances, and diversity. We also find that humans do not always prefer human-written text, particularly when they cannot clearly identify its source.
arXiv Detail & Related papers (2025-02-17T09:56:46Z)
- ExaGPT: Example-Based Machine-Generated Text Detection for Human Interpretability [62.285407189502216]
Incorrect decisions when detecting texts generated by Large Language Models (LLMs) can cause grave mistakes. We introduce ExaGPT, an interpretable detection approach grounded in the human decision-making process. We show that ExaGPT massively outperforms prior powerful detectors by up to +40.9 points of accuracy at a false positive rate of 1%.
arXiv Detail & Related papers (2025-02-17T01:15:07Z)
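Example-based detection of the kind the ExaGPT summary gestures at can be sketched as nearest-neighbor voting: label a span by whether its most similar retrieved spans come from human or machine corpora, and show those neighbors as the explanation. The retrieval setup below (string similarity over toy corpora) is my assumption for illustration, not ExaGPT's actual pipeline.

```python
# Sketch: example-based detection by nearest-neighbor voting over spans.
# Corpora and the similarity measure are illustrative stand-ins.
from difflib import SequenceMatcher

human_spans = ["kind of a long day, not gonna lie",
               "we just winged it and it worked"]
machine_spans = ["it is important to note that",
                 "in summary, the key takeaways are"]

def detect(span: str, k: int = 3) -> str:
    """Label a span by majority vote of its k most similar examples;
    the matched neighbors double as a human-readable explanation."""
    pool = [(s, "human") for s in human_spans] + \
           [(s, "machine") for s in machine_spans]
    scored = sorted(pool,
                    key=lambda p: SequenceMatcher(None, span, p[0]).ratio(),
                    reverse=True)[:k]
    votes = [label for _, label in scored]
    return max(set(votes), key=votes.count)

print(detect("in summary, it is important to note"))  # likely "machine"
```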
- Trying to be human: Linguistic traces of stochastic empathy in language models [0.2638512174804417]
Large language models (LLMs) are crucial drivers behind the increased quality of computer-generated content.
Our work tests how two important factors contribute to the human vs AI race: empathy and an incentive to appear human.
arXiv Detail & Related papers (2024-10-02T15:46:40Z)
- ChatGPT Rates Natural Language Explanation Quality Like Humans: But on Which Scales? [7.307538454513983]
This study explores the alignment between ChatGPT and human assessments across multiple scales.
We sample 300 data instances from three NLE datasets and collect 900 human annotations for both informativeness and clarity scores.
Our results show that ChatGPT aligns better with humans on coarser-grained scales.
arXiv Detail & Related papers (2024-03-26T04:07:08Z)
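The coarse- vs. fine-grained finding in the previous entry suggests a simple check: correlate ChatGPT and human scores at the original granularity, then again after binning to a coarser scale. The correlation choice and the binning rule below are my assumptions for illustration.

```python
# Sketch: does rating agreement improve on coarser scales?
# Scores are toy stand-ins for ChatGPT vs. human NLE quality ratings.
import numpy as np
from scipy.stats import spearmanr

human = np.array([1, 2, 2, 3, 4, 4, 5, 5, 3, 1])
chatgpt = np.array([2, 1, 3, 3, 5, 3, 4, 5, 4, 2])

def coarsen(scores: np.ndarray, cutpoint: int = 3) -> np.ndarray:
    """Collapse a 1-5 scale to binary low/high at the cutpoint."""
    return (scores > cutpoint).astype(int)

fine, _ = spearmanr(human, chatgpt)
coarse, _ = spearmanr(coarsen(human), coarsen(chatgpt))
print(f"fine-grained rho={fine:.2f}, coarse-grained rho={coarse:.2f}")
```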
- Natural Language Decompositions of Implicit Content Enable Better Text Representations [52.992875653864076]
We introduce a method for the analysis of text that takes implicitly communicated content explicitly into account. We use a large language model to produce sets of propositions that are inferentially related to the text that has been observed. Our results suggest that modeling the meanings behind observed language, rather than the literal text alone, is a valuable direction for NLP.
arXiv Detail & Related papers (2023-05-23T23:45:20Z)
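Proposition decomposition of the kind the previous entry describes can be prototyped with a single prompt. The prompt wording and the `complete` stub below are assumptions; the paper's actual prompts and models may differ.

```python
# Sketch: decompose a text into implied propositions via an LLM prompt.
# `complete` is a placeholder for any chat/completions API call.

PROMPT = (
    "List short, standalone propositions that a reader could reasonably "
    "infer from the following text, one per line:\n\n{text}"
)

def complete(prompt: str) -> str:
    # Placeholder: wire this to a real LLM endpoint. The canned reply
    # below just illustrates the expected output shape.
    return ("The speaker supports the policy.\n"
            "The speaker believes prices are too high.")

def decompose(text: str) -> list[str]:
    """Return the inferred propositions, one per output line."""
    reply = complete(PROMPT.format(text=text))
    return [line.strip() for line in reply.splitlines() if line.strip()]

# The proposition set, rather than the raw text, is then embedded or
# compared to build representations of implicitly communicated content.
print(decompose("Finally some common sense. Prices have to come down."))
```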
- On the Possibilities of AI-Generated Text Detection [76.55825911221434]
We argue that as machine-generated text approximates human-like quality, the sample size needed for reliable detection increases.
We test various state-of-the-art text generators, including GPT-2, GPT-3.5-Turbo, Llama, Llama-2-13B-Chat-HF, and Llama-2-70B-Chat-HF, against detectors, including RoBERTa-Large/Base-Detector and GPTZero.
arXiv Detail & Related papers (2023-04-10T17:47:39Z)
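The sample-size claim has a standard hypothesis-testing flavor: distinguishing the machine distribution m from the human distribution h gets harder as they converge. A textbook-style bound, in my framing rather than necessarily the paper's exact statement:

```latex
% Le Cam-style lower bound: with n i.i.d. samples, any detector separating
% machine text m from human text h errs with probability at least
\[
  \Pr[\text{error}] \;\ge\; \tfrac{1}{2}\bigl(1 - \mathrm{TV}(m^{\otimes n}, h^{\otimes n})\bigr),
  \qquad
  \mathrm{TV}(m^{\otimes n}, h^{\otimes n}) \;\le\; n\,\mathrm{TV}(m, h).
\]
% So as m approaches h and TV(m, h) -> 0, reliable detection needs on the
% order of 1/TV(m, h) samples or more, consistent with the entry's claim.
```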
- Real or Fake Text?: Investigating Human Ability to Detect Boundaries Between Human-Written and Machine-Generated Text [23.622347443796183]
We study a more realistic setting where text begins as human-written and transitions to being generated by state-of-the-art neural language models.
We show that, while annotators often struggle at this task, there is substantial variance in annotator skill and that, given proper incentives, annotators can improve over time.
arXiv Detail & Related papers (2022-12-24T06:40:25Z)
- Naturalistic Causal Probing for Morpho-Syntax [76.83735391276547]
We suggest a naturalistic strategy for input-level intervention on real-world data in Spanish.
Using our approach, we isolate morpho-syntactic features from confounders in sentences.
We apply this methodology to analyze causal effects of gender and number on contextualized representations extracted from pre-trained models.
arXiv Detail & Related papers (2022-05-14T11:47:58Z)
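An input-level intervention of this kind can be sketched as minimal pairs: naturally occurring Spanish sentences that differ only in gender or number, embedded with the same pretrained model, with the representation shift attributed to the intervened feature. The model name and the mean-pooling choice below are assumptions for illustration, not the paper's setup.

```python
# Sketch: measure the effect of a gender intervention on contextual
# representations using a minimal pair of Spanish sentences.
import torch
from transformers import AutoModel, AutoTokenizer

tok = AutoTokenizer.from_pretrained("bert-base-multilingual-cased")
model = AutoModel.from_pretrained("bert-base-multilingual-cased")

def embed(sentence: str) -> torch.Tensor:
    """Mean-pooled last-layer representation of a sentence."""
    inputs = tok(sentence, return_tensors="pt")
    with torch.no_grad():
        hidden = model(**inputs).last_hidden_state
    return hidden.mean(dim=1).squeeze(0)

# Minimal pair: same content, intervened grammatical gender.
masc = embed("El actor famoso llegó tarde.")
fem = embed("La actriz famosa llegó tarde.")

# The distance between the pair estimates the effect of gender on the
# representation (averaged over many such pairs in practice).
print(torch.dist(masc, fem).item())
```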
- Human Interpretation of Saliency-based Explanation Over Text [65.29015910991261]
We study saliency-based explanations over textual data.
We find that people often misinterpret the explanations.
We propose a method to adjust saliencies based on model estimates of over- and under-perception.
arXiv Detail & Related papers (2022-01-27T15:20:32Z)
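The adjustment idea in the last entry can be sketched as calibration: if a model of readers predicts the perceived importance of a displayed saliency, invert that model so that perception matches the intended value. The perception model below (a monotone curve fit to toy data) is a stand-in assumption, not the authors' estimator.

```python
# Sketch: correct displayed saliencies for systematic over-/under-perception.
# The calibration data and curve are assumed for illustration.
import numpy as np

# Toy calibration data: displayed saliency -> perceived importance,
# exhibiting over-perception of small values as one possible bias.
displayed = np.array([0.1, 0.3, 0.5, 0.7, 0.9])
perceived = np.array([0.25, 0.45, 0.55, 0.72, 0.88])

def adjust(target: float) -> float:
    """Find the displayed value whose predicted perception hits `target`,
    by inverting the calibration curve with linear interpolation."""
    return float(np.interp(target, perceived, displayed))

# To make a reader perceive a saliency of 0.5, display roughly this value:
print(adjust(0.5))
```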