Using AI to replicate human experimental results: a motion study
- URL: http://arxiv.org/abs/2507.10342v1
- Date: Mon, 14 Jul 2025 14:47:01 GMT
- Title: Using AI to replicate human experimental results: a motion study
- Authors: Rosa Illan Castillo, Javier Valenzuela
- Abstract summary: This paper explores the potential of large language models (LLMs) as reliable analytical tools in linguistic research. It focuses on the emergence of affective meanings in temporal expressions involving manner-of-motion verbs.
- Score: 0.11838866556981258
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: This paper explores the potential of large language models (LLMs) as reliable analytical tools in linguistic research, focusing on the emergence of affective meanings in temporal expressions involving manner-of-motion verbs. While LLMs like GPT-4 have shown promise across a range of tasks, their ability to replicate nuanced human judgements remains under scrutiny. We conducted four psycholinguistic studies (on emergent meanings, valence shifts, verb choice in emotional contexts, and sentence-emoji associations) first with human participants and then replicated the same tasks using an LLM. Results across all studies show a striking convergence between human and AI responses, with statistical analyses (e.g., Spearman's rho = .73-.96) indicating strong correlations in both rating patterns and categorical choices. While minor divergences were observed in some cases, these did not alter the overall interpretative outcomes. These findings offer compelling evidence that LLMs can augment traditional human-based experimentation, enabling broader-scale studies without compromising interpretative validity. This convergence not only strengthens the empirical foundation of prior human-based findings but also opens possibilities for hypothesis generation and data expansion through AI. Ultimately, our study supports the use of LLMs as credible and informative collaborators in linguistic inquiry.
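The reported convergence is quantified as rank correlations over rating tasks and as agreement in categorical choices. As a rough illustration of how such statistics are computed, here is a minimal Python sketch; all data below are hypothetical placeholders, not the study's responses, and the authors' actual analysis pipeline is not reproduced here.

```python
# Minimal sketch of a human-vs-LLM agreement analysis of the kind the abstract
# describes. All data below are hypothetical placeholders.
from scipy.stats import spearmanr

# Hypothetical 1-7 valence ratings given by humans and by an LLM to the same items.
human_ratings = [6.2, 3.1, 5.8, 2.4, 4.9, 6.7, 1.8, 5.2]
llm_ratings = [5.9, 2.8, 6.1, 2.9, 4.4, 6.5, 2.2, 5.6]

# Spearman's rho: rank correlation between the two sets of ratings.
rho, p_value = spearmanr(human_ratings, llm_ratings)
print(f"Spearman's rho = {rho:.2f} (p = {p_value:.4f})")

# Hypothetical categorical choices (e.g., which emotion label best matches a sentence).
human_choices = ["joy", "sadness", "anger", "joy", "fear"]
llm_choices = ["joy", "sadness", "joy", "joy", "fear"]

# Simple proportion of matching choices between humans and the LLM.
agreement = sum(h == m for h, m in zip(human_choices, llm_choices)) / len(human_choices)
print(f"Categorical agreement = {agreement:.0%}")
```

Whether raw percent agreement or a chance-corrected statistic (e.g., Cohen's kappa) is more appropriate for the categorical tasks depends on the study design.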
Related papers
- Word Overuse and Alignment in Large Language Models: The Influence of Learning from Human Feedback [0.0]
Large Language Models (LLMs) are known to overuse certain terms like "delve" and "intricate". This study investigates the contribution of Learning from Human Feedback (LHF) to this overuse. We more conclusively link LHF to lexical overuse by experimentally emulating the LHF procedure.
arXiv Detail & Related papers (2025-08-03T21:45:37Z) - How LLMs Comprehend Temporal Meaning in Narratives: A Case Study in Cognitive Evaluation of LLMs [13.822169295436177]
We investigate how large language models (LLMs) process the temporal meaning of linguistic aspect in narratives that were previously used in human studies. Our findings show that LLMs over-rely on prototypicality, produce inconsistent aspectual judgments, and struggle with causal reasoning derived from aspect. These results suggest that LLMs process aspect fundamentally differently from humans and lack robust narrative understanding.
arXiv Detail & Related papers (2025-07-18T18:28:35Z) - Large Language Models as Neurolinguistic Subjects: Discrepancy between Performance and Competence [49.60849499134362]
This study investigates the linguistic understanding of Large Language Models (LLMs) regarding signifier (form) and signified (meaning). We introduce a neurolinguistic approach, utilizing a novel method that combines minimal pairs and diagnostic probing to analyze activation patterns across model layers. We found that: (1) psycholinguistic and neurolinguistic methods reveal that language performance and competence are distinct; (2) direct probability measurement may not accurately assess linguistic competence; and (3) instruction tuning does little to change competence but improves performance.
arXiv Detail & Related papers (2024-11-12T04:16:44Z) - Judgment of Learning: A Human Ability Beyond Generative Artificial Intelligence [0.0]
Large language models (LLMs) increasingly mimic human cognition in various language-based tasks. We introduce a cross-agent prediction model to assess whether ChatGPT-based LLMs align with human judgments of learning (JOL). Our results revealed that while human JOL reliably predicted actual memory performance, none of the tested LLMs demonstrated comparable predictive accuracy.
arXiv Detail & Related papers (2024-10-17T09:42:30Z) - The LLM Effect: Are Humans Truly Using LLMs, or Are They Being Influenced By Them Instead? [60.01746782465275]
Large Language Models (LLMs) have shown capabilities close to human performance in various analytical tasks.
This paper investigates the efficiency and accuracy of LLMs in specialized tasks through a structured user study focusing on Human-LLM partnership.
arXiv Detail & Related papers (2024-10-07T02:30:18Z) - Can Language Models Recognize Convincing Arguments? [12.458437450959416]
Large language models (LLMs) have raised concerns about their potential to create and propagate convincing narratives.
We study their performance in detecting convincing arguments to gain insights into their persuasive capabilities.
arXiv Detail & Related papers (2024-03-31T17:38:33Z) - CausalGym: Benchmarking causal interpretability methods on linguistic tasks [52.61917615039112]
We use CausalGym to benchmark the ability of interpretability methods to causally affect model behaviour.
We study the Pythia models (14M–6.9B) and assess the causal efficacy of a wide range of interpretability methods.
We find that DAS outperforms the other methods, and so we use it to study the learning trajectory of two difficult linguistic phenomena.
arXiv Detail & Related papers (2024-02-19T21:35:56Z) - Six Fallacies in Substituting Large Language Models for Human Participants [0.0]
Can AI systems like large language models (LLMs) replace human participants in behavioral and psychological research? Here I critically evaluate the "replacement" perspective and identify six interpretive fallacies that undermine its validity. Each fallacy represents a potential misunderstanding about what LLMs are and what they can tell us about human cognition.
arXiv Detail & Related papers (2024-02-06T23:28:23Z) - Zero-shot Causal Graph Extrapolation from Text via LLMs [50.596179963913045]
We evaluate the ability of large language models (LLMs) to infer causal relations from natural language.
LLMs show competitive performance in a benchmark of pairwise relations without needing (explicit) training samples.
We extend our approach to extrapolating causal graphs through iterated pairwise queries.
arXiv Detail & Related papers (2023-12-22T13:14:38Z) - The Adoption and Efficacy of Large Language Models: Evidence From Consumer Complaints in the Financial Industry [2.300664273021602]
This research explores the effect of Large Language Models (LLMs) on consumer complaints submitted to the Consumer Financial Protection Bureau from 2015 to 2024. We find that LLM usage is associated with an increased likelihood of obtaining relief from financial firms.
arXiv Detail & Related papers (2023-11-28T04:07:34Z) - From Heuristic to Analytic: Cognitively Motivated Strategies for Coherent Physical Commonsense Reasoning [66.98861219674039]
Heuristic-Analytic Reasoning (HAR) strategies drastically improve the coherence of rationalizations for model decisions.
Our findings suggest that human-like reasoning strategies can effectively improve the coherence and reliability of pre-trained language model (PLM) reasoning.
arXiv Detail & Related papers (2023-10-24T19:46:04Z) - Framework-Based Qualitative Analysis of Free Responses of Large Language Models: Algorithmic Fidelity [1.7947441434255664]
Large-scale generative Language Models (LLMs) can simulate free responses to interview questions like those traditionally analyzed using qualitative research methods.
Here we consider whether artificial "silicon participants" generated by LLMs may be productively studied using qualitative methods.
arXiv Detail & Related papers (2023-09-06T15:00:44Z) - Revisiting the Reliability of Psychological Scales on Large Language Models [62.57981196992073]
This study aims to determine the reliability of applying personality assessments to Large Language Models.
Analysis of 2,500 settings per model, including GPT-3.5, GPT-4, Gemini-Pro, and LLaMA-3.1, reveals that various LLMs show consistency in responses to the Big Five Inventory.
arXiv Detail & Related papers (2023-05-31T15:03:28Z) - Syntax and Semantics Meet in the "Middle": Probing the Syntax-Semantics Interface of LMs Through Agentivity [68.8204255655161]
We present the semantic notion of agentivity as a case study for probing the syntax-semantics interface.
The results suggest LMs may serve as useful tools for linguistic annotation, theory testing, and discovery.
arXiv Detail & Related papers (2023-05-29T16:24:01Z) - A Latent-Variable Model for Intrinsic Probing [93.62808331764072]
We propose a novel latent-variable formulation for constructing intrinsic probes. We find empirical evidence that pre-trained representations develop a cross-lingually entangled notion of morphosyntax.
arXiv Detail & Related papers (2022-01-20T15:01:12Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences arising from its use.