Leveraging Human Production-Interpretation Asymmetries to Test LLM Cognitive Plausibility
- URL: http://arxiv.org/abs/2503.17579v1
- Date: Fri, 21 Mar 2025 23:25:42 GMT
- Title: Leveraging Human Production-Interpretation Asymmetries to Test LLM Cognitive Plausibility
- Authors: Suet-Ying Lam, Qingcheng Zeng, Jingyi Wu, Rob Voigt
- Abstract summary: We examine whether large language models process language similarly to humans. We find that some LLMs do quantitatively and qualitatively reflect human-like asymmetries between production and interpretation.
- Score: 7.183662547358301
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Whether large language models (LLMs) process language similarly to humans has been the subject of much theoretical and practical debate. We examine this question through the lens of the production-interpretation distinction found in human sentence processing and evaluate the extent to which instruction-tuned LLMs replicate this distinction. Using an empirically documented asymmetry between production and interpretation in humans for implicit causality verbs as a testbed, we find that some LLMs do quantitatively and qualitatively reflect human-like asymmetries between production and interpretation. We demonstrate that whether this behavior holds depends upon both model size, with larger models more likely to reflect human-like patterns, and the choice of meta-linguistic prompts used to elicit the behavior.
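To make the testbed concrete, the minimal sketch below shows one way a production vs. interpretation contrast for an implicit causality verb could be prompted. The verb ("admired"), the names, the pronoun, and the prompt wording are illustrative assumptions, not the authors' actual stimuli or prompts.

```python
# A minimal sketch of the production vs. interpretation contrast for an
# implicit causality verb. All lexical material and prompt wording here
# are illustrative assumptions, not the paper's exact design.

def production_prompt(np1: str, np2: str, verb: str) -> str:
    # Production: the model continues the fragment, and the analyst checks
    # which referent the pronoun it produces picks out.
    return f"Complete the sentence: {np1} {verb} {np2} because"

def interpretation_prompt(np1: str, np2: str, verb: str, pronoun: str) -> str:
    # Interpretation: the model sees a full sentence and answers a
    # meta-linguistic question about who the pronoun refers to.
    return (
        f"{np1} {verb} {np2} because {pronoun} was impressive. "
        f"Question: who does '{pronoun}' refer to, {np1} or {np2}?"
    )

if __name__ == "__main__":
    print(production_prompt("Mary", "John", "admired"))
    print(interpretation_prompt("Mary", "John", "admired", "he"))
```

In this paradigm, the same verb appears in both conditions, so any difference in the model's referent preferences across the two prompt types can be compared against the human production-interpretation asymmetry.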
Related papers
- Turing Representational Similarity Analysis (RSA): A Flexible Method for Measuring Alignment Between Human and Artificial Intelligence [0.62914438169038]
We developed Turing Representational Similarity Analysis (RSA), a method that uses pairwise similarity ratings to quantify alignment between AIs and humans.
We tested this approach on semantic alignment across text and image modalities, measuring how different Large Language and Vision Language Model (LLM and VLM) similarity judgments aligned with human responses at both group and individual levels.
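As a rough illustration of the RSA idea described above, the sketch below correlates the upper triangles of a human and a model pairwise-similarity matrix. The synthetic matrices and the choice of Spearman correlation are assumptions about one common way such alignment is scored, not necessarily this paper's exact procedure.

```python
# A rough sketch of an RSA-style comparison: correlate the upper triangles
# of a human and a model pairwise-similarity matrix over the same items.
# The matrices below are synthetic stand-ins for collected ratings.
import numpy as np
from scipy.stats import spearmanr

def rsa_alignment(human_sim: np.ndarray, model_sim: np.ndarray) -> float:
    # Take the strictly upper-triangular entries (each item pair once)
    # and return their Spearman rank correlation.
    iu = np.triu_indices_from(human_sim, k=1)
    rho, _ = spearmanr(human_sim[iu], model_sim[iu])
    return rho

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    human = rng.random((5, 5)); human = (human + human.T) / 2
    model = human + 0.1 * rng.standard_normal((5, 5))
    model = (model + model.T) / 2
    print(f"RSA alignment (Spearman rho): {rsa_alignment(human, model):.3f}")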
arXiv Detail & Related papers (2024-11-30T20:24:52Z) - Uncovering Factor Level Preferences to Improve Human-Model Alignment [58.50191593880829]
We introduce PROFILE, a framework that uncovers and quantifies the influence of specific factors driving preferences.
PROFILE's factor-level analysis explains the 'why' behind human-model alignment and misalignment.
We demonstrate how leveraging factor-level insights, including addressing misaligned factors, can improve alignment with human preferences.
arXiv Detail & Related papers (2024-10-09T15:02:34Z) - HLB: Benchmarking LLMs' Humanlikeness in Language Use [2.438748974410787]
We present a comprehensive humanlikeness benchmark (HLB) evaluating 20 large language models (LLMs).
We collected responses from over 2,000 human participants and compared them to outputs from the LLMs in these experiments.
Our results reveal fine-grained differences in how well LLMs replicate human responses across various linguistic levels.
arXiv Detail & Related papers (2024-09-24T09:02:28Z) - LLMs as Models for Analogical Reasoning [14.412456982731467]
Analogical reasoning is fundamental to human cognition and learning. Recent studies have shown that large language models can sometimes match humans in analogical reasoning tasks.
arXiv Detail & Related papers (2024-06-19T20:07:37Z) - A Survey on Human Preference Learning for Large Language Models [81.41868485811625]
The recent surge of versatile large language models (LLMs) largely depends on aligning increasingly capable foundation models with human intentions by preference learning.
This survey covers the sources and formats of preference feedback, the modeling and usage of preference signals, as well as the evaluation of the aligned LLMs.
arXiv Detail & Related papers (2024-06-17T03:52:51Z) - Is Self-knowledge and Action Consistent or Not: Investigating Large Language Model's Personality [11.660251022962141]
We investigate the validity of conventional personality questionnaires in capturing the human-like personality traits of Large Language Models (LLMs). Our objective is to assess the congruence between the personality traits LLMs claim to possess and their demonstrated tendencies in real-world scenarios.
arXiv Detail & Related papers (2024-02-22T16:32:08Z) - CausalGym: Benchmarking causal interpretability methods on linguistic tasks [52.61917615039112]
We use CausalGym to benchmark the ability of interpretability methods to causally affect model behaviour.
We study the pythia models (14M--6.9B) and assess the causal efficacy of a wide range of interpretability methods.
We find that DAS outperforms the other methods, and so we use it to study the learning trajectory of two difficult linguistic phenomena.
arXiv Detail & Related papers (2024-02-19T21:35:56Z) - A Theory of LLM Sampling: Part Descriptive and Part Prescriptive [53.08398658452411]
Large Language Models (LLMs) are increasingly utilized in autonomous decision-making.
We show that LLM sampling behavior resembles that of human decision-making.
We show that the deviation of samples from the statistical norm towards a prescriptive component appears consistently in concepts across diverse real-world domains.
arXiv Detail & Related papers (2024-02-16T18:28:43Z) - CLOMO: Counterfactual Logical Modification with Large Language Models [109.60793869938534]
We introduce a novel task, Counterfactual Logical Modification (CLOMO), and a high-quality human-annotated benchmark.
In this task, LLMs must adeptly alter a given argumentative text to uphold a predetermined logical relationship.
We propose an innovative evaluation metric, the Self-Evaluation Score (SES), to directly evaluate the natural language output of LLMs.
arXiv Detail & Related papers (2023-11-29T08:29:54Z) - Divergences between Language Models and Human Brains [59.100552839650774]
We systematically explore the divergences between human and machine language processing. We identify two domains that LMs do not capture well: social/emotional intelligence and physical commonsense. Our results show that fine-tuning LMs on these domains can improve their alignment with human brain responses.
arXiv Detail & Related papers (2023-11-15T19:02:40Z) - Do LLMs exhibit human-like response biases? A case study in survey design [66.1850490474361]
We investigate the extent to which large language models (LLMs) reflect human response biases, if at all.
We design a dataset and framework to evaluate whether LLMs exhibit human-like response biases in survey questionnaires.
Our comprehensive evaluation of nine models shows that popular open and commercial LLMs generally fail to reflect human-like behavior.
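One hedged way to operationalize such a check is sketched below: compare a model's answer distribution on an original question and on a bias-inducing rewording of it, and see whether the distribution shifts in the direction documented for humans. The acquiescence-style example and the helper names are illustrative assumptions, not the paper's actual framework.

```python
# A sketch of one possible response-bias check: ask the same survey question
# in an original and a perturbed form, sample answers, and measure whether
# the answer distribution shifts towards the humanly documented bias.
# The data and the 'agree' target are illustrative stand-ins.
from collections import Counter

def answer_distribution(answers: list[str]) -> dict[str, float]:
    # Normalize raw answers into a relative-frequency distribution.
    counts = Counter(a.strip().lower() for a in answers)
    total = sum(counts.values())
    return {k: v / total for k, v in counts.items()}

def bias_shift(original: list[str], perturbed: list[str], target: str = "agree") -> float:
    # Positive value = the perturbation pushed answers towards `target`,
    # i.e. in the same direction as the human bias.
    p_orig = answer_distribution(original).get(target, 0.0)
    p_pert = answer_distribution(perturbed).get(target, 0.0)
    return p_pert - p_orig

if __name__ == "__main__":
    # Stand-in samples; in practice these would be repeated LLM completions.
    orig = ["Agree", "Disagree", "Disagree", "Agree"]
    pert = ["Agree", "Agree", "Agree", "Disagree"]
    print(f"Shift towards 'agree': {bias_shift(orig, pert):+.2f}")
```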
arXiv Detail & Related papers (2023-11-07T15:40:43Z) - Tokenization Preference for Human and Machine Learning Model: An Annotation Study [6.399914034380356]
This study examines the relations between the tokenization preferred by humans and that preferred by machine-learning (ML) models.
We analyze the relations among the answer performance of humans and ML models, the appropriateness of a tokenization for humans, and human response time to questions.
arXiv Detail & Related papers (2023-04-21T08:49:06Z)