Related papers: Running cognitive evaluations on large language models: The do's and the don'ts

Running cognitive evaluations on large language models: The do's and the don'ts

URL: http://arxiv.org/abs/2312.01276v1
Date: Sun, 3 Dec 2023 04:28:19 GMT
Title: Running cognitive evaluations on large language models: The do's and the don'ts
Authors: Anna A. Ivanova
Abstract summary: I describe methodological considerations for studies that aim to evaluate the cognitive capacities of large language models. I list 10 do's and don'ts that should help design high-quality cognitive evaluations for AI systems.
Score: 3.8073142980733
License: http://creativecommons.org/licenses/by/4.0/
Abstract: In this paper, I describe methodological considerations for studies that aim to evaluate the cognitive capacities of large language models (LLMs) using language-based behavioral assessments. Drawing on three case studies from the literature (a commonsense knowledge benchmark, a theory of mind evaluation, and a test of syntactic agreement), I describe common pitfalls that might arise when applying a cognitive test to an LLM. I then list 10 do's and don'ts that should help design high-quality cognitive evaluations for AI systems. I conclude by discussing four areas where the do's and don'ts are currently under active discussion -- prompt sensitivity, cultural and linguistic diversity, using LLMs as research assistants, and running evaluations on open vs. closed LLMs. Overall, the goal of the paper is to contribute to the broader discussion of best practices in the rapidly growing field of AI Psychology.

Related papers

The Incomplete Bridge: How AI Research (Mis)Engages with Psychology [30.36064725942852]
Social sciences have accumulated a rich body of theories and methodologies for investigating the human mind and behaviors.<n> Focusing on psychology as a prominent case, this study explores the interdisciplinary synergy between AI and the field.<n>We identify key patterns of interdisciplinary integration, locate the psychology domains most frequently referenced, and highlight areas that remain underexplored.
arXiv Detail & Related papers (2025-07-30T17:03:59Z)
A Computational Framework to Identify Self-Aspects in Text [9.187473897664105]
The Self is a multifaceted construct and it is reflected in language.<n>Many of the aspects of the Self align with psychological and other well-researched phenomena.<n>This proposal introduces a plan to develop a computational framework to identify Self-aspects in text.
arXiv Detail & Related papers (2025-07-17T13:31:04Z)
Using the Tools of Cognitive Science to Understand Large Language Models at Different Levels of Analysis [46.08309259203833]
We argue that methods developed in cognitive science can be useful for understanding large language models. We propose a framework for applying these methods based on Marr's three levels of analysis.
arXiv Detail & Related papers (2025-03-17T17:33:54Z)
Enhancing Human-Like Responses in Large Language Models [0.0]
We focus on techniques that enhance natural language understanding, conversational coherence, and emotional intelligence in AI systems. The study evaluates various approaches, including fine-tuning with diverse datasets, incorporating psychological principles, and designing models that better mimic human reasoning patterns.
arXiv Detail & Related papers (2025-01-09T07:44:06Z)
How Performance Pressure Influences AI-Assisted Decision Making [57.53469908423318]
We show how pressure and explainable AI (XAI) techniques interact with AI advice-taking behavior. Our results show complex interaction effects, with different combinations of pressure and XAI techniques either improving or worsening AI advice taking behavior.
arXiv Detail & Related papers (2024-10-21T22:39:52Z)
Introducing ELLIPS: An Ethics-Centered Approach to Research on LLM-Based Inference of Psychiatric Conditions [0.6174527525452624]
This paper charts the ethical landscape of research on language-based inference of psychopathology. We identify seven core ethical principles that should guide model development and deployment. We translate these principles into questions that can guide researchers' choices.
arXiv Detail & Related papers (2024-09-06T12:27:38Z)
Comparing the Efficacy of GPT-4 and Chat-GPT in Mental Health Care: A Blind Assessment of Large Language Models for Psychological Support [0.0]
Two large language models, GPT-4 and Chat-GPT, were tested in responding to a set of 18 psychological prompts. GPT-4 achieved an average rating of 8.29 out of 10, while Chat-GPT received an average rating of 6.52.
arXiv Detail & Related papers (2024-05-15T12:44:54Z)
From Heuristic to Analytic: Cognitively Motivated Strategies for Coherent Physical Commonsense Reasoning [66.98861219674039]
Heuristic-Analytic Reasoning (HAR) strategies drastically improve the coherence of rationalizations for model decisions. Our findings suggest that human-like reasoning strategies can effectively improve the coherence and reliability of PLM reasoning.
arXiv Detail & Related papers (2023-10-24T19:46:04Z)
Analyzing Character and Consciousness in AI-Generated Social Content: A Case Study of Chirper, the AI Social Network [0.0]
The study embarks on a comprehensive exploration of AI behavior, analyzing the effects of diverse settings on Chirper's responses. Through a series of cognitive tests, the study gauges the self-awareness and pattern recognition prowess of Chirpers. An intriguing aspect of the research is the exploration of the potential influence of a Chirper's handle or personality type on its performance.
arXiv Detail & Related papers (2023-08-30T15:40:18Z)
Machine Psychology [54.287802134327485]
We argue that a fruitful direction for research is engaging large language models in behavioral experiments inspired by psychology. We highlight theoretical perspectives, experimental paradigms, and computational analysis techniques that this approach brings to the table. It paves the way for a "machine psychology" for generative artificial intelligence (AI) that goes beyond performance benchmarks.
arXiv Detail & Related papers (2023-03-24T13:24:41Z)
Language Cognition and Language Computation -- Human and Machine Language Understanding [51.56546543716759]
Language understanding is a key scientific issue in the fields of cognitive and computer science. Can a combination of the disciplines offer new insights for building intelligent language models?
arXiv Detail & Related papers (2023-01-12T02:37:00Z)
The Role of AI in Drug Discovery: Challenges, Opportunities, and Strategies [97.5153823429076]
The benefits, challenges and drawbacks of AI in this field are reviewed. The use of data augmentation, explainable AI, and the integration of AI with traditional experimental methods are also discussed.
arXiv Detail & Related papers (2022-12-08T23:23:39Z)
Towards Human-centered Explainable AI: A Survey of User Studies for Model Explanations [18.971689499890363]
We identify and analyze 97core papers with human-based XAI evaluations over the past five years. Our research shows that XAI is spreading more rapidly in certain application domains, such as recommender systems. We propose practical guidelines on designing and conducting user studies for XAI researchers and practitioners.
arXiv Detail & Related papers (2022-10-20T20:53:00Z)
Model-based analysis of brain activity reveals the hierarchy of language in 305 subjects [82.81964713263483]
A popular approach to decompose the neural bases of language consists in correlating, across individuals, the brain responses to different stimuli. Here, we show that a model-based approach can reach equivalent results within subjects exposed to natural stimuli.
arXiv Detail & Related papers (2021-10-12T15:30:21Z)
Human Evaluation of Interpretability: The Case of AI-Generated Music Knowledge [19.508678969335882]
We focus on evaluating AI-discovered knowledge/rules in the arts and humanities. We present an experimental procedure to collect and assess human-generated verbal interpretations of AI-generated music theory/rules rendered as sophisticated symbolic/numeric objects.
arXiv Detail & Related papers (2020-04-15T06:03:34Z)
A general framework for scientifically inspired explanations in AI [76.48625630211943]
We instantiate the concept of structure of scientific explanation as the theoretical underpinning for a general framework in which explanations for AI systems can be implemented. This framework aims to provide the tools to build a "mental-model" of any AI system so that the interaction with the user can provide information on demand and be closer to the nature of human-made explanations.
arXiv Detail & Related papers (2020-03-02T10:32:21Z)

This list is automatically generated from the titles and abstracts of the papers in this site.