Can Transformer Language Models Predict Psychometric Properties?
- URL: http://arxiv.org/abs/2106.06849v1
- Date: Sat, 12 Jun 2021 20:05:33 GMT
- Title: Can Transformer Language Models Predict Psychometric Properties?
- Authors: Antonio Laverghetta Jr., Animesh Nighojkar, Jamshidbek Mirzakhalov and John Licato
- Abstract summary: Transformer-based language models (LMs) continue to advance state-of-the-art performance on NLP benchmark tasks.
Can LMs be of use in predicting what the psychometric properties of test items will be when those items are given to human participants?
We gather responses from numerous human participants and LMs on a broad diagnostic test of linguistic competencies.
- Score: 0.0
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Transformer-based language models (LMs) continue to advance state-of-the-art
performance on NLP benchmark tasks, including tasks designed to mimic
human-inspired "commonsense" competencies. To better understand the degree to
which LMs can be said to have certain linguistic reasoning skills, researchers
are beginning to adapt the tools and concepts of the field of psychometrics.
But to what extent can the benefits flow in the other direction? I.e., can LMs
be of use in predicting what the psychometric properties of test items will be
when those items are given to human participants? We gather responses from
numerous human participants and LMs (transformer and non-transformer-based) on
a broad diagnostic test of linguistic competencies. We then use the responses
to calculate standard psychometric properties of the items in the diagnostic
test, using the human responses and the LM responses separately. We then
determine how well these two sets of predictions match. We find cases in which
transformer-based LMs predict psychometric properties consistently well in
certain categories but consistently poorly in others, thus providing new
insights into fundamental similarities and differences between human and LM
reasoning.
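As a rough illustration of the comparison the abstract describes, here is a minimal Python sketch, assuming binary (correct/incorrect) response matrices and two standard classical-test-theory item properties: difficulty (proportion correct) and discrimination (corrected item-total point-biserial correlation). The paper may compute additional or different properties; the matrices below are synthetic placeholders.

```python
import numpy as np
from scipy.stats import pearsonr

def item_statistics(responses: np.ndarray):
    """Classical test theory item statistics.

    responses: (n_respondents, n_items) binary matrix,
    1 = item answered correctly, 0 = incorrect.
    Returns per-item difficulty (proportion correct) and
    discrimination (corrected item-total point-biserial correlation).
    """
    difficulty = responses.mean(axis=0)
    total = responses.sum(axis=1)
    discrimination = np.array([
        # Correlate each item with the rest-score (total minus that item).
        pearsonr(responses[:, i], total - responses[:, i])[0]
        for i in range(responses.shape[1])
    ])
    return difficulty, discrimination

# Hypothetical response matrices: rows are human participants
# (or LM variants), columns are diagnostic test items.
rng = np.random.default_rng(0)
human = rng.integers(0, 2, size=(100, 40))
lm = rng.integers(0, 2, size=(20, 40))

h_diff, h_disc = item_statistics(human)
m_diff, m_disc = item_statistics(lm)

# How well do LM-derived item properties track human-derived ones?
print("difficulty agreement r =", pearsonr(h_diff, m_diff)[0])
print("discrimination agreement r =", pearsonr(h_disc, m_disc)[0])
```

The two final correlations are one simple way to quantify how closely LM-derived item properties match the human-derived ones, which is the kind of agreement the abstract describes measuring per category.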
Related papers
- LMLPA: Language Model Linguistic Personality Assessment [11.599282127259736]
Large Language Models (LLMs) are increasingly used in everyday life and research.
Measuring the personality of a given LLM is currently a challenge.
This paper introduces the Language Model Linguistic Personality Assessment (LMLPA), a system designed to evaluate the linguistic personalities of LLMs.
arXiv Detail & Related papers (2024-10-23T07:48:51Z)
- Psychometric Alignment: Capturing Human Knowledge Distributions via Language Models [41.324679754114165]
Language models (LMs) are increasingly used to simulate human-like responses in scenarios where accurately mimicking a population's behavior can guide decision-making.
We introduce "psychometric alignment," a metric that measures the extent to which LMs reflect human knowledge distribution.
We find significant misalignment between LMs and human populations, though using persona-based prompts can improve alignment.
arXiv Detail & Related papers (2024-07-22T14:02:59Z)
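The exact alignment metric is defined in that paper; as a hedged illustration of the general idea of comparing knowledge distributions, the sketch below compares per-item answer-option distributions between a human population and persona-prompted LM samples using total-variation distance. All names and data here are hypothetical.

```python
import numpy as np

def choice_distribution(answers: np.ndarray, n_options: int) -> np.ndarray:
    """Per-item distribution over answer options.

    answers: (n_respondents, n_items) matrix of chosen option indices.
    Returns an (n_items, n_options) matrix of option frequencies.
    """
    n_items = answers.shape[1]
    dist = np.zeros((n_items, n_options))
    for i in range(n_items):
        counts = np.bincount(answers[:, i], minlength=n_options)
        dist[i] = counts / counts.sum()
    return dist

rng = np.random.default_rng(1)
n_options = 4
humans = rng.integers(0, n_options, size=(500, 30))   # hypothetical survey data
lm_runs = rng.integers(0, n_options, size=(50, 30))   # persona-prompted LM samples

p = choice_distribution(humans, n_options)
q = choice_distribution(lm_runs, n_options)

# One simple alignment score: mean total-variation distance per item
# (0 = identical answer distributions, 1 = completely disjoint).
tv = 0.5 * np.abs(p - q).sum(axis=1).mean()
print(f"mean total-variation distance: {tv:.3f}")
```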
- Rel-A.I.: An Interaction-Centered Approach To Measuring Human-LM Reliance [73.19687314438133]
We study how reliance is affected by contextual features of an interaction.
We find that contextual characteristics significantly affect human reliance behavior.
Our results show that calibration and language quality alone are insufficient in evaluating the risks of human-LM interactions.
arXiv Detail & Related papers (2024-07-10T18:00:05Z)
- Quantifying AI Psychology: A Psychometrics Benchmark for Large Language Models [57.518784855080334]
Large Language Models (LLMs) have demonstrated exceptional task-solving capabilities, increasingly adopting roles akin to human-like assistants.
This paper presents a framework for investigating psychological dimensions in LLMs, including psychological identification, assessment dataset curation, and assessment with results validation.
We introduce a comprehensive psychometrics benchmark for LLMs that covers six psychological dimensions: personality, values, emotion, theory of mind, motivation, and intelligence.
arXiv Detail & Related papers (2024-06-25T16:09:08Z)
- Language models emulate certain cognitive profiles: An investigation of how predictability measures interact with individual differences [1.942809872918085]
We revisit the predictive power of surprisal and entropy measures estimated from a range of language models (LMs) on data of human reading times.
We investigate if modulating surprisal and entropy relative to cognitive scores increases prediction accuracy of reading times.
Our study finds that in most cases, incorporating cognitive capacities increases predictive power of surprisal and entropy on reading times.
arXiv Detail & Related papers (2024-06-07T14:54:56Z)
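Surprisal and entropy here are the standard information-theoretic quantities over an LM's next-token distribution. Below is a minimal sketch of computing both per token with GPT-2 via Hugging Face transformers; the model choice and example sentence are illustrative, since the paper evaluates a range of LMs against reading-time data.

```python
import torch
from transformers import GPT2LMHeadModel, GPT2TokenizerFast

# Illustrative model choice; the paper estimates these measures
# from several different LMs.
tokenizer = GPT2TokenizerFast.from_pretrained("gpt2")
model = GPT2LMHeadModel.from_pretrained("gpt2")
model.eval()

LOG2 = torch.log(torch.tensor(2.0))

def surprisal_and_entropy(text: str):
    """Per-token surprisal (-log2 p) and predictive entropy, in bits."""
    ids = tokenizer(text, return_tensors="pt").input_ids
    with torch.no_grad():
        logits = model(ids).logits
    # Position t's logits predict token t+1, so shift by one.
    log_probs = torch.log_softmax(logits[0, :-1], dim=-1)
    targets = ids[0, 1:]
    surprisal = -log_probs[torch.arange(len(targets)), targets] / LOG2
    entropy = -(log_probs.exp() * log_probs).sum(dim=-1) / LOG2
    tokens = tokenizer.convert_ids_to_tokens(targets.tolist())
    return list(zip(tokens, surprisal.tolist(), entropy.tolist()))

for tok, s, h in surprisal_and_entropy("The editor revised the manuscript."):
    print(f"{tok!r}: surprisal={s:.2f} bits, entropy={h:.2f} bits")
```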
- CLOMO: Counterfactual Logical Modification with Large Language Models [109.60793869938534]
We introduce a novel task, Counterfactual Logical Modification (CLOMO), and a high-quality human-annotated benchmark.
In this task, LLMs must adeptly alter a given argumentative text to uphold a predetermined logical relationship.
We propose an innovative evaluation metric, the Self-Evaluation Score (SES), to directly evaluate the natural language output of LLMs.
arXiv Detail & Related papers (2023-11-29T08:29:54Z)
- Divergences between Language Models and Human Brains [63.405788999891335]
Recent research has hinted that brain signals can be effectively predicted using internal representations of language models (LMs).
We show that there are clear differences in how LMs and humans represent and use language.
We identify two domains that are not captured well by LMs: social/emotional intelligence and physical commonsense.
arXiv Detail & Related papers (2023-11-15T19:02:40Z)
- PsyCoT: Psychological Questionnaire as Powerful Chain-of-Thought for Personality Detection [50.66968526809069]
We propose a novel personality detection method, called PsyCoT, which mimics the way individuals complete psychological questionnaires in a multi-turn dialogue manner.
Our experiments demonstrate that PsyCoT significantly improves the performance and robustness of GPT-3.5 in personality detection.
arXiv Detail & Related papers (2023-10-31T08:23:33Z)
- Evaluating and Inducing Personality in Pre-trained Language Models [78.19379997967191]
We draw inspiration from psychometric studies by leveraging human personality theory as a tool for studying machine behaviors.
We introduce the Machine Personality Inventory (MPI), a tool for studying machine behaviors.
MPI follows standardized personality tests, built upon the Big Five Personality Factors (Big Five) theory and personality assessment inventories.
We devise a Personality Prompting (P2) method to induce LLMs with specific personalities in a controllable way.
arXiv Detail & Related papers (2022-05-20T07:32:57Z)
- Predicting Human Psychometric Properties Using Computational Language Models [5.806723407090421]
Transformer-based language models (LMs) continue to achieve state-of-the-art performance on natural language processing (NLP) benchmarks.
Can LMs be of use in predicting the psychometric properties of test items, when those items are given to human participants?
We gather responses from numerous human participants and LMs on a broad diagnostic test of linguistic competencies.
We then use these responses to calculate standard psychometric properties of the items in the diagnostic test, using the human responses and the LM responses separately.
arXiv Detail & Related papers (2022-05-12T16:40:12Z)
- Do language models learn typicality judgments from text? [6.252236971703546]
We evaluate predictive language models (LMs) on a prevalent phenomenon in cognitive science: typicality.
Our first test targets whether typicality modulates LMs in assigning taxonomic category memberships to items.
The second test investigates sensitivities to typicality in LMs' probabilities when extending new information about items to their categories.
arXiv Detail & Related papers (2021-05-06T21:56:40Z)
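A hedged sketch of the kind of typicality probe this last paper describes: score the same taxonomic sentence frame with typical and atypical exemplars and compare the LM's log-probabilities. The frame, exemplars, and model below are illustrative choices, not the paper's actual stimuli or setup.

```python
import torch
from transformers import GPT2LMHeadModel, GPT2TokenizerFast

tokenizer = GPT2TokenizerFast.from_pretrained("gpt2")
model = GPT2LMHeadModel.from_pretrained("gpt2")
model.eval()

def sentence_logprob(text: str) -> float:
    """Total log-probability of a sentence under a causal LM."""
    ids = tokenizer(text, return_tensors="pt").input_ids
    with torch.no_grad():
        logits = model(ids).logits
    # Position t's logits predict token t+1, so shift by one.
    log_probs = torch.log_softmax(logits[0, :-1], dim=-1)
    targets = ids[0, 1:]
    return log_probs[torch.arange(len(targets)), targets].sum().item()

# Hypothetical probe: does the LM assign higher probability to the
# taxonomic sentence for a typical exemplar (robin) than for
# atypical ones (penguin, ostrich)?
for exemplar in ["robin", "penguin", "ostrich"]:
    lp = sentence_logprob(f"A {exemplar} is a bird.")
    print(f"{exemplar}: {lp:.2f}")
```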