Predicting Human Psychometric Properties Using Computational Language Models
- URL: http://arxiv.org/abs/2205.06203v1
- Date: Thu, 12 May 2022 16:40:12 GMT
- Title: Predicting Human Psychometric Properties Using Computational Language Models
- Authors: Antonio Laverghetta Jr., Animesh Nighojkar, Jamshidbek Mirzakhalov, John Licato
- Abstract summary: Transformer-based language models (LMs) continue to achieve state-of-the-art performance on natural language processing (NLP) benchmarks.
Can LMs be of use in predicting the psychometric properties of test items, when those items are given to human participants?
We gather responses from numerous human participants and LMs on a broad diagnostic test of linguistic competencies.
- We then calculate standard psychometric properties of the items in the diagnostic test, using the human responses and the LM responses separately.
- Score: 5.806723407090421
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Transformer-based language models (LMs) continue to achieve state-of-the-art
performance on natural language processing (NLP) benchmarks, including tasks
designed to mimic human-inspired "commonsense" competencies. To better
understand the degree to which LMs can be said to have certain linguistic
reasoning skills, researchers are beginning to adapt the tools and concepts
from psychometrics. But to what extent can benefits flow in the other
direction? In other words, can LMs be of use in predicting the psychometric
properties of test items, when those items are given to human participants? If
so, the benefit for psychometric practitioners is enormous, as it can reduce
the need for multiple rounds of empirical testing. We gather responses from
numerous human participants and LMs (transformer- and non-transformer-based) on
a broad diagnostic test of linguistic competencies. We then calculate standard
psychometric properties of the items in the diagnostic test, using the human
responses and the LM responses separately, and determine how well the two sets
of estimates correlate. We find that
transformer-based LMs predict the human psychometric data consistently well
across most categories, suggesting that they can be used to gather human-like
psychometric data without the need for extensive human trials.
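The correlation analysis described in the abstract can be illustrated with a minimal sketch: compute a classical psychometric property (item difficulty, the proportion of correct responses per item) separately from human and LM response matrices, then correlate the two. The response data below is hypothetical, and the step is a simplification of the paper's actual pipeline:

```python
import math

def item_difficulty(responses):
    """Classical item difficulty: proportion of correct responses per item.

    `responses` is a list of respondent rows, each a list of 0/1 scores,
    one entry per test item."""
    n_items = len(responses[0])
    return [sum(row[i] for row in responses) / len(responses)
            for i in range(n_items)]

def pearson_r(x, y):
    """Pearson correlation between two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)

# Hypothetical 0/1-scored responses: 4 humans and 3 LMs on 5 items.
human_scores = [
    [1, 1, 0, 1, 0],
    [1, 0, 0, 1, 0],
    [1, 1, 1, 1, 0],
    [1, 1, 0, 0, 1],
]
lm_scores = [
    [1, 1, 0, 1, 0],
    [1, 0, 1, 1, 0],
    [1, 1, 0, 1, 0],
]

human_diff = item_difficulty(human_scores)
lm_diff = item_difficulty(lm_scores)
r = pearson_r(human_diff, lm_diff)
```

A high `r` here would mean the LM group ranks the items' difficulty much as the human group does, which is the sense in which the paper asks whether LM responses can stand in for human trials.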
Related papers
- Psychometric Alignment: Capturing Human Knowledge Distributions via Language Models [41.324679754114165]
Language models (LMs) are increasingly used to simulate human-like responses in scenarios where accurately mimicking a population's behavior can guide decision-making.
We introduce "psychometric alignment," a metric that measures the extent to which LMs reflect human knowledge distribution.
We find significant misalignment between LMs and human populations, though using persona-based prompts can improve alignment.
arXiv Detail & Related papers (2024-07-22T14:02:59Z)
- Language models emulate certain cognitive profiles: An investigation of how predictability measures interact with individual differences [1.942809872918085]
We revisit the predictive power of surprisal and entropy measures estimated from a range of language models (LMs) on data of human reading times.
We investigate if modulating surprisal and entropy relative to cognitive scores increases prediction accuracy of reading times.
Our study finds that in most cases, incorporating cognitive capacities increases predictive power of surprisal and entropy on reading times.
arXiv Detail & Related papers (2024-06-07T14:54:56Z)
- Large Language Models for Psycholinguistic Plausibility Pretesting [47.1250032409564]
We investigate whether Language Models (LMs) can be used to generate plausibility judgements.
We find that GPT-4 plausibility judgements highly correlate with human judgements across the structures we examine.
We then test whether this correlation implies that LMs can be used instead of humans for pretesting.
arXiv Detail & Related papers (2024-02-08T07:20:02Z)
- Divergences between Language Models and Human Brains [63.405788999891335]
Recent research has hinted that brain signals can be effectively predicted using internal representations of language models (LMs).
We show that there are clear differences in how LMs and humans represent and use language.
We identify two domains that are not captured well by LMs: social/emotional intelligence and physical commonsense.
arXiv Detail & Related papers (2023-11-15T19:02:40Z)
- PsyCoT: Psychological Questionnaire as Powerful Chain-of-Thought for Personality Detection [50.66968526809069]
We propose a novel personality detection method, called PsyCoT, which mimics the way individuals complete psychological questionnaires in a multi-turn dialogue manner.
Our experiments demonstrate that PsyCoT significantly improves the performance and robustness of GPT-3.5 in personality detection.
arXiv Detail & Related papers (2023-10-31T08:23:33Z)
- Efficiently Measuring the Cognitive Ability of LLMs: An Adaptive Testing Perspective [63.92197404447808]
Large language models (LLMs) have shown some human-like cognitive abilities.
We propose an adaptive testing framework for LLM evaluation.
This approach dynamically adjusts the characteristics of the test questions, such as difficulty, based on the model's performance.
arXiv Detail & Related papers (2023-06-18T09:54:33Z)
- Revisiting the Reliability of Psychological Scales on Large Language Models [66.31055885857062]
This study aims to determine the reliability of applying personality assessments to Large Language Models (LLMs).
By shedding light on the personalization of LLMs, our study endeavors to pave the way for future explorations in this field.
arXiv Detail & Related papers (2023-05-31T15:03:28Z)
- HumBEL: A Human-in-the-Loop Approach for Evaluating Demographic Factors of Language Models in Human-Machine Conversations [26.59671463642373]
We consider how demographic factors in LM language skills can be measured to determine compatibility with a target demographic.
We suggest clinical techniques from Speech Language Pathology, which has norms for acquisition of language skills in humans.
We conduct evaluation with a domain expert (i.e., a clinically licensed speech language pathologist) and also propose automated techniques to complement clinical evaluation at scale.
arXiv Detail & Related papers (2023-05-23T16:15:24Z)
- Evaluating and Inducing Personality in Pre-trained Language Models [78.19379997967191]
We draw inspiration from psychometric studies by leveraging human personality theory as a tool for studying machine behaviors.
We introduce the Machine Personality Inventory (MPI) tool for studying machine behaviors.
MPI follows standardized personality tests, built upon the Big Five Personality Factors (Big Five) theory and personality assessment inventories.
We devise a Personality Prompting (P2) method to induce LLMs with specific personalities in a controllable way.
arXiv Detail & Related papers (2022-05-20T07:32:57Z)
- Can Transformer Language Models Predict Psychometric Properties? [0.0]
Transformer-based language models (LMs) continue to advance state-of-the-art performance on NLP benchmark tasks.
Can LMs be of use in predicting what the psychometric properties of test items will be when those items are given to human participants?
We gather responses from numerous human participants and LMs on a broad diagnostic test of linguistic competencies.
arXiv Detail & Related papers (2021-06-12T20:05:33Z)
- Constructing a Testbed for Psychometric Natural Language Processing [0.5801044612920815]
We describe our efforts to construct a corpus for psychometric natural language processing (NLP).
We discuss our multi-step process to align user text with their survey-based response items.
We report preliminary results on the use of the text to categorize/predict users' survey response labels.
arXiv Detail & Related papers (2020-07-25T16:29:24Z)
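One related paper above proposes adaptive testing, in which question difficulty is dynamically adjusted to the model's performance. A minimal sketch of that idea, using a hypothetical item bank and a simple step-update ability estimate rather than the paper's actual framework:

```python
def pick_next_item(ability, item_difficulties, asked):
    """Choose the unasked item whose difficulty is closest to the current
    ability estimate (a common heuristic in adaptive testing)."""
    candidates = [i for i in range(len(item_difficulties)) if i not in asked]
    return min(candidates, key=lambda i: abs(item_difficulties[i] - ability))

def update_ability(ability, correct, step=0.5):
    """Nudge the ability estimate up after a correct answer, down otherwise."""
    return ability + step if correct else ability - step

# Hypothetical item bank with difficulties on an arbitrary scale.
bank = [-2.0, -1.0, 0.0, 1.0, 2.0]
ability, asked = 0.0, set()
for _ in range(3):
    item = pick_next_item(ability, bank, asked)
    asked.add(item)
    correct = True  # stand-in for querying the LLM on this item
    ability = update_ability(ability, correct)
```

Because each correct answer raises the ability estimate, the loop drifts toward harder items, which is the sense in which the test adapts to the model's performance.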
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this information and is not responsible for any consequences arising from its use.