Modeling Subjectivity in Cognitive Appraisal with Language Models
- URL: http://arxiv.org/abs/2503.11381v1
- Date: Fri, 14 Mar 2025 13:25:41 GMT
- Title: Modeling Subjectivity in Cognitive Appraisal with Language Models
- Authors: Yuxiang Zhou, Hainiu Xu, Desmond C. Ong, Petr Slovak, Yulan He
- Abstract summary: We show how language models can harness subjectivity by conducting comprehensive experiments and analysis across various scenarios. Our findings reveal that personality traits and demographical information are critical for measuring subjectivity.
- Score: 16.846297851557477
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: As the utilization of language models in interdisciplinary, human-centered studies grows, the expectation of model capabilities continues to evolve. Beyond excelling at conventional tasks, models are recently expected to perform well on user-centric measurements involving confidence and human (dis)agreement -- factors that reflect subjective preferences. While modeling of subjectivity plays an essential role in cognitive science and has been extensively studied, it remains under-explored within the NLP community. In light of this gap, we explore how language models can harness subjectivity by conducting comprehensive experiments and analysis across various scenarios using both fine-tuned models and prompt-based large language models (LLMs). Our quantitative and qualitative experimental results indicate that existing post-hoc calibration approaches often fail to produce satisfactory results. However, our findings reveal that personality traits and demographical information are critical for measuring subjectivity. Furthermore, our in-depth analysis offers valuable insights for future research and development in the interdisciplinary studies of NLP and cognitive science.
Related papers
- The potential -- and the pitfalls -- of using pre-trained language models as cognitive science theories [2.6549754445378344]
We discuss challenges to the use of PLMs as cognitive science theories.
We review assumptions used by researchers to map measures of PLM performance to measures of human performance.
We end by enumerating criteria for using PLMs as credible accounts of cognition and cognitive development.
arXiv Detail & Related papers (2025-01-22T05:24:23Z) - Large Language Model for Qualitative Research -- A Systematic Mapping Study [3.302912592091359]
Large Language Models (LLMs), powered by advanced generative AI, have emerged as transformative tools.
This study systematically maps the literature on the use of LLMs for qualitative research.
Findings reveal that LLMs are utilized across diverse fields, demonstrating the potential to automate processes.
arXiv Detail & Related papers (2024-11-18T21:28:00Z) - PersLLM: A Personified Training Approach for Large Language Models [66.16513246245401]
We propose PersLLM, integrating psychology-grounded principles of personality: social practice, consistency, and dynamic development.
We incorporate personality traits directly into the model parameters, enhancing the model's resistance to induction, promoting consistency, and supporting the dynamic evolution of personality.
arXiv Detail & Related papers (2024-07-17T08:13:22Z) - ConSiDERS-The-Human Evaluation Framework: Rethinking Human Evaluation for Generative Large Language Models [53.00812898384698]
We argue that human evaluation of generative large language models (LLMs) should be a multidisciplinary undertaking.
We highlight how cognitive biases can conflate fluent information and truthfulness, and how cognitive uncertainty affects the reliability of rating scores such as Likert.
We propose the ConSiDERS-The-Human evaluation framework consisting of 6 pillars -- Consistency, Scoring Criteria, Differentiating, User Experience, Responsible, and Scalability.
arXiv Detail & Related papers (2024-05-28T22:45:28Z) - Lessons from the Trenches on Reproducible Evaluation of Language Models [60.522749986793094]
We draw on three years of experience in evaluating large language models to provide guidance and lessons for researchers.
We present the Language Model Evaluation Harness (lm-eval), an open source library for independent, reproducible, and extensible evaluation of language models.
arXiv Detail & Related papers (2024-05-23T16:50:49Z) - Integration of cognitive tasks into artificial general intelligence test for large models [54.72053150920186]
We advocate for a comprehensive framework of cognitive science-inspired artificial general intelligence (AGI) tests.
The cognitive science-inspired AGI tests encompass the full spectrum of intelligence facets, including crystallized intelligence, fluid intelligence, social intelligence, and embodied intelligence.
arXiv Detail & Related papers (2024-02-04T15:50:42Z) - On the Calibration of Large Language Models and Alignment [63.605099174744865]
Confidence calibration serves as a crucial tool for gauging the reliability of deep models.
We conduct a systematic examination of the calibration of aligned language models throughout the entire construction process.
Our work sheds light on whether popular LLMs are well-calibrated and how the training process influences model calibration.
arXiv Detail & Related papers (2023-11-22T08:57:55Z) - Evidence of interrelated cognitive-like capabilities in large language models: Indications of artificial general intelligence or achievement? [0.0]
Large language models (LLMs) are advanced artificial intelligence (AI) systems that can perform a variety of tasks commonly found in human intelligence tests.
We investigated whether test scores may also exhibit positive intercorrelations.
We found strong empirical evidence for a positive manifold and a general factor of ability.
arXiv Detail & Related papers (2023-10-17T22:42:12Z) - Exploring the Cognitive Knowledge Structure of Large Language Models: An Educational Diagnostic Assessment Approach [50.125704610228254]
Large Language Models (LLMs) have not only exhibited exceptional performance across various tasks, but also demonstrated sparks of intelligence.
Recent studies have focused on assessing their capabilities on human exams and revealed their impressive competence in different domains.
We conduct an evaluation using MoocRadar, a meticulously annotated human test dataset based on Bloom taxonomy.
arXiv Detail & Related papers (2023-10-12T09:55:45Z) - Using Artificial Populations to Study Psychological Phenomena in Neural Models [0.0]
Investigation of cognitive behavior in language models must be conducted in an appropriate population for the results to be meaningful.
We leverage work in uncertainty estimation in a novel approach to efficiently construct experimental populations.
We provide theoretical grounding in the uncertainty estimation literature and motivation from current cognitive work regarding language models.
arXiv Detail & Related papers (2023-08-15T20:47:51Z) - Turning large language models into cognitive models [0.0]
We show that large language models can be turned into cognitive models.
These models offer accurate representations of human behavior, even outperforming traditional cognitive models in two decision-making domains.
Taken together, these results suggest that large, pre-trained models can be adapted to become generalist cognitive models.
arXiv Detail & Related papers (2023-06-06T18:00:01Z) - Curriculum: A Broad-Coverage Benchmark for Linguistic Phenomena in Natural Language Understanding [1.827510863075184]
Curriculum is a new format of NLI benchmark for evaluation of broad-coverage linguistic phenomena.
We show that this linguistic-phenomena-driven benchmark can serve as an effective tool for diagnosing model behavior and verifying model learning quality.
arXiv Detail & Related papers (2022-04-13T10:32:03Z) - Rethinking Generalization of Neural Models: A Named Entity Recognition Case Study [81.11161697133095]
We take the NER task as a testbed to analyze the generalization behavior of existing models from different perspectives.
Experiments with in-depth analyses diagnose the bottleneck of existing neural NER models.
As a by-product of this paper, we have open-sourced a project that involves a comprehensive summary of recent NER papers.
arXiv Detail & Related papers (2020-01-12T04:33:53Z)
This list is automatically generated from the titles and abstracts of the papers on this site.