CPsyExam: A Chinese Benchmark for Evaluating Psychology using Examinations
- URL: http://arxiv.org/abs/2405.10212v2
- Date: Sat, 18 May 2024 07:55:58 GMT
- Authors: Jiahao Zhao, Jingwei Zhu, Minghuan Tan, Min Yang, Di Yang, Chenhao Zhang, Guancheng Ye, Chengming Li, Xiping Hu
- Abstract summary: CPsyExam is designed to prioritize psychological knowledge and case analysis separately.
From the pool of 22k questions, we utilize 4k to create the benchmark.
- Score: 28.097820924530655
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: In this paper, we introduce a novel psychological benchmark, CPsyExam, constructed from questions sourced from Chinese language examinations. CPsyExam is designed to prioritize psychological knowledge and case analysis separately, recognizing the significance of applying psychological knowledge to real-world scenarios. From the pool of 22k questions, we utilize 4k to create the benchmark, which offers balanced coverage of subjects and incorporates a diverse range of case analysis techniques. Furthermore, we evaluate a range of existing large language models (LLMs), spanning from open-sourced to API-based models. Our experiments and analysis demonstrate that CPsyExam serves as an effective benchmark for enhancing the understanding of psychology within LLMs and enables the comparison of LLMs across various granularities.
Related papers
- PsychoLex: Unveiling the Psychological Mind of Large Language Models [1.3518297878940662]
This paper explores the intersection of psychology and artificial intelligence through the development and evaluation of specialized Large Language Models (LLMs)
PsychoLex is a suite of resources designed to enhance LLMs' proficiency in psychological tasks in both Persian and English.
We present the PsychoLexLLaMA model, optimized specifically for psychological applications, demonstrating superior performance compared to general-purpose models.
arXiv Detail & Related papers (2024-08-16T17:19:23Z)
- PsycoLLM: Enhancing LLM for Psychological Understanding and Evaluation [27.575675130769437]
We propose a specialized psychological large language model (LLM), named PsycoLLM, trained on a proposed high-quality psychological dataset.
We construct multi-turn dialogues through a three-step pipeline comprising generation, evidence judgment, and refinement.
To compare the performance of PsycoLLM with other LLMs, we develop a comprehensive psychological benchmark based on authoritative psychological counseling examinations in China.
arXiv Detail & Related papers (2024-07-08T08:25:56Z)
- Quantifying AI Psychology: A Psychometrics Benchmark for Large Language Models [57.518784855080334]
Large Language Models (LLMs) have demonstrated exceptional task-solving capabilities, increasingly adopting roles akin to human-like assistants.
This paper presents a framework for investigating the psychological dimensions of LLMs, including psychological identification, assessment dataset curation, and assessment with results validation.
We introduce a comprehensive psychometrics benchmark for LLMs that covers six psychological dimensions: personality, values, emotion, theory of mind, motivation, and intelligence.
arXiv Detail & Related papers (2024-06-25T16:09:08Z)
- LLM Questionnaire Completion for Automatic Psychiatric Assessment [49.1574468325115]
We employ a Large Language Model (LLM) to convert unstructured psychological interviews into structured questionnaires spanning various psychiatric and personality domains.
The obtained answers are coded as features, which are used to predict standardized psychiatric measures of depression (PHQ-8) and PTSD (PCL-C).
arXiv Detail & Related papers (2024-06-09T09:03:11Z)
- Holmes: A Benchmark to Assess the Linguistic Competence of Language Models [59.627729608055006]
We introduce Holmes, a new benchmark designed to assess the linguistic competence of language models (LMs).
We use computation-based probing to examine LMs' internal representations regarding distinct linguistic phenomena.
As a result, we meet recent calls to disentangle LMs' linguistic competence from other cognitive abilities.
arXiv Detail & Related papers (2024-04-29T17:58:36Z)
- Surveying the Dead Minds: Historical-Psychological Text Analysis with Contextualized Construct Representation (CCR) for Classical Chinese [4.772998830872483]
We develop a pipeline for historical-psychological text analysis in classical Chinese.
The pipeline combines expert knowledge in psychometrics with text representations generated via transformer-based language models.
Considering the scarcity of available data, we propose an indirect supervised contrastive learning approach.
arXiv Detail & Related papers (2024-03-01T13:14:45Z)
- Can Large Language Models Understand Context? [17.196362853457412]
This paper introduces a context understanding benchmark by adapting existing datasets to suit the evaluation of generative models.
Experimental results indicate that pre-trained dense models struggle with understanding more nuanced contextual features when compared to state-of-the-art fine-tuned models.
As LLM compression holds growing significance in both research and real-world applications, we assess the context understanding of quantized models under in-context-learning settings.
arXiv Detail & Related papers (2024-02-01T18:55:29Z)
- ConceptPsy: A Benchmark Suite with Conceptual Comprehensiveness in Psychology [25.845704502964143]
This paper presents ConceptPsy, designed to evaluate Chinese complex reasoning and knowledge abilities in psychology.
arXiv Detail & Related papers (2023-11-16T12:43:18Z)
- Investigating Fairness Disparities in Peer Review: A Language Model Enhanced Approach [77.61131357420201]
We conduct a thorough and rigorous study on fairness disparities in peer review with the help of large language models (LMs).
We collect, assemble, and maintain a comprehensive relational database for the International Conference on Learning Representations (ICLR) conference from 2017 to date.
We postulate and study fairness disparities on multiple protective attributes of interest, including author gender, geography, and author and institutional prestige.
arXiv Detail & Related papers (2022-11-07T16:19:42Z)
- Sentiment Analysis Based on Deep Learning: A Comparative Study [69.09570726777817]
The study of public opinion can provide us with valuable information.
The efficiency and accuracy of sentiment analysis are hindered by the challenges encountered in natural language processing.
This paper reviews the latest studies that have employed deep learning to solve sentiment analysis problems.
arXiv Detail & Related papers (2020-06-05T16:28:10Z)
- Deep Learning Based Text Classification: A Comprehensive Review [75.8403533775179]
We provide a review of more than 150 deep learning based models for text classification developed in recent years.
We also provide a summary of more than 40 popular datasets widely used for text classification.
arXiv Detail & Related papers (2020-04-06T02:00:30Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information above and is not responsible for any consequences of its use.