Surveying the Dead Minds: Historical-Psychological Text Analysis with
Contextualized Construct Representation (CCR) for Classical Chinese
- URL: http://arxiv.org/abs/2403.00509v1
- Date: Fri, 1 Mar 2024 13:14:45 GMT
- Title: Surveying the Dead Minds: Historical-Psychological Text Analysis with
Contextualized Construct Representation (CCR) for Classical Chinese
- Authors: Yuqi Chen, Sixuan Li, Ying Li and Mohammad Atari
- Abstract summary: We develop a pipeline for historical-psychological text analysis in classical Chinese.
The pipeline combines expert knowledge in psychometrics with text representations generated via transformer-based language models.
Considering the scarcity of available data, we propose an indirect supervised contrastive learning approach.
- Score: 4.772998830872483
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: In this work, we develop a pipeline for historical-psychological text
analysis in classical Chinese. Humans have produced texts in various languages
for thousands of years; however, most of the computational literature is
focused on contemporary languages and corpora. The emerging field of historical
psychology relies on computational techniques to extract aspects of psychology
from historical corpora using new methods developed in natural language
processing (NLP). The present pipeline, called Contextualized Construct
Representations (CCR), combines expert knowledge in psychometrics (i.e.,
psychological surveys) with text representations generated via
transformer-based language models to measure psychological constructs such as
traditionalism, norm strength, and collectivism in classical Chinese corpora.
Considering the scarcity of available data, we propose an indirect supervised
contrastive learning approach and build the first Chinese historical psychology
corpus (C-HI-PSY) to fine-tune pre-trained models. We evaluate the pipeline to
demonstrate its superior performance compared with other approaches. The CCR
method outperforms word-embedding-based approaches across all of our tasks and
exceeds prompting with GPT-4 in most tasks. Finally, we benchmark the pipeline
against objective, external data to further verify its validity.
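The core scoring idea in the abstract, comparing a passage's representation with those of psychometric survey items, can be illustrated with a minimal sketch. This is not the authors' implementation. The `embed` function is a toy character-bigram stand-in for a fine-tuned transformer encoder, the two questionnaire items are made-up placeholders (not taken from C-HI-PSY), and averaging cosine similarities over items is an assumed aggregation rule.

```python
import math
from collections import Counter


def embed(text: str) -> Counter:
    """Toy stand-in for a sentence encoder: counts of character bigrams.

    A real pipeline would return a dense vector from a transformer model.
    """
    chars = [c for c in text if not c.isspace()]
    return Counter(a + b for a, b in zip(chars, chars[1:]))


def cosine(u: Counter, v: Counter) -> float:
    """Cosine similarity between two sparse count vectors (0.0 if either is empty)."""
    dot = sum(u[k] * v[k] for k in u.keys() & v.keys())
    norm_u = math.sqrt(sum(x * x for x in u.values()))
    norm_v = math.sqrt(sum(x * x for x in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def ccr_score(passage: str, items: list[str]) -> float:
    """Mean similarity between one passage and every questionnaire item."""
    passage_vec = embed(passage)
    return sum(cosine(passage_vec, embed(item)) for item in items) / len(items)


# Hypothetical items for a "traditionalism" construct (illustrative only).
items = ["信而好古者", "法先王"]

related = "述而不作，信而好古"    # Analects 7.1, shares wording with the first item
unrelated = "日出而作，日入而息"  # no bigram overlap with either item

print(ccr_score(related, items))
print(ccr_score(unrelated, items))
```

With a real encoder the comparison would capture meaning rather than surface overlap, which is the point of using contextualized representations instead of the bigram counts assumed here.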
Related papers
- CPsyExam: A Chinese Benchmark for Evaluating Psychology using Examinations [28.097820924530655]
CPsyExam is designed to prioritize psychological knowledge and case analysis separately.
From the pool of 22k questions, we utilize 4k to create the benchmark.
arXiv Detail & Related papers (2024-05-16T16:02:18Z)
- A Tutorial on the Pretrain-Finetune Paradigm for Natural Language Processing [2.7038841665524846]
The pretrain-finetune paradigm represents a transformative approach in text analysis and natural language processing.
This tutorial offers a comprehensive introduction to the pretrain-finetune paradigm.
arXiv Detail & Related papers (2024-03-04T21:51:11Z)
- GujiBERT and GujiGPT: Construction of Intelligent Information Processing Foundation Language Models for Ancient Texts [11.289265479095956]
GujiBERT and GujiGPT language models are foundational models specifically designed for intelligent information processing of ancient texts.
These models have been trained on an extensive dataset that encompasses both simplified and traditional Chinese characters.
These models have exhibited exceptional performance across a range of validation tasks using publicly available datasets.
arXiv Detail & Related papers (2023-07-11T15:44:01Z)
- A Survey of Text Representation Methods and Their Genealogy [0.0]
In recent years, with the advent of highly scalable artificial-neural-network-based text representation methods, the field of natural language processing has seen unprecedented growth and sophistication.
We provide a survey of current approaches, by arranging them in a genealogy, and by conceptualizing a taxonomy of text representation methods to examine and explain the state-of-the-art.
arXiv Detail & Related papers (2022-11-26T15:22:01Z)
- Concepts and Experiments on Psychoanalysis Driven Computing [0.0]
This research investigates the effective incorporation of the human factor and user perception in text-based interactive media.
We use the notion of Lacanian discourse types to capture and deeply understand real characteristics, qualities and contents of texts.
This is the first time computational methods are systematically combined with psychoanalysis.
arXiv Detail & Related papers (2022-09-29T19:27:22Z)
- O-Dang! The Ontology of Dangerous Speech Messages [53.15616413153125]
We present O-Dang!: The Ontology of Dangerous Speech Messages, a systematic and interoperable Knowledge Graph (KG).
O-Dang! is designed to gather and organize Italian datasets into a structured KG, according to the principles shared within the Linguistic Linked Open Data community.
It provides a model for encoding both gold standard and single-annotator labels in the KG.
arXiv Detail & Related papers (2022-07-13T11:50:05Z)
- TextFlint: Unified Multilingual Robustness Evaluation Toolkit for Natural Language Processing [73.16475763422446]
We propose a multilingual robustness evaluation platform for NLP tasks (TextFlint).
It incorporates universal text transformation, task-specific transformation, adversarial attack, subpopulation, and their combinations to provide comprehensive robustness analysis.
TextFlint generates complete analytical reports as well as targeted augmented data to address the shortcomings of the model's robustness.
arXiv Detail & Related papers (2021-03-21T17:20:38Z)
- Deep Learning for Text Style Transfer: A Survey [71.8870854396927]
Text style transfer is an important task in natural language generation, which aims to control certain attributes in the generated text.
We present a systematic survey of the research on neural text style transfer, spanning over 100 representative articles since the first neural text style transfer work in 2017.
We discuss the task formulation, existing datasets and subtasks, evaluation, as well as the rich methodologies in the presence of parallel and non-parallel data.
arXiv Detail & Related papers (2020-11-01T04:04:43Z)
- Interactive Fiction Game Playing as Multi-Paragraph Reading Comprehension with Reinforcement Learning [94.50608198582636]
Interactive Fiction (IF) games with real human-written natural language texts provide a new natural evaluation for language understanding techniques.
We take a novel perspective of IF game solving and re-formulate it as Multi-Passage Reading Comprehension (MPRC) tasks.
arXiv Detail & Related papers (2020-10-05T23:09:20Z)
- A Survey on Text Classification: From Shallow to Deep Learning [83.47804123133719]
The last decade has seen a surge of research in this area due to the unprecedented success of deep learning.
This paper fills the gap by reviewing the state-of-the-art approaches from 1961 to 2021.
We create a taxonomy for text classification according to the text involved and the models used for feature extraction and classification.
arXiv Detail & Related papers (2020-08-02T00:09:03Z)
- Exploring the Limits of Transfer Learning with a Unified Text-to-Text Transformer [64.22926988297685]
Transfer learning, where a model is first pre-trained on a data-rich task before being fine-tuned on a downstream task, has emerged as a powerful technique in natural language processing (NLP).
In this paper, we explore the landscape of introducing transfer learning techniques for NLP by a unified framework that converts all text-based language problems into a text-to-text format.
arXiv Detail & Related papers (2019-10-23T17:37:36Z)
This list is automatically generated from the titles and abstracts of the papers on this site.