Who Are You Behind the Screen? Implicit MBTI and Gender Detection Using Artificial Intelligence
- URL: http://arxiv.org/abs/2503.09853v2
- Date: Fri, 14 Mar 2025 23:59:45 GMT
- Title: Who Are You Behind the Screen? Implicit MBTI and Gender Detection Using Artificial Intelligence
- Authors: Kourosh Shahnazari, Seyed Moein Ayyoubzadeh,
- Abstract summary: This work investigates implicit categorization, inferring personality and gender variables directly from linguistic patterns in Telegram conversation data.<n>We refine a Transformer-based language model (RoBERTa) to capture complex linguistic cues indicative of personality traits and gender differences.<n>Confidence levels help to greatly raise model accuracy to 86.16%, hence proving RoBERTa's capacity to consistently identify implicit personality types from conversational text data.
- Score: 0.0
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: In personalized technology and psychological research, precisely detecting demographic features and personality traits from digital interactions becomes ever more important. This work investigates implicit categorization, inferring personality and gender variables directly from linguistic patterns in Telegram conversation data, while conventional personality prediction techniques mostly depend on explicitly self-reported labels. We refine a Transformer-based language model (RoBERTa) to capture complex linguistic cues indicative of personality traits and gender differences using a dataset comprising 138,866 messages from 1,602 users annotated with MBTI types and 195,016 messages from 2,598 users annotated with gender. Confidence levels help to greatly raise model accuracy to 86.16\%, hence proving RoBERTa's capacity to consistently identify implicit personality types from conversational text data. Our results highlight the usefulness of Transformer topologies for implicit personality and gender classification, hence stressing their efficiency and stressing important trade-offs between accuracy and coverage in realistic conversational environments. With regard to gender classification, the model obtained an accuracy of 74.4\%, therefore capturing gender-specific language patterns. Personality dimension analysis showed that people with introverted and intuitive preferences are especially more active in text-based interactions. This study emphasizes practical issues in balancing accuracy and data coverage as Transformer-based models show their efficiency in implicit personality and gender prediction tasks from conversational texts.
Related papers
- Blind Men and the Elephant: Diverse Perspectives on Gender Stereotypes in Benchmark Datasets [17.101242741559428]
This paper focuses on intrinsic bias mitigation and measurement strategies for language models.<n>We delve deeper into intrinsic measurements, identifying inconsistencies and suggesting that these benchmarks may reflect different facets of gender stereotype.<n>Our findings underscore the complexity of gender stereotyping in language models and point to new directions for developing more refined techniques to detect and reduce bias.
arXiv Detail & Related papers (2025-01-02T09:40:31Z) - Beyond Binary Gender: Evaluating Gender-Inclusive Machine Translation with Ambiguous Attitude Words [85.48043537327258]
Existing machine translation gender bias evaluations are primarily focused on male and female genders.
This study presents a benchmark AmbGIMT (Gender-Inclusive Machine Translation with Ambiguous attitude words)
We propose a novel process to evaluate gender bias based on the Emotional Attitude Score (EAS), which is used to quantify ambiguous attitude words.
arXiv Detail & Related papers (2024-07-23T08:13:51Z) - Large Language Models Can Infer Personality from Free-Form User Interactions [0.0]
GPT-4 can infer personality with moderate accuracy, outperforming previous approaches.
Results show that the direct focus on personality assessment did not result in a less positive user experience.
Preliminary analyses suggest that the accuracy of personality inferences varies only marginally across different socio-demographic subgroups.
arXiv Detail & Related papers (2024-05-19T20:33:36Z) - LLMvsSmall Model? Large Language Model Based Text Augmentation Enhanced
Personality Detection Model [58.887561071010985]
Personality detection aims to detect one's personality traits underlying in social media posts.
Most existing methods learn post features directly by fine-tuning the pre-trained language models.
We propose a large language model (LLM) based text augmentation enhanced personality detection model.
arXiv Detail & Related papers (2024-03-12T12:10:18Z) - Can ChatGPT Read Who You Are? [10.577227353680994]
We report the results of a comprehensive user study featuring texts written in Czech by a representative population sample of 155 participants.
We compare the personality trait estimations made by ChatGPT against those by human raters and report ChatGPT's competitive performance in inferring personality traits from text.
arXiv Detail & Related papers (2023-12-26T14:43:04Z) - Personality Style Recognition via Machine Learning: Identifying
Anaclitic and Introjective Personality Styles from Patients' Speech [6.3042597209752715]
We use natural language processing (NLP) and machine learning tools for classification.
We test this on a dataset of recorded clinical diagnostic interviews (CDI) on a sample of 79 patients diagnosed with major depressive disorder (MDD)
We find that automated classification with language-derived features (i.e., based on LIWC) significantly outperforms questionnaire-based classification models.
arXiv Detail & Related papers (2023-11-07T15:56:19Z) - PsyCoT: Psychological Questionnaire as Powerful Chain-of-Thought for
Personality Detection [50.66968526809069]
We propose a novel personality detection method, called PsyCoT, which mimics the way individuals complete psychological questionnaires in a multi-turn dialogue manner.
Our experiments demonstrate that PsyCoT significantly improves the performance and robustness of GPT-3.5 in personality detection.
arXiv Detail & Related papers (2023-10-31T08:23:33Z) - Naturalistic Causal Probing for Morpho-Syntax [76.83735391276547]
We suggest a naturalistic strategy for input-level intervention on real world data in Spanish.
Using our approach, we isolate morpho-syntactic features from counfounders in sentences.
We apply this methodology to analyze causal effects of gender and number on contextualized representations extracted from pre-trained models.
arXiv Detail & Related papers (2022-05-14T11:47:58Z) - Vyaktitv: A Multimodal Peer-to-Peer Hindi Conversations based Dataset
for Personality Assessment [50.15466026089435]
We present a novel peer-to-peer Hindi conversation dataset- Vyaktitv.
It consists of high-quality audio and video recordings of the participants, with Hinglish textual transcriptions for each conversation.
The dataset also contains a rich set of socio-demographic features, like income, cultural orientation, amongst several others, for all the participants.
arXiv Detail & Related papers (2020-08-31T17:44:28Z) - Multi-Dimensional Gender Bias Classification [67.65551687580552]
Machine learning models can inadvertently learn socially undesirable patterns when training on gender biased text.
We propose a general framework that decomposes gender bias in text along several pragmatic and semantic dimensions.
Using this fine-grained framework, we automatically annotate eight large scale datasets with gender information.
arXiv Detail & Related papers (2020-05-01T21:23:20Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.