Personal-ITY: A Novel YouTube-based Corpus for Personality Prediction in Italian
- URL: http://arxiv.org/abs/2011.05688v1
- Date: Wed, 11 Nov 2020 10:51:07 GMT
- Title: Personal-ITY: A Novel YouTube-based Corpus for Personality Prediction in Italian
- Authors: Elisa Bassignana, Malvina Nissim and Viviana Patti
- Score: 11.38723572165938
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: We present a novel corpus for personality prediction in Italian, containing a
larger number of authors and a different genre compared to previously available
resources. The corpus is built exploiting Distant Supervision, assigning
Myers-Briggs Type Indicator (MBTI) labels to YouTube comments, and can lend
itself to a variety of experiments. We report on preliminary experiments on
Personal-ITY, which can serve as a baseline for future work, showing that some
types are easier to predict than others, and discussing the perks of
cross-dataset prediction.
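The abstract names the key construction step, distant supervision via Myers-Briggs labels, without giving the matching rules. Below is a minimal sketch of how such labelling might work: comments where an author self-reports an MBTI type are detected, and that type is then propagated to all of the author's comments. The self-report pattern, data layout, and helper names are illustrative assumptions, not the paper's actual pipeline.
```python
import re

# The 16 MBTI type codes, e.g. "INTJ", "ENFP".
MBTI_TYPES = {a + b + c + d for a in "IE" for b in "NS" for c in "TF" for d in "JP"}

# Hypothetical pattern for self-reports in Italian comments,
# e.g. "sono INTJ" / "sono un ENFP"; the paper's actual matching
# heuristics are not specified in the abstract.
SELF_REPORT = re.compile(r"\bsono (?:un[oa]? )?([IE][NS][TF][JP])\b", re.IGNORECASE)

def find_self_reports(comments):
    """Map author -> self-declared MBTI type (the distant-supervision signal)."""
    labels = {}
    for author, text in comments:
        match = SELF_REPORT.search(text)
        if match:
            mbti = match.group(1).upper()
            if mbti in MBTI_TYPES:
                labels[author] = mbti
    return labels

def build_corpus(comments, labels):
    """Label every comment by a self-reporting author with that author's type."""
    return [(text, labels[author]) for author, text in comments if author in labels]

comments = [
    ("user1", "Sono INTJ e questo video mi rappresenta"),
    ("user1", "Bellissimo canale!"),
    ("user2", "Non sono d'accordo"),
]
labels = find_self_reports(comments)
print(build_corpus(comments, labels))  # both of user1's comments labelled INTJ
```
The defining trade-off of distant supervision is visible here: a single self-report labels all of that author's comments, exchanging some label noise for a much larger corpus.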
Related papers
- Reddit is all you need: Authorship profiling for Romanian [49.1574468325115]
Authorship profiling is the process of identifying an author's characteristics based on their writings.
In this paper, we introduce a corpus of short texts in the Romanian language, annotated with keywords describing certain author characteristics.
arXiv Detail & Related papers (2024-10-13T16:27:31Z) - Is Personality Prediction Possible Based on Reddit Comments? [0.0]
We examine whether there is a correlation between a person's personality type and the texts they write.
To do this, we aggregated datasets of Reddit comments labeled with the Myers-Briggs Type Indicator (MBTI) of the author and built several supervised classifiers based on BERT to predict an author's personality from a given text, as sketched below.
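A minimal sketch of such a BERT-based setup, using the Hugging Face Transformers API; the checkpoint, the 16-way label encoding, and the example text are illustrative assumptions rather than any paper's actual configuration.
```python
import torch
from transformers import AutoTokenizer, AutoModelForSequenceClassification

# One label per MBTI type: 16-way classification.
MBTI_TYPES = [a + b + c + d for a in "IE" for b in "NS" for c in "TF" for d in "JP"]

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
model = AutoModelForSequenceClassification.from_pretrained(
    "bert-base-uncased", num_labels=len(MBTI_TYPES)
)

texts = ["I spent the weekend reorganizing my bookshelf by topic."]
batch = tokenizer(texts, truncation=True, padding=True, return_tensors="pt")

with torch.no_grad():
    logits = model(**batch).logits  # shape: (batch_size, 16)
pred = MBTI_TYPES[logits.argmax(dim=-1).item()]
print(pred)  # the classification head is untrained here; arbitrary until fine-tuned
```
An alternative design, also common in MBTI work, trains four binary classifiers, one per trait dimension (I/E, N/S, T/F, J/P), instead of a single 16-way classifier.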
arXiv Detail & Related papers (2024-08-28T18:43:07Z) - Humans and language models diverge when predicting repeating text [52.03471802608112]
We present a scenario in which the performance of humans and LMs diverges.
Human and GPT-2 LM predictions are strongly aligned in the first presentation of a text span, but their performance quickly diverges when memory begins to play a role.
We hope that this scenario will spur future work in bringing LMs closer to human behavior.
arXiv Detail & Related papers (2023-10-10T08:24:28Z) - Testing the Predictions of Surprisal Theory in 11 Languages [77.45204595614]
We investigate the relationship between surprisal and reading times in eleven different languages.
By focusing on a more diverse set of languages, we argue that these results offer the most robust link to date between information theory and incremental language processing across languages.
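Surprisal theory links processing effort to a word's surprisal, -log p(word | context), under some language model. A minimal sketch of how per-token surprisal can be estimated, using GPT-2 as an illustrative (English-only) stand-in for the paper's multilingual models:
```python
import math
import torch
from transformers import GPT2LMHeadModel, GPT2TokenizerFast

tokenizer = GPT2TokenizerFast.from_pretrained("gpt2")
model = GPT2LMHeadModel.from_pretrained("gpt2")
model.eval()

text = "The cat sat on the mat."
ids = tokenizer(text, return_tensors="pt").input_ids

with torch.no_grad():
    logits = model(ids).logits  # (1, seq_len, vocab_size)

# The logits at position t-1 score the token at position t,
# so the first token has no surprisal estimate (no preceding context).
log_probs = torch.log_softmax(logits[0, :-1], dim=-1)
for t in range(1, ids.size(1)):
    lp = log_probs[t - 1, ids[0, t]].item()
    surprisal = -lp / math.log(2)  # convert nats to bits
    print(f"{tokenizer.decode([ids[0, t].item()])!r}: {surprisal:.2f} bits")
```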
arXiv Detail & Related papers (2023-07-07T15:37:50Z) - Myers-Briggs personality classification from social media text using
pre-trained language models [0.0]
We describe a series of experiments in which the well-known Bidirectional Encoder Representations from Transformers (BERT) model is fine-tuned to perform MBTI classification.
Our main findings suggest that the current approach significantly outperforms well-known text classification models based on bag-of-words and static word embeddings alike.
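For contrast with the fine-tuned model, here is a minimal sketch of the kind of bag-of-words baseline such comparisons typically use: TF-IDF features with logistic regression. The toy texts and labels are invented for illustration.
```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

train_texts = [
    "I love quiet evenings with a book",
    "Big parties are the best way to recharge",
]
train_labels = ["INFJ", "ESFP"]  # toy self-reported MBTI labels

# Bag-of-words pipeline: sparse TF-IDF n-grams feeding a linear classifier.
baseline = make_pipeline(
    TfidfVectorizer(ngram_range=(1, 2), min_df=1),
    LogisticRegression(max_iter=1000),
)
baseline.fit(train_texts, train_labels)
print(baseline.predict(["a calm night in with tea and a novel"]))
```
Unlike BERT, this baseline has no notion of word order beyond short n-grams and no contextual representations, which is the gap the fine-tuned models are reported to exploit.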
arXiv Detail & Related papers (2022-07-10T14:38:09Z) - Square One Bias in NLP: Towards a Multi-Dimensional Exploration of the
Research Manifold [88.83876819883653]
Through a manual classification of recent NLP research papers, we show that this square-one bias is indeed present.
We observe that NLP research often goes beyond the square one setup, focusing not only on accuracy but also on fairness or interpretability, though typically only along a single dimension.
arXiv Detail & Related papers (2022-06-20T13:04:23Z) - Exploring Personality and Online Social Engagement: An Investigation of
MBTI Users on Twitter [0.0]
We investigate 3848 profiles from Twitter with self-labeled Myers-Briggs personality traits (MBTI).
We leverage BERT, a state-of-the-art NLP architecture based on deep learning, to analyze which sources of text hold the most predictive power for our task.
We find that biographies, statuses, and liked tweets contain significant predictive power for all dimensions of the MBTI system.
arXiv Detail & Related papers (2021-09-14T02:26:30Z) - DOCENT: Learning Self-Supervised Entity Representations from Large
Document Collections [18.62873757515885]
This paper explores learning rich self-supervised entity representations from large amounts of associated text.
Once pre-trained, these models become applicable to multiple entity-centric tasks such as ranked retrieval, knowledge base completion, question answering, and more.
We present several training strategies that, unlike prior approaches, learn to jointly predict words and entities.
arXiv Detail & Related papers (2021-02-26T01:00:12Z) - Matching Theory and Data with Personal-ITY: What a Corpus of Italian
YouTube Comments Reveals About Personality [11.38723572165938]
We create a novel corpus of YouTube comments in Italian, where authors are labelled with personality traits.
The traits are derived from one of the mainstream personality theories in psychology research, the MBTI.
We study the task of personality prediction in itself on our corpus as well as on TwiSty.
arXiv Detail & Related papers (2020-11-11T12:45:33Z) - Pre-training Multilingual Neural Machine Translation by Leveraging
Alignment Information [72.2412707779571]
mRASP is an approach to pre-train a universal multilingual neural machine translation model.
We carry out experiments on 42 translation directions across a diverse setting, including low-, medium- and rich-resource languages, as well as transfer to exotic language pairs.
arXiv Detail & Related papers (2020-10-07T03:57:54Z) - Intrinsic Probing through Dimension Selection [69.52439198455438]
Most modern NLP systems make use of pre-trained contextual representations that attain astonishingly high performance on a variety of tasks.
Such high performance should not be possible unless some form of linguistic structure inheres in these representations, and a wealth of research has sprung up on probing for it.
In this paper, we draw a distinction between intrinsic probing, which examines how linguistic information is structured within a representation, and the extrinsic probing popular in prior work, which only argues for the presence of such information by showing that it can be successfully extracted.
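A toy sketch of that distinction, assuming frozen representations with a property planted in one known dimension; the paper's actual dimension-selection method is more principled than reading off probe weights as done here.
```python
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
# Pretend these are frozen 768-d contextual embeddings for 200 tokens.
X = rng.normal(size=(200, 768))
y = (X[:, 42] > 0).astype(int)  # property encoded, by construction, in dim 42

# Extrinsic probing: can a classifier extract the property at all?
probe = LogisticRegression(max_iter=1000).fit(X, y)
print("extrinsic probe accuracy:", probe.score(X, y))

# Intrinsic probing (dimension selection): which dimensions carry it?
top_dims = np.argsort(np.abs(probe.coef_[0]))[::-1][:5]
print("most informative dimensions:", top_dims)  # should surface dim 42
```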
arXiv Detail & Related papers (2020-10-06T15:21:08Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information presented and is not responsible for any consequences of its use.