The Who in Code-Switching: A Case Study for Predicting Egyptian
Arabic-English Code-Switching Levels based on Character Profiles
- URL: http://arxiv.org/abs/2208.00433v1
- Date: Sun, 31 Jul 2022 13:47:35 GMT
- Title: The Who in Code-Switching: A Case Study for Predicting Egyptian
Arabic-English Code-Switching Levels based on Character Profiles
- Authors: Injy Hamed, Alia El Bolock, Cornelia Herbert, Slim Abdennadher, Ngoc Thang Vu
- Abstract summary: Code-switching (CS) is a linguistic phenomenon exhibited by multilingual individuals, where they tend to alternate between languages within one single conversation.
We use machine learning (ML) to predict users' CS levels based on their profiles.
Our results show that the CS behaviour is affected by the relation between speakers, travel experiences as well as Neuroticism and Extraversion personality traits.
- Score: 20.746558640332953
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Code-switching (CS) is a common linguistic phenomenon exhibited by
multilingual individuals, where they tend to alternate between languages within
one single conversation. CS is a complex phenomenon that not only encompasses
linguistic challenges, but also contains a great deal of complexity in terms of
its dynamic behaviour across speakers. Given that the factors giving rise to CS
vary from one country to the other, as well as from one person to the other, CS
is found to be a speaker-dependent behaviour, where the frequency with which the
foreign language is embedded differs across speakers. While several researchers
have looked into predicting CS behaviour from a linguistic point of view,
research is still lacking in the task of predicting user CS behaviour from
sociological and psychological perspectives. We provide an empirical user
study, where we investigate the correlations between users' CS levels and
character traits. We conduct interviews with bilinguals and gather information
on their profiles, including their demographics, personality traits, and
traveling experiences. We then use machine learning (ML) to predict users' CS
levels based on their profiles, where we identify the main influential factors
in the modeling process. We experiment with both classification as well as
regression tasks. Our results show that the CS behaviour is affected by the
relation between speakers, travel experiences as well as Neuroticism and
Extraversion personality traits.
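
The pipeline described in the abstract (measure each speaker's CS level, then relate it to profile attributes) can be illustrated with a minimal sketch. Everything here is an assumption for illustration and is not taken from the paper: CS level is taken to be the fraction of embedded-language (English) tokens in a language-tagged transcript, the `low`/`medium`/`high` cut-offs are hypothetical, and the helper names `cs_level`, `bin_level`, and `pearson` are invented. The paper's own features, labels, and models may differ.

```python
from statistics import mean


def cs_level(tagged_tokens, embedded="en"):
    """Fraction of tokens tagged with the embedded language (assumed CS measure)."""
    if not tagged_tokens:
        return 0.0
    hits = sum(1 for _, lang in tagged_tokens if lang == embedded)
    return hits / len(tagged_tokens)


def bin_level(level, cuts=(0.1, 0.3)):
    """Map a continuous CS level to a class label; the cut-offs are hypothetical."""
    if level < cuts[0]:
        return "low"
    if level < cuts[1]:
        return "medium"
    return "high"


def pearson(xs, ys):
    """Pearson correlation between two equal-length numeric sequences."""
    mx, my = mean(xs), mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    if vx == 0 or vy == 0:
        return 0.0  # correlation is undefined for a constant sequence
    return cov / (vx * vy) ** 0.5


# Toy usage: two (token, language) pairs, one of them English.
utterance = [("ana", "ar"), ("meeting", "en")]
level = cs_level(utterance)      # 0.5
label = bin_level(level)         # "high" under the hypothetical cut-offs
```

The continuous value from `cs_level` would serve as a regression target and the output of `bin_level` as a classification target; `pearson` can then be applied between per-speaker CS levels and a numeric profile attribute such as a personality-trait score.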
Related papers
- On the Proper Treatment of Tokenization in Psycholinguistics [53.960910019072436]
The paper argues that token-level language models should be marginalized into character-level language models before they are used in psycholinguistic studies.
We find various focal areas whose surprisal is a better psychometric predictor than the surprisal of the region of interest itself.
arXiv Detail & Related papers (2024-10-03T17:18:03Z)
- Cross-lingual Speech Emotion Recognition: Humans vs. Self-Supervised Models [16.0617753653454]
This study presents a comparative analysis between human performance and SSL models.
We also compare the SER ability of models and humans at both utterance- and segment-levels.
Our findings reveal that models, with appropriate knowledge transfer, can adapt to the target language and achieve performance comparable to native speakers.
arXiv Detail & Related papers (2024-09-25T13:27:17Z)
- Unveiling Language Competence Neurons: A Psycholinguistic Approach to Model Interpretability [2.672177830116334]
This study employs psycholinguistic paradigms to explore neuron-level representations in a language model across three tasks.
Our findings indicate that while GPT-2-XL struggles with the sound-shape task, it demonstrates human-like abilities in both sound-gender association and implicit causality.
arXiv Detail & Related papers (2024-09-24T07:40:33Z)
- Multilingual Dyadic Interaction Corpus NoXi+J: Toward Understanding Asian-European Non-verbal Cultural Characteristics and their Influences on Engagement [6.984291346424792]
We conduct a multilingual computational analysis of non-verbal features and investigate their role in engagement prediction.
We extracted multimodal non-verbal features, including speech acoustics, facial expressions, backchanneling and gestures.
We analyzed the influence of cultural differences in the input features of LSTM models trained to predict engagement for five language datasets.
arXiv Detail & Related papers (2024-09-09T18:37:34Z)
- Holmes: A Benchmark to Assess the Linguistic Competence of Language Models [59.627729608055006]
We introduce Holmes, a new benchmark designed to assess the linguistic competence of language models (LMs).
We use computation-based probing to examine LMs' internal representations regarding distinct linguistic phenomena.
As a result, we meet recent calls to disentangle LMs' linguistic competence from other cognitive abilities.
arXiv Detail & Related papers (2024-04-29T17:58:36Z)
- Quantifying the Dialect Gap and its Correlates Across Languages [69.18461982439031]
This work will lay the foundation for furthering the field of dialectal NLP by laying out evident disparities and identifying possible pathways for addressing them through mindful data collection.
arXiv Detail & Related papers (2023-10-23T17:42:01Z)
- Analyzing the Mono- and Cross-Lingual Pretraining Dynamics of Multilingual Language Models [73.11488464916668]
This study investigates the dynamics of the multilingual pretraining process.
We probe checkpoints taken from throughout XLM-R pretraining, using a suite of linguistic tasks.
Our analysis shows that the model achieves high in-language performance early on, with lower-level linguistic skills acquired before more complex ones.
arXiv Detail & Related papers (2022-05-24T03:35:00Z)
- Predicting User Code-Switching Level from Sociological and Psychological Profiles [24.32063659777203]
We show the correlation between users' CS frequency and character traits.
We use machine learning (ML) to validate the findings.
The predictive models were able to predict users' CS frequency with an accuracy higher than 55%.
arXiv Detail & Related papers (2021-12-13T07:36:02Z)
- Perception Point: Identifying Critical Learning Periods in Speech for Bilingual Networks [58.24134321728942]
We compare and identify cognitive aspects of deep neural network-based visual lip-reading models.
We observe a strong correlation between these theories in cognitive psychology and our unique modeling.
arXiv Detail & Related papers (2021-10-13T05:30:50Z)
- Style Variation as a Vantage Point for Code-Switching [54.34370423151014]
Code-Switching (CS) is a common phenomenon observed in several bilingual and multilingual communities.
We present a novel vantage point that treats CS as style variation between the two participating languages.
We propose a two-stage generative adversarial training approach where the first stage generates competitive negative examples for CS and the second stage generates more realistic CS sentences.
arXiv Detail & Related papers (2020-05-01T15:53:16Z)