Related papers: The Who in Code-Switching: A Case Study for Predicting Egyptian Arabic-English Code-Switching Levels based on Character Profiles

URL: http://arxiv.org/abs/2208.00433v1
Date: Sun, 31 Jul 2022 13:47:35 GMT
Title: The Who in Code-Switching: A Case Study for Predicting Egyptian Arabic-English Code-Switching Levels based on Character Profiles
Authors: Injy Hamed, Alia El Bolock, Cornelia Herbert, Slim Abdennadher, Ngoc Thang Vu
Abstract summary: Code-switching (CS) is a linguistic phenomenon exhibited by multilingual individuals, where they tend to alternate between languages within one single conversation. We use machine learning (ML) to predict users' CS levels based on their profiles. Our results show that the CS behaviour is affected by the relation between speakers, travel experiences as well as Neuroticism and Extraversion personality traits.
Score: 20.746558640332953
License: http://creativecommons.org/licenses/by/4.0/
Abstract: Code-switching (CS) is a common linguistic phenomenon exhibited by multilingual individuals, where they tend to alternate between languages within one single conversation. CS is a complex phenomenon that not only encompasses linguistic challenges, but also contains a great deal of complexity in terms of its dynamic behaviour across speakers. Given that the factors giving rise to CS vary from one country to another, as well as from one person to another, CS is found to be a speaker-dependent behaviour, where the frequency with which the foreign language is embedded differs across speakers. While several researchers have looked into predicting CS behaviour from a linguistic point of view, research is still lacking in the task of predicting user CS behaviour from sociological and psychological perspectives. We provide an empirical user study, where we investigate the correlations between users' CS levels and character traits. We conduct interviews with bilinguals and gather information on their profiles, including their demographics, personality traits, and traveling experiences. We then use machine learning (ML) to predict users' CS levels based on their profiles, where we identify the main influential factors in the modeling process. We experiment with both classification and regression tasks. Our results show that the CS behaviour is affected by the relation between speakers, travel experiences, as well as Neuroticism and Extraversion personality traits.

Related papers

Code-Switching and Syntax: A Large-Scale Experiment [2.100960337325026]
We show that syntax alone is sufficient for an automatic system to distinguish between sentences in minimal pairs of code-switching humans. The learnt syntactic patterns generalise well to unseen language pairs.
arXiv Detail & Related papers (2025-06-02T16:32:14Z)
Investigating and Scaling up Code-Switching for Multilingual Language Model Pre-Training [58.696660064190475]
We find that the existence of code-switching, alternating between different languages within a context, is key to multilingual capabilities. To better explore the power of code-switching for language alignment during pre-training, we investigate the strategy of synthetic code-switching.
arXiv Detail & Related papers (2025-04-02T15:09:58Z)
On the Proper Treatment of Tokenization in Psycholinguistics [53.960910019072436]
The paper argues that token-level language models should be marginalized into character-level language models before they are used in psycholinguistic studies. We find various focal areas whose surprisal is a better psychometric predictor than the surprisal of the region of interest itself.
arXiv Detail & Related papers (2024-10-03T17:18:03Z)
Cross-lingual Speech Emotion Recognition: Humans vs. Self-Supervised Models [16.0617753653454]
This study presents a comparative analysis between human performance and SSL models. We also compare the SER ability of models and humans at both utterance- and segment-levels. Our findings reveal that models, with appropriate knowledge transfer, can adapt to the target language and achieve performance comparable to native speakers.
arXiv Detail & Related papers (2024-09-25T13:27:17Z)
Unveiling Language Competence Neurons: A Psycholinguistic Approach to Model Interpretability [2.672177830116334]
This study employs psycholinguistic paradigms to explore neuron-level representations in language models across three tasks. Our findings indicate that while GPT-2-XL struggles with the sound-shape task, it demonstrates human-like abilities in both sound-gender association and implicit causality.
arXiv Detail & Related papers (2024-09-24T07:40:33Z)
Multilingual Dyadic Interaction Corpus NoXi+J: Toward Understanding Asian-European Non-verbal Cultural Characteristics and their Influences on Engagement [6.984291346424792]
We conduct a multilingual computational analysis of non-verbal features and investigate their role in engagement prediction. We extracted multimodal non-verbal features, including speech acoustics, facial expressions, backchanneling and gestures. We analyzed the influence of cultural differences in the input features of LSTM models trained to predict engagement for five language datasets.
arXiv Detail & Related papers (2024-09-09T18:37:34Z)
Holmes: A Benchmark to Assess the Linguistic Competence of Language Models [59.627729608055006]
We introduce Holmes, a new benchmark designed to assess the linguistic competence of language models (LMs). We use computation-based probing to examine LMs' internal representations regarding distinct linguistic phenomena. As a result, we meet recent calls to disentangle LMs' linguistic competence from other cognitive abilities.
arXiv Detail & Related papers (2024-04-29T17:58:36Z)
Quantifying the Dialect Gap and its Correlates Across Languages [69.18461982439031]
This work will lay the foundation for furthering the field of dialectal NLP by laying out evident disparities and identifying possible pathways for addressing them through mindful data collection.
arXiv Detail & Related papers (2023-10-23T17:42:01Z)
Analyzing the Mono- and Cross-Lingual Pretraining Dynamics of Multilingual Language Models [73.11488464916668]
This study investigates the dynamics of the multilingual pretraining process. We probe checkpoints taken from throughout XLM-R pretraining, using a suite of linguistic tasks. Our analysis shows that the model achieves high in-language performance early on, with lower-level linguistic skills acquired before more complex ones.
arXiv Detail & Related papers (2022-05-24T03:35:00Z)
Predicting User Code-Switching Level from Sociological and Psychological Profiles [24.32063659777203]
We show the correlation between users' CS frequency and character traits. We use machine learning (ML) to validate the findings. The predictive models were able to predict users' CS frequency with an accuracy higher than 55%.
arXiv Detail & Related papers (2021-12-13T07:36:02Z)
Perception Point: Identifying Critical Learning Periods in Speech for Bilingual Networks [58.24134321728942]
We compare and identify cognitive aspects on deep neural-based visual lip-reading models. We observe a strong correlation between these theories in cognitive psychology and our unique modeling.
arXiv Detail & Related papers (2021-10-13T05:30:50Z)
Style Variation as a Vantage Point for Code-Switching [54.34370423151014]
Code-Switching (CS) is a common phenomenon observed in several bilingual and multilingual communities. We present a novel vantage point of CS to be style variations between both the participating languages. We propose a two-stage generative adversarial training approach where the first stage generates competitive negative examples for CS and the second stage generates more realistic CS sentences.
arXiv Detail & Related papers (2020-05-01T15:53:16Z)

This list is automatically generated from the titles and abstracts of the papers in this site.