The Who in Code-Switching: A Case Study for Predicting Egyptian
Arabic-English Code-Switching Levels based on Character Profiles
- URL: http://arxiv.org/abs/2208.00433v1
- Date: Sun, 31 Jul 2022 13:47:35 GMT
- Authors: Injy Hamed, Alia El Bolock, Cornelia Herbert, Slim Abdennadher, Ngoc
Thang Vu
- Abstract summary: Code-switching (CS) is a linguistic phenomenon exhibited by multilingual individuals, where they tend to alternate between languages within one single conversation.
We use machine learning (ML) to predict users' CS levels based on their profiles.
Our results show that the CS behaviour is affected by the relation between speakers, travel experiences as well as Neuroticism and Extraversion personality traits.
- Score: 20.746558640332953
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Code-switching (CS) is a common linguistic phenomenon exhibited by
multilingual individuals, where they tend to alternate between languages within
one single conversation. CS is a complex phenomenon that not only encompasses
linguistic challenges, but also contains a great deal of complexity in terms of
its dynamic behaviour across speakers. Given that the factors giving rise to CS
vary from one country to the other, as well as from one person to the other, CS
is found to be a speaker-dependent behaviour, where the frequency with which the
foreign language is embedded differs across speakers. While several researchers
have looked into predicting CS behaviour from a linguistic point of view,
research is still lacking in the task of predicting user CS behaviour from
sociological and psychological perspectives. We provide an empirical user
study, where we investigate the correlations between users' CS levels and
character traits. We conduct interviews with bilinguals and gather information
on their profiles, including their demographics, personality traits, and
traveling experiences. We then use machine learning (ML) to predict users' CS
levels based on their profiles, where we identify the main influential factors
in the modeling process. We experiment with both classification as well as
regression tasks. Our results show that the CS behaviour is affected by the
relation between speakers, travel experiences as well as Neuroticism and
Extraversion personality traits.
Related papers
- Sharing Matters: Analysing Neurons Across Languages and Tasks in LLMs [73.0661307151716]
We investigate how neuron activation is shared across languages by categorizing neurons into four distinct groups according to their responses across different languages for a particular input: all-shared, partial-shared, specific, and non-activated.
Our analysis reveals the following insights: (i) the linguistic sharing patterns are strongly affected by the type of task, but neuron behaviour changes across different inputs even for the same task; (ii) all-shared neurons play a key role in generating correct responses; (iii) boosting multilingual alignment by increasing all-shared neurons can enhance accuracy on multilingual tasks.
arXiv Detail & Related papers (2024-06-13T16:04:11Z)
- Personality Style Recognition via Machine Learning: Identifying
Anaclitic and Introjective Personality Styles from Patients' Speech [6.3042597209752715]
We use natural language processing (NLP) and machine learning tools for classification.
We test this on a dataset of recorded clinical diagnostic interviews (CDI) on a sample of 79 patients diagnosed with major depressive disorder (MDD).
We find that automated classification with language-derived features (i.e., based on LIWC) significantly outperforms questionnaire-based classification models.
arXiv Detail & Related papers (2023-11-07T15:56:19Z)
- Quantifying the Dialect Gap and its Correlates Across Languages [69.18461982439031]
This work will lay the foundation for furthering the field of dialectal NLP by laying out evident disparities and identifying possible pathways for addressing them through mindful data collection.
arXiv Detail & Related papers (2023-10-23T17:42:01Z)
- A Survey of Code-switching: Linguistic and Social Perspectives for
Language Technologies [8.202739294785086]
We offer a survey of code-switching (C-S) covering the literature in linguistics with a reflection on the key issues in language technologies.
From the linguistic perspective, we provide an overview of structural and functional patterns of C-S focusing on the literature from European and Indian contexts.
From the language technologies perspective, we discuss how massive language models fail to represent diverse C-S types due to lack of appropriate training data.
arXiv Detail & Related papers (2023-01-05T09:08:04Z)
- Analyzing the Mono- and Cross-Lingual Pretraining Dynamics of
Multilingual Language Models [73.11488464916668]
This study investigates the dynamics of the multilingual pretraining process.
We probe checkpoints taken from throughout XLM-R pretraining, using a suite of linguistic tasks.
Our analysis shows that the model achieves high in-language performance early on, with lower-level linguistic skills acquired before more complex ones.
arXiv Detail & Related papers (2022-05-24T03:35:00Z)
- Predicting User Code-Switching Level from Sociological and Psychological
Profiles [24.32063659777203]
We show the correlation between users' CS frequency and character traits.
We use machine learning (ML) to validate the findings.
The predictive models were able to predict users' CS frequency with an accuracy higher than 55%.
arXiv Detail & Related papers (2021-12-13T07:36:02Z)
- Perception Point: Identifying Critical Learning Periods in Speech for
Bilingual Networks [58.24134321728942]
We compare and identify cognitive aspects on deep neural-based visual lip-reading models.
We observe a strong correlation between these theories in cognitive psychology and our unique modeling.
arXiv Detail & Related papers (2021-10-13T05:30:50Z)
- A Massively Multilingual Analysis of Cross-linguality in Shared
Embedding Space [61.18554842370824]
In cross-lingual language models, representations for many different languages live in the same space.
We compute a task-based measure of cross-lingual alignment in the form of bitext retrieval performance.
We examine a range of linguistic, quasi-linguistic, and training-related features as potential predictors of these alignment metrics.
arXiv Detail & Related papers (2021-09-13T21:05:37Z)
- Style Variation as a Vantage Point for Code-Switching [54.34370423151014]
Code-Switching (CS) is a common phenomenon observed in several bilingual and multilingual communities.
We present a novel vantage point of CS to be style variations between both the participating languages.
We propose a two-stage generative adversarial training approach where the first stage generates competitive negative examples for CS and the second stage generates more realistic CS sentences.
arXiv Detail & Related papers (2020-05-01T15:53:16Z)
This list is automatically generated from the titles and abstracts of the papers in this site.