Automatic Identification of Motivation for Code-Switching in Speech
Transcripts
- URL: http://arxiv.org/abs/2212.08565v1
- Date: Wed, 30 Nov 2022 05:45:05 GMT
- Title: Automatic Identification of Motivation for Code-Switching in Speech
Transcripts
- Authors: Ritu Belani and Jeffrey Flanigan
- Abstract summary: Code-switching, or switching between languages, occurs for many reasons and has important linguistic, sociological, and cultural implications.
We build the first system to automatically identify a wide range of motivations for which speakers code-switch in everyday speech.
We show that the system can be adapted to new language pairs, achieving 66% accuracy on a new language pair (Hindi-English).
- Score: 3.8073142980733
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Code-switching, or switching between languages, occurs for many reasons and
has important linguistic, sociological, and cultural implications. Multilingual
speakers code-switch for a variety of purposes, such as expressing emotions,
borrowing terms, making jokes, introducing a new topic, etc. The reason for
code-switching may be quite useful for analysis, but is not readily apparent.
To remedy this situation, we annotate a new dataset of motivations for
code-switching in Spanish-English. We build the first system (to our knowledge)
to automatically identify a wide range of motivations for which speakers
code-switch in everyday speech, achieving an accuracy of 75% across all motivations.
Additionally, we show that the system can be adapted to new language pairs,
achieving 66% accuracy on a new language pair (Hindi-English), demonstrating
the cross-lingual applicability of our annotation scheme.
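The abstract frames motivation identification as supervised classification over annotated transcript utterances, but does not name a model. As a rough illustration only (not the authors' implementation), the sketch below assumes a hypothetical motivation label set and fine-tunes a multilingual encoder (xlm-roberta-base via Hugging Face transformers) on (utterance, motivation) pairs; adapting to a new pair such as Hindi-English would amount to swapping in annotations for that pair.

```python
# Minimal sketch of motivation classification for code-switched utterances.
# Assumptions (not from the paper): the label set, the example data, and the
# choice of xlm-roberta-base are all placeholders for illustration.
from datasets import Dataset
from transformers import (AutoModelForSequenceClassification, AutoTokenizer,
                          Trainer, TrainingArguments)

LABELS = ["emotion", "borrowing", "joke", "topic_shift", "other"]  # hypothetical
label2id = {name: i for i, name in enumerate(LABELS)}

# Placeholder (utterance, motivation) pairs standing in for the annotated dataset.
pairs = [
    ("I was exhausted, pero tenia que terminar el trabajo.", "emotion"),
    ("Send me the 'resume' antes de la entrevista.", "borrowing"),
]

tokenizer = AutoTokenizer.from_pretrained("xlm-roberta-base")
model = AutoModelForSequenceClassification.from_pretrained(
    "xlm-roberta-base",
    num_labels=len(LABELS),
    id2label={i: name for name, i in label2id.items()},
    label2id=label2id,
)

def encode(batch):
    enc = tokenizer(batch["text"], truncation=True, max_length=128)
    enc["labels"] = [label2id[m] for m in batch["motivation"]]
    return enc

dataset = Dataset.from_dict({
    "text": [t for t, _ in pairs],
    "motivation": [m for _, m in pairs],
}).map(encode, batched=True)

trainer = Trainer(
    model=model,
    args=TrainingArguments(output_dir="cs_motivation", num_train_epochs=3,
                           per_device_train_batch_size=16),
    train_dataset=dataset,
    tokenizer=tokenizer,  # enables dynamic padding during batching
)
trainer.train()
```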
Related papers
- Code-switching in text and speech reveals information-theoretic audience design [5.3329709073809095]
We use language modeling to investigate the factors that influence code-switching.
Code-switching occurs when a speaker alternates between one language variety (the primary language) and another (the secondary language).
arXiv Detail & Related papers (2024-08-08T17:14:12Z)
- Language Agnostic Code Embeddings [61.84835551549612]
We focus on the cross-lingual capabilities of code embeddings across different programming languages.
Code embeddings comprise two distinct components: one deeply tied to the nuances and syntax of a specific language, and the other remaining agnostic to these details.
We show that isolating and removing this language-specific component yields significant improvements in downstream code retrieval tasks (see the embedding sketch after this list).
arXiv Detail & Related papers (2023-10-25T17:34:52Z)
- Simple yet Effective Code-Switching Language Identification with Multitask Pre-Training and Transfer Learning [0.7242530499990028]
Code-switching is the linguistic phenomenon in which, in casual settings, multilingual speakers mix words from different languages within a single utterance.
We propose two novel approaches toward improving language identification accuracy on an English-Mandarin child-directed speech dataset.
Our best model achieves a balanced accuracy of 0.781 on a real English-Mandarin code-switching child-directed speech corpus and outperforms the previous baseline by 55.3%.
arXiv Detail & Related papers (2023-05-31T11:43:16Z)
- LAE: Language-Aware Encoder for Monolingual and Multilingual ASR [87.74794847245536]
A novel language-aware encoder (LAE) architecture is proposed to handle both situations by disentangling language-specific information.
Experiments conducted on Mandarin-English code-switched speech suggest that the proposed LAE is capable of discriminating between languages at the frame level.
arXiv Detail & Related papers (2022-06-05T04:03:12Z)
- Reducing language context confusion for end-to-end code-switching automatic speech recognition [50.89821865949395]
We propose a language-related attention mechanism to reduce multilingual context confusion for the E2E code-switching ASR model.
By calculating the respective attention of multiple languages, our method can efficiently transfer language knowledge from rich monolingual data.
arXiv Detail & Related papers (2022-01-28T14:39:29Z)
- Transformer-Transducers for Code-Switched Speech Recognition [23.281314397784346]
We present an end-to-end ASR system using a transformer-transducer model architecture for code-switched speech recognition.
First, we introduce two auxiliary loss functions to handle the low-resource scenario of code-switching.
Second, we propose a novel mask-based training strategy with language ID information to improve the label encoder training towards intra-sentential code-switching.
arXiv Detail & Related papers (2020-11-30T17:27:41Z)
- VECO: Variable and Flexible Cross-lingual Pre-training for Language Understanding and Generation [77.82373082024934]
We plug a cross-attention module into the Transformer encoder to explicitly build the interdependence between languages.
This effectively avoids the degenerate case of predicting masked words conditioned only on context from the same language.
The proposed cross-lingual model delivers new state-of-the-art results on various cross-lingual understanding tasks of the XTREME benchmark.
arXiv Detail & Related papers (2020-10-30T03:41:38Z)
- NLP-CIC at SemEval-2020 Task 9: Analysing sentiment in code-switching language using a simple deep-learning classifier [63.137661897716555]
Code-switching is a phenomenon in which two or more languages are used in the same message.
We use a standard convolutional neural network model to predict the sentiment of tweets written in a blend of Spanish and English (see the CNN sketch after this list).
arXiv Detail & Related papers (2020-09-07T19:57:09Z)
- Phonological Features for 0-shot Multilingual Speech Synthesis [50.591267188664666]
We show that code-switching is possible for languages unseen during training, even within monolingual models.
We generate intelligible, code-switched speech in a new language at test time, including the approximation of sounds never seen in training.
arXiv Detail & Related papers (2020-08-06T18:25:18Z)
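The "Language Agnostic Code Embeddings" entry above describes isolating and removing a language-specific component from code embeddings. The exact procedure is not given in the summary, so the sketch below rests on one simple assumption: the language-specific component of each embedding is approximated by its language's mean vector, which is subtracted before the embeddings are used for retrieval. Mean-centering per language is only one way to realize the idea; the paper may estimate the language component differently.

```python
# Sketch: remove a per-language component from embeddings (assumption: the
# language-specific component is approximated by each language's mean vector).
import numpy as np

def remove_language_component(embeddings: np.ndarray, languages: list[str]) -> np.ndarray:
    """Subtract the per-language centroid from each embedding row."""
    out = embeddings.astype(np.float64).copy()
    for lang in set(languages):
        idx = [i for i, l in enumerate(languages) if l == lang]
        out[idx] -= out[idx].mean(axis=0)   # center each language's embeddings
    return out

# Toy usage: 4 code snippets (2 Python, 2 Java) with random 8-d "embeddings".
rng = np.random.default_rng(0)
emb = rng.normal(size=(4, 8))
langs = ["python", "python", "java", "java"]
agnostic = remove_language_component(emb, langs)
```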
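The NLP-CIC entry above reports using a standard convolutional neural network to classify the sentiment of Spanish-English code-switched tweets. The sketch below is a generic 1-D CNN text classifier in that spirit; the vocabulary size, filter widths, and three-way label set are illustrative choices, not details taken from that system.

```python
# Generic 1-D CNN text classifier sketch (hyperparameters are illustrative,
# not taken from the NLP-CIC system).
import torch
import torch.nn as nn

class CNNSentiment(nn.Module):
    def __init__(self, vocab_size=20000, embed_dim=128,
                 num_filters=100, kernel_sizes=(3, 4, 5), num_classes=3):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, embed_dim, padding_idx=0)
        self.convs = nn.ModuleList(
            [nn.Conv1d(embed_dim, num_filters, k) for k in kernel_sizes])
        self.fc = nn.Linear(num_filters * len(kernel_sizes), num_classes)

    def forward(self, token_ids):                  # token_ids: (batch, seq_len)
        x = self.embed(token_ids).transpose(1, 2)  # (batch, embed_dim, seq_len)
        feats = [torch.relu(c(x)).max(dim=2).values for c in self.convs]
        return self.fc(torch.cat(feats, dim=1))    # (batch, num_classes)

# Toy forward pass on random token ids for a batch of 2 "tweets".
model = CNNSentiment()
logits = model(torch.randint(1, 20000, (2, 40)))
```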
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the information presented and is not responsible for any consequences of its use.