Nollywood: Let's Go to the Movies!
- URL: http://arxiv.org/abs/2407.02631v1
- Date: Tue, 2 Jul 2024 19:50:55 GMT
- Title: Nollywood: Let's Go to the Movies!
- Authors: John E. Ortega, Ibrahim Said Ahmad, William Chen,
- Abstract summary: We create a phonetic sub-title model that is able to translate Nigerian English speech to American English.
We also use the most advanced toxicity detectors to discover how toxic the speech is.
Our aim is to highlight the text in these videos, which is often ignored for lack of dialectal understanding.
- Score: 3.818480245025447
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Nollywood, based on the idea of Bollywood from India, is a series of outstanding movies that originate from Nigeria. Unfortunately, while the movies are in English, they are hard to understand for many native speakers due to the dialect of English that is spoken. In this article, we accomplish two goals: (1) create a phonetic sub-title model that is able to translate Nigerian English speech to American English and (2) use the most advanced toxicity detectors to discover how toxic the speech is. Our aim is to highlight the text in these videos, which is often ignored for lack of dialectal understanding due to the fact that many people in Nigeria speak a native language like Hausa at home.
Related papers
- Voices Unheard: NLP Resources and Models for Yorùbá Regional Dialects [72.18753241750964]
Yorùbá is an African language with roughly 47 million speakers.
Recent efforts to develop NLP technologies for African languages have focused on their standard dialects.
We take steps towards bridging this gap by introducing a new high-quality parallel text and speech corpus.
arXiv Detail & Related papers (2024-06-27T22:38:04Z)
- Does Generative AI speak Nigerian-Pidgin?: Issues about Representativeness and Bias for Multilingualism in LLMs [8.829688681748413]
Naija is a Nigerian-Pidgin spoken by approx. 120M speakers in Nigeria.
It is a mixed language drawing on, e.g., English, Portuguese, Yoruba, Hausa and Igbo.
It is hard for non-native speakers to distinguish from the larger group of pidgin languages spoken across West Africa, known as West African Pidgin English (WAPE).
arXiv Detail & Related papers (2024-04-30T10:45:40Z)
- What Do Dialect Speakers Want? A Survey of Attitudes Towards Language Technology for German Dialects [60.8361859783634]
We survey speakers of dialects and regional languages related to German.
We find that respondents are especially in favour of potential NLP tools that work with dialectal input.
arXiv Detail & Related papers (2024-02-19T09:15:28Z)
- Low-Resource Counterspeech Generation for Indic Languages: The Case of Bengali and Hindi [11.117463901375602]
We bridge the gap for low-resource languages such as Bengali and Hindi.
We create a benchmark dataset of 5,062 abusive speech/counterspeech pairs.
We observe that the monolingual setup yields the best performance.
arXiv Detail & Related papers (2024-02-11T18:09:50Z)
- Task-Agnostic Low-Rank Adapters for Unseen English Dialects [52.88554155235167]
Large Language Models (LLMs) are trained on corpora disproportionally weighted in favor of Standard American English.
By disentangling dialect-specific and cross-dialectal information, HyperLoRA improves generalization to unseen dialects in a task-agnostic fashion.
arXiv Detail & Related papers (2023-11-02T01:17:29Z)
- Teacher Perception of Automatically Extracted Grammar Concepts for L2 Language Learning [66.79173000135717]
We apply this work to teaching two Indian languages, Kannada and Marathi, which do not have well-developed resources for second language learning.
We extract descriptions from a natural text corpus that answer questions about morphosyntax (learning of word order, agreement, case marking, or word formation) and semantics (learning of vocabulary).
We enlist language educators from schools in North America to perform a manual evaluation; they find that the materials have potential to be used for lesson preparation and learner evaluation.
arXiv Detail & Related papers (2023-10-27T18:17:29Z)
- Technology Pipeline for Large Scale Cross-Lingual Dubbing of Lecture Videos into Multiple Indian Languages [5.17905382659474]
Cross-lingual dubbing of lecture videos requires the transcription of the original audio, correction and removal of disfluencies.
This paper describes the challenges in regenerating English lecture videos in Indian languages semi-automatically.
arXiv Detail & Related papers (2022-11-01T07:06:29Z)
- yosm: A new yoruba sentiment corpus for movie reviews [2.3513645401551337]
We explore sentiment analysis on reviews of Nigerian movies.
The data comprised 1500 movie reviews that were sourced from IMDB, Rotten Tomatoes, Letterboxd, Cinemapointer and Nollyrated.
We develop sentiment classification models using state-of-the-art pre-trained language models like mBERT and AfriBERTa.
arXiv Detail & Related papers (2022-04-20T18:00:37Z)
- Unsupervised Transfer Learning in Multilingual Neural Machine Translation with Cross-Lingual Word Embeddings [72.69253034282035]
We exploit a language independent multilingual sentence representation to easily generalize to a new language.
Blindly decoding from Portuguese using a base system containing several Romance languages, we achieve scores of 36.4 BLEU for Portuguese-English and 12.8 BLEU for Russian-English.
We explore a more practical adaptation approach through non-iterative backtranslation, exploiting our model's ability to produce high quality translations.
arXiv Detail & Related papers (2021-03-11T14:22:08Z)
- Gender Bias, Social Bias and Representation: 70 Years of B$^H$ollywood [32.340056383090044]
No comprehensive NLP study on the evolution of social and gender biases in Bollywood dialogues exists.
We seek to understand the portrayal of women, in a broader context studying subtle social signals.
Our argument is simple -- popular movie content reflects social norms and beliefs in some form or shape.
arXiv Detail & Related papers (2021-02-18T01:27:24Z)
- Speaker Independent and Multilingual/Mixlingual Speech-Driven Talking Head Generation Using Phonetic Posteriorgrams [58.617181880383605]
In this work, we propose a novel approach using phonetic posteriorgrams.
Our method doesn't need hand-crafted features and is more robust to noise compared to recent approaches.
Our model is the first to support multilingual/mixlingual speech as input with convincing results.
arXiv Detail & Related papers (2020-06-20T16:32:43Z)
This list is automatically generated from the titles and abstracts of the papers in this site.