ADIMA: Abuse Detection In Multilingual Audio
- URL: http://arxiv.org/abs/2202.07991v1
- Date: Wed, 16 Feb 2022 11:09:50 GMT
- Title: ADIMA: Abuse Detection In Multilingual Audio
- Authors: Vikram Gupta, Rini Sharon, Ramit Sawhney, Debdoot Mukherjee
- Abstract summary: Abusive content detection in spoken text can be addressed by performing Automatic Speech Recognition (ASR) and leveraging advancements in natural language processing.
We propose ADIMA, a novel, linguistically diverse, ethically sourced, expert annotated and well-balanced multilingual profanity detection audio dataset.
- Score: 28.64185949388967
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Abusive content detection in spoken text can be addressed by performing
Automatic Speech Recognition (ASR) and leveraging advancements in natural
language processing. However, ASR models introduce latency and often perform
sub-optimally for profane words as they are underrepresented in training
corpora and not spoken clearly or completely. Exploration of this problem
entirely in the audio domain has largely been limited by the lack of audio
datasets. Motivated by these challenges, we propose ADIMA, a novel,
linguistically diverse, ethically sourced, expert-annotated and well-balanced
multilingual profanity detection audio dataset comprising 11,775 audio samples
in 10 Indic languages, spanning 65 hours and spoken by 6,446 unique users.
Through quantitative experiments across monolingual and cross-lingual zero-shot
settings, we take the first step towards democratizing audio-based content
moderation in Indic languages and release our dataset to pave the way for
future work.
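The abstract does not include code; as a rough illustration of the audio-only pipeline it argues for, the sketch below classifies utterances as abusive or clean directly from audio, using a pretrained multilingual wav2vec 2.0 encoder as a stand-in feature extractor and a linear classifier head. The checkpoint name, manifest format, and training loop are assumptions for illustration, not the paper's recipe; evaluating the same classifier on a language unseen during training corresponds to the cross-lingual zero-shot setting mentioned above.

```python
# Illustrative sketch only: audio-based abuse detection without ASR.
# The encoder checkpoint and data manifest are assumptions, not the
# ADIMA paper's exact models or training recipe.
import torch
import torchaudio
from transformers import AutoFeatureExtractor, Wav2Vec2Model

ENCODER = "facebook/wav2vec2-xls-r-300m"  # assumption: any wav2vec 2.0 checkpoint
extractor = AutoFeatureExtractor.from_pretrained(ENCODER)
encoder = Wav2Vec2Model.from_pretrained(ENCODER).eval()

def embed(path: str) -> torch.Tensor:
    """Mean-pooled utterance embedding from raw audio."""
    wav, sr = torchaudio.load(path)
    wav = torchaudio.functional.resample(wav.mean(0), sr, 16_000)
    inputs = extractor(wav.numpy(), sampling_rate=16_000, return_tensors="pt")
    with torch.no_grad():
        hidden = encoder(**inputs).last_hidden_state  # (1, frames, dim)
    return hidden.mean(dim=1).squeeze(0)

# Hypothetical manifest: (audio path, label in {0: clean, 1: abusive}, language).
train = [("clip_hi_001.wav", 1, "hindi"), ("clip_hi_002.wav", 0, "hindi")]
test = [("clip_ta_001.wav", 1, "tamil")]  # zero-shot: language unseen in training

X = torch.stack([embed(p) for p, _, _ in train])
y = torch.tensor([label for _, label, _ in train], dtype=torch.float32)

clf = torch.nn.Linear(X.shape[1], 1)
opt = torch.optim.Adam(clf.parameters(), lr=1e-3)
for _ in range(100):
    opt.zero_grad()
    loss = torch.nn.functional.binary_cross_entropy_with_logits(
        clf(X).squeeze(-1), y)
    loss.backward()
    opt.step()

for path, label, lang in test:
    prob = torch.sigmoid(clf(embed(path))).item()
    print(f"{path} ({lang}): P(abusive) = {prob:.2f}, gold = {label}")
```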
Related papers
- Bridging Language Gaps in Audio-Text Retrieval [28.829775980536574]
We propose a language enhancement (LE) using a multilingual text encoder (SONAR) to encode the text data with language-specific information.
We optimize the audio encoder through the application of consistent ensemble distillation (CED), enhancing support for variable-length audio-text retrieval.
Our methodology excels in English audio-text retrieval, demonstrating state-of-the-art (SOTA) performance on commonly used datasets such as AudioCaps and Clotho.
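As a generic illustration of the retrieval task this work addresses (not the paper's SONAR/CED pipeline), audio-text retrieval reduces to ranking candidate captions by similarity between audio and text embeddings in a shared space; the encoders below are placeholder modules.

```python
# Illustrative audio-text retrieval scoring: rank captions for each clip by
# cosine similarity in a shared embedding space. Both encoders are stand-ins
# for the paper's SONAR text encoder and CED-distilled audio encoder.
import torch
import torch.nn.functional as F

class StubEncoder(torch.nn.Module):
    """Placeholder encoder projecting precomputed features into a shared space."""
    def __init__(self, in_dim: int, out_dim: int = 256):
        super().__init__()
        self.proj = torch.nn.Linear(in_dim, out_dim)

    def forward(self, feats: torch.Tensor) -> torch.Tensor:
        return F.normalize(self.proj(feats), dim=-1)

audio_encoder = StubEncoder(in_dim=768)    # e.g. pooled acoustic features
text_encoder = StubEncoder(in_dim=1024)    # e.g. pooled multilingual text features

audio_feats = torch.randn(4, 768)      # 4 audio clips (toy data)
caption_feats = torch.randn(10, 1024)  # 10 candidate captions (toy data)

with torch.no_grad():
    sim = audio_encoder(audio_feats) @ text_encoder(caption_feats).T  # (4, 10)

print("best caption index per clip:", sim.argmax(dim=1).tolist())
```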
arXiv Detail & Related papers (2024-06-11T07:12:12Z)
- XLAVS-R: Cross-Lingual Audio-Visual Speech Representation Learning for Noise-Robust Speech Perception [62.660135152900615]
Speech recognition and translation systems perform poorly on noisy inputs.
XLAVS-R is a cross-lingual audio-visual speech representation model for noise-robust speech recognition and translation.
arXiv Detail & Related papers (2024-03-21T13:52:17Z)
- MuTox: Universal MUltilingual Audio-based TOXicity Dataset and Zero-shot Detector [10.37639482435147]
We introduce MuTox, the first highly multilingual audio-based dataset with toxicity labels.
The dataset comprises 20,000 audio utterances for English and Spanish, and 4,000 for the other 19 languages.
arXiv Detail & Related papers (2024-01-10T10:37:45Z)
- AudioPaLM: A Large Language Model That Can Speak and Listen [79.44757696533709]
We introduce AudioPaLM, a large language model for speech understanding and generation.
AudioPaLM fuses text-based and speech-based language models.
It can process and generate text and speech with applications including speech recognition and speech-to-speech translation.
arXiv Detail & Related papers (2023-06-22T14:37:54Z)
- Scaling Speech Technology to 1,000+ Languages [66.31120979098483]
The Massively Multilingual Speech (MMS) project increases the number of supported languages by 10-40x, depending on the task.
Its main ingredient is a new dataset based on readings of publicly available religious texts.
We built pre-trained wav2vec 2.0 models covering 1,406 languages, a single multilingual automatic speech recognition model for 1,107 languages, speech synthesis models for the same number of languages, and a language identification model for 4,017 languages.
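As a usage note (not code from the paper), the released MMS ASR model is commonly accessed through Hugging Face transformers; the snippet below assumes the facebook/mms-1b-all checkpoint and its per-language adapter API, with a mono 16 kHz waveform already loaded.

```python
# Illustrative use of the released MMS multilingual ASR model via Hugging Face
# transformers; the checkpoint name and adapter API are assumptions based on
# the public MMS release, not code from the paper.
import torch
from transformers import AutoProcessor, Wav2Vec2ForCTC

MODEL_ID = "facebook/mms-1b-all"
processor = AutoProcessor.from_pretrained(MODEL_ID)
model = Wav2Vec2ForCTC.from_pretrained(MODEL_ID)

# Switch the vocabulary and adapter weights to Hindi ("hin" is an ISO 639-3 code).
processor.tokenizer.set_target_lang("hin")
model.load_adapter("hin")

def transcribe(waveform_16khz) -> str:
    """Greedy CTC decoding of a mono 16 kHz waveform (1-D float array)."""
    inputs = processor(waveform_16khz, sampling_rate=16_000, return_tensors="pt")
    with torch.no_grad():
        logits = model(**inputs).logits
    ids = torch.argmax(logits, dim=-1)[0]
    return processor.decode(ids)
```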
arXiv Detail & Related papers (2023-05-22T22:09:41Z)
- Low-Resource Multilingual and Zero-Shot Multispeaker TTS [25.707717591185386]
We show that it is possible for a system to learn to speak a new language using just 5 minutes of training data.
We show the success of our proposed approach in terms of intelligibility, naturalness and similarity to target speaker.
arXiv Detail & Related papers (2022-10-21T20:03:37Z)
- Towards Language Modelling in the Speech Domain Using Sub-word Linguistic Units [56.52704348773307]
We propose a novel LSTM-based generative speech LM based on linguistic units including syllables and phonemes.
With a limited dataset, orders of magnitude smaller than that required by contemporary generative models, our model closely approximates babbling speech.
We show the effect of training with auxiliary text LMs, multitask learning objectives, and auxiliary articulatory features.
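A minimal sketch of such a model, assuming a fixed phoneme vocabulary and next-unit cross-entropy training (dimensions and data are placeholders, not the paper's configuration):

```python
# Minimal LSTM language model over sub-word linguistic units (e.g. phoneme IDs).
# Vocabulary size, dimensions, and data are placeholders for illustration.
import torch
import torch.nn as nn

class PhonemeLM(nn.Module):
    def __init__(self, vocab_size: int, emb_dim: int = 128, hidden: int = 256):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, emb_dim)
        self.lstm = nn.LSTM(emb_dim, hidden, num_layers=2, batch_first=True)
        self.head = nn.Linear(hidden, vocab_size)

    def forward(self, tokens: torch.Tensor) -> torch.Tensor:
        out, _ = self.lstm(self.embed(tokens))
        return self.head(out)  # (batch, time, vocab): next-unit logits

vocab_size = 64                    # e.g. phoneme inventory plus special symbols
model = PhonemeLM(vocab_size)
opt = torch.optim.Adam(model.parameters(), lr=1e-3)
loss_fn = nn.CrossEntropyLoss()

batch = torch.randint(0, vocab_size, (8, 50))  # toy phoneme sequences
inputs, targets = batch[:, :-1], batch[:, 1:]  # predict the next unit
for _ in range(10):
    opt.zero_grad()
    logits = model(inputs)
    loss = loss_fn(logits.reshape(-1, vocab_size), targets.reshape(-1))
    loss.backward()
    opt.step()
```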
arXiv Detail & Related papers (2021-10-31T22:48:30Z)
- CLSRIL-23: Cross Lingual Speech Representations for Indic Languages [0.0]
CLSRIL-23 is a self-supervised learning based model which learns cross-lingual speech representations from raw audio across 23 Indic languages.
It is built on top of wav2vec 2.0, which is trained by solving a contrastive task over masked latent speech representations.
We analyse the language-wise loss during pretraining to compare the effects of monolingual and multilingual pretraining.
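For reference, the masked contrastive task inherited from wav2vec 2.0 asks the model to identify the true quantized latent q_t for a masked time step t among a set of distractors Q_t, using a temperature-scaled cosine similarity (notation follows the wav2vec 2.0 paper, not hyperparameters specific to CLSRIL-23):

```latex
\mathcal{L}_m
  = -\log \frac{\exp\!\big(\mathrm{sim}(\mathbf{c}_t, \mathbf{q}_t)/\kappa\big)}
               {\sum_{\tilde{\mathbf{q}} \in Q_t} \exp\!\big(\mathrm{sim}(\mathbf{c}_t, \tilde{\mathbf{q}})/\kappa\big)},
\qquad
\mathrm{sim}(\mathbf{a}, \mathbf{b})
  = \frac{\mathbf{a}^{\top}\mathbf{b}}{\lVert\mathbf{a}\rVert\,\lVert\mathbf{b}\rVert}
```

where c_t is the context network output at the masked step, q_t the true quantized latent, Q_t the candidate set of q_t plus distractors, and kappa a temperature.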
arXiv Detail & Related papers (2021-07-15T15:42:43Z)
- That Sounds Familiar: an Analysis of Phonetic Representations Transfer Across Languages [72.9927937955371]
We use the resources existing in other languages to train a multilingual automatic speech recognition model.
We observe significant improvements across all languages in the multilingual setting, and stark degradation in the crosslingual setting.
Our analysis uncovered that even the phones that are unique to a single language can benefit greatly from adding training data from other languages.
arXiv Detail & Related papers (2020-05-16T22:28:09Z)
- CoVoST: A Diverse Multilingual Speech-To-Text Translation Corpus [57.641761472372814]
CoVoST is a multilingual speech-to-text translation corpus from 11 languages into English.
It is diversified with over 11,000 speakers and over 60 accents.
CoVoST is released under a CC0 license and is free to use.
arXiv Detail & Related papers (2020-02-04T14:35:28Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this list (including all information) and is not responsible for any consequences of its use.