Fine-grained Language Identification with Multilingual CapsNet Model
- URL: http://arxiv.org/abs/2007.06078v1
- Date: Sun, 12 Jul 2020 20:01:22 GMT
- Title: Fine-grained Language Identification with Multilingual CapsNet Model
- Authors: Mudit Verma, Arun Balaji Buduru
- Abstract summary: There is an explosion of multilingual content generation and consumption.
There is an increasing need for real-time and fine-grained content analysis services.
Current techniques in spoken language detection may fall short on at least one of these fronts: accuracy, fine-grained detection, data requirements, or manual effort in data collection and pre-processing.
- Score: 0.0
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Due to a drastic improvement in the quality of internet services worldwide,
there has been an explosion of multilingual content generation and consumption. This
is especially prevalent in countries with large multilingual audiences, which are
increasingly consuming media outside their linguistic familiarity or preference.
Hence, there is an increasing need for real-time and fine-grained content
analysis services, including language identification, content transcription,
and analysis. Accurate and fine-grained spoken language detection is an
essential first step for all subsequent content analysis algorithms.
Current techniques in spoken language detection tend to fall short on at least one
of these fronts: accuracy, fine-grained detection, data requirements, or manual
effort in data collection and pre-processing. In this work, a real-time language
detection approach is therefore presented that identifies the spoken language from
5-second audio clips with an accuracy of 91.8%, with modest data requirements and
minimal pre-processing. A novel Capsule Network architecture is proposed which
operates on spectrogram images of the provided audio snippets. We compare against
previous approaches based on Recurrent Neural Networks and iVectors to put these
results in context. Finally, we present a "Non-Class" analysis to further explain
why the CapsNet architecture works for the LID task.
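The abstract describes the pipeline concretely enough to illustrate: a 5-second audio clip is converted to a spectrogram image and classified by a Capsule Network, with the length of each class capsule indicating how likely the corresponding language is. The sketch below is a minimal, hypothetical reconstruction, not the authors' implementation; the log-mel front end (librosa), the PyTorch layer sizes, the number of languages, and the routing iterations are all assumptions made for illustration.

```python
import librosa
import numpy as np
import torch
import torch.nn as nn
import torch.nn.functional as F


def clip_to_spectrogram(path, sr=16000, duration=5.0, n_mels=64, frames=128):
    """Load a 5-second clip and return a fixed-size log-mel spectrogram (1, n_mels, frames)."""
    y, _ = librosa.load(path, sr=sr, duration=duration)
    y = np.pad(y, (0, max(0, int(sr * duration) - len(y))))  # pad clips shorter than 5 s
    mel = librosa.feature.melspectrogram(y=y, sr=sr, n_mels=n_mels)
    logmel = librosa.power_to_db(mel)[:, :frames]
    return torch.tensor(logmel, dtype=torch.float32).unsqueeze(0)


def squash(s, dim=-1, eps=1e-8):
    """Capsule non-linearity: preserves direction, bounds the vector length to [0, 1)."""
    norm2 = (s ** 2).sum(dim=dim, keepdim=True)
    return (norm2 / (1.0 + norm2)) * s / torch.sqrt(norm2 + eps)


class CapsNetLID(nn.Module):
    """Conv features -> primary capsules -> one language capsule per class (dynamic routing)."""

    def __init__(self, n_languages=6, n_mels=64, frames=128,
                 prim_dim=8, lang_dim=16, routing_iters=3):
        super().__init__()
        self.conv = nn.Conv2d(1, 64, kernel_size=9, stride=2)
        self.primary = nn.Conv2d(64, 32 * prim_dim, kernel_size=9, stride=2)
        self.prim_dim, self.lang_dim = prim_dim, lang_dim
        self.n_languages, self.routing_iters = n_languages, routing_iters
        with torch.no_grad():  # infer the number of primary capsules for this input size
            dummy = self.primary(F.relu(self.conv(torch.zeros(1, 1, n_mels, frames))))
        n_prim = dummy.numel() // prim_dim
        # transformation matrices mapping each primary capsule to each language capsule
        self.W = nn.Parameter(0.01 * torch.randn(1, n_prim, n_languages, lang_dim, prim_dim))

    def forward(self, x):                                   # x: (B, 1, n_mels, frames)
        h = F.relu(self.conv(x))
        u = squash(self.primary(h).view(x.size(0), -1, self.prim_dim))
        u_hat = torch.matmul(self.W, u[:, :, None, :, None]).squeeze(-1)
        logits = torch.zeros(u.size(0), u.size(1), self.n_languages, device=x.device)
        for _ in range(self.routing_iters):                 # routing by agreement
            c = F.softmax(logits, dim=2).unsqueeze(-1)      # coupling coefficients
            v = squash((c * u_hat).sum(dim=1), dim=-1)      # (B, n_languages, lang_dim)
            logits = logits + (u_hat * v.unsqueeze(1)).sum(-1)
        return v.norm(dim=-1)                               # capsule length ~ class confidence
```

A hypothetical usage, with the file path and language count as placeholders:

```python
model = CapsNetLID(n_languages=6)
spec = clip_to_spectrogram("clip.wav")        # a 5-second audio clip
scores = model(spec.unsqueeze(0))             # (1, n_languages) capsule lengths
print(scores.argmax(dim=-1))                  # predicted language index
```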
Related papers
- Bridging Language Gaps in Audio-Text Retrieval [28.829775980536574]
We propose a language enhancement (LE) using a multilingual text encoder (SONAR) to encode the text data with language-specific information.
We optimize the audio encoder through the application of consistent ensemble distillation (CED), enhancing support for variable-length audio-text retrieval.
Our methodology excels in English audio-text retrieval, demonstrating state-of-the-art (SOTA) performance on commonly used datasets such as AudioCaps and Clotho.
arXiv Detail & Related papers (2024-06-11T07:12:12Z) - Towards a Deep Understanding of Multilingual End-to-End Speech Translation [52.26739715012842]
We analyze representations learnt in a multilingual end-to-end speech translation model trained over 22 languages.
We derive three major findings from our analysis.
arXiv Detail & Related papers (2023-10-31T13:50:55Z) - Learning Cross-lingual Visual Speech Representations [108.68531445641769]
Cross-lingual self-supervised visual representation learning has been a growing research topic in the last few years.
We use the recently proposed Raw Audio-Visual Speech Encoders (RAVEn) framework to pre-train an audio-visual model with unlabelled data.
Our experiments show that multilingual models with more data outperform monolingual ones, but that, when the amount of data is kept fixed, monolingual models tend to reach better performance.
arXiv Detail & Related papers (2023-03-14T17:05:08Z) - Adapting Multilingual Speech Representation Model for a New, Underresourced Language through Multilingual Fine-tuning and Continued Pretraining [2.3513645401551333]
We investigate the possibility of adapting an existing multilingual wav2vec 2.0 model to a new language.
Our results show that continued pretraining is the most effective method of adapting a wav2vec 2.0 model to a new language.
We find that if a model pretrained on a related speech variety, or on an unrelated language with similar phonological characteristics, is available, multilingual fine-tuning with additional data from that language can have a positive impact on speech recognition performance.
arXiv Detail & Related papers (2023-01-18T03:57:53Z) - Separate What You Describe: Language-Queried Audio Source Separation [53.65665794338574]
We introduce the task of language-queried audio source separation (LASS).
LASS aims to separate a target source from an audio mixture based on a natural language query describing the target source.
We propose LASS-Net, an end-to-end neural network trained to jointly process acoustic and linguistic information.
arXiv Detail & Related papers (2022-03-28T23:47:57Z) - ADIMA: Abuse Detection In Multilingual Audio [28.64185949388967]
Abusive content detection in spoken text can be addressed by performing Automatic Speech Recognition (ASR) and leveraging advancements in natural language processing.
We propose ADIMA, a novel, linguistically diverse, ethically sourced, expert-annotated, and well-balanced multilingual profanity detection audio dataset.
arXiv Detail & Related papers (2022-02-16T11:09:50Z) - IGLUE: A Benchmark for Transfer Learning across Modalities, Tasks, and Languages [87.5457337866383]
We introduce the Image-Grounded Language Understanding Evaluation benchmark.
IGLUE brings together visual question answering, cross-modal retrieval, grounded reasoning, and grounded entailment tasks across 20 diverse languages.
We find that translate-test transfer is superior to zero-shot transfer and that few-shot learning is hard to harness for many tasks.
arXiv Detail & Related papers (2022-01-27T18:53:22Z) - Transferring Knowledge Distillation for Multilingual Social Event Detection [42.663309895263666]
Recently published graph neural networks (GNNs) show promising performance on social event detection tasks.
We present a GNN that incorporates cross-lingual word embeddings for detecting events in multilingual data streams.
Experiments on both synthetic and real-world datasets show the framework to be highly effective at detection in both multilingual data and in languages where training samples are scarce.
arXiv Detail & Related papers (2021-08-06T12:38:42Z) - Reinforced Iterative Knowledge Distillation for Cross-Lingual Named Entity Recognition [54.92161571089808]
Cross-lingual NER transfers knowledge from rich-resource languages to low-resource languages.
Existing cross-lingual NER methods do not make good use of rich unlabeled data in target languages.
We develop a novel approach based on the ideas of semi-supervised learning and reinforcement learning.
arXiv Detail & Related papers (2021-06-01T05:46:22Z) - Unsupervised Domain Adaptation of a Pretrained Cross-Lingual Language Model [58.27176041092891]
Recent research indicates that pretraining cross-lingual language models on large-scale unlabeled texts yields significant performance improvements.
We propose a novel unsupervised feature decomposition method that can automatically extract domain-specific features from the entangled pretrained cross-lingual representations.
Our proposed model leverages mutual information estimation to decompose the representations computed by a cross-lingual model into domain-invariant and domain-specific parts.
arXiv Detail & Related papers (2020-11-23T16:00:42Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of its content (including all information presented) and is not responsible for any consequences arising from its use.