Multimodal Modeling For Spoken Language Identification
- URL: http://arxiv.org/abs/2309.10567v1
- Date: Tue, 19 Sep 2023 12:21:39 GMT
- Title: Multimodal Modeling For Spoken Language Identification
- Authors: Shikhar Bharadwaj, Min Ma, Shikhar Vashishth, Ankur Bapna, Sriram
Ganapathy, Vera Axelrod, Siddharth Dalmia, Wei Han, Yu Zhang, Daan van Esch,
Sandy Ritchie, Partha Talukdar, Jason Riesa
- Abstract summary: Spoken language identification refers to the task of automatically predicting the spoken language in a given utterance.
We propose MuSeLI, a Multimodal Spoken Language Identification method, which delves into the use of various metadata sources to enhance language identification.
- Score: 57.94119986116947
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Spoken language identification refers to the task of automatically predicting
the spoken language in a given utterance. Conventionally, it is modeled as a
speech-based language identification task. Prior techniques have been
constrained to a single modality; however, in the case of video data, there is a
wealth of other metadata that may be beneficial for this task. In this work, we
propose MuSeLI, a Multimodal Spoken Language Identification method, which
delves into the use of various metadata sources to enhance language
identification. Our study reveals that metadata such as video title,
description and geographic location provide substantial information to identify
the spoken language of the multimedia recording. We conduct experiments using
two diverse public datasets of YouTube videos, and obtain state-of-the-art
results on the language identification task. We additionally conduct an
ablation study that describes the distinct contribution of each modality for
language recognition.
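The abstract leaves the fusion architecture unspecified; as a rough illustration only, the sketch below shows one common way to combine a speech embedding with metadata signals (title/description text and a coarse location ID): project each modality, concatenate, and classify. All module names, dimensions, and the concatenation-based fusion are assumptions for illustration, not the published MuSeLI design.

```python
import torch
import torch.nn as nn

class MultimodalLID(nn.Module):
    """Hypothetical late-fusion language-ID head. Upstream speech and text
    encoders are assumed to produce fixed-size embeddings; the dimensions
    and fusion-by-concatenation are illustrative, not MuSeLI's published
    architecture."""

    def __init__(self, audio_dim=1024, text_dim=768, num_regions=250,
                 num_languages=100, hidden=256):
        super().__init__()
        self.audio_proj = nn.Linear(audio_dim, hidden)      # utterance embedding
        self.text_proj = nn.Linear(text_dim, hidden)        # title + description
        self.geo_embed = nn.Embedding(num_regions, hidden)  # coarse location ID
        self.classifier = nn.Sequential(
            nn.ReLU(),
            nn.Linear(3 * hidden, num_languages),
        )

    def forward(self, audio_emb, text_emb, geo_id):
        # Fuse modalities by concatenation, then score candidate languages.
        fused = torch.cat([
            self.audio_proj(audio_emb),
            self.text_proj(text_emb),
            self.geo_embed(geo_id),
        ], dim=-1)
        return self.classifier(fused)

# Example: a batch of 2 utterances with precomputed embeddings.
model = MultimodalLID()
logits = model(torch.randn(2, 1024), torch.randn(2, 768),
               torch.tensor([12, 40]))
print(logits.shape)  # torch.Size([2, 100])
```

An ablation of the kind the abstract describes would simply zero out (or drop) one of the three projected inputs and measure the change in accuracy.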
Related papers
- Towards Building an End-to-End Multilingual Automatic Lyrics Transcription Model [14.39119862985503]
We aim to create a multilingual automatic lyrics transcription (ALT) system with available datasets.
Inspired by architectures that have proven effective for English ALT, we adapt these techniques to the multilingual scenario.
We evaluate the performance of the multilingual model in comparison to its monolingual counterparts.
arXiv Detail & Related papers (2024-06-25T15:02:32Z) - Multilingual Multi-Figurative Language Detection [14.799109368073548]
Figurative language understanding is highly understudied in a multilingual setting.
We introduce multilingual multi-figurative language modelling, and provide a benchmark for sentence-level figurative language detection.
We develop a framework for figurative language detection based on template-based prompt learning.
arXiv Detail & Related papers (2023-05-31T18:52:41Z) - Learning Cross-lingual Visual Speech Representations [108.68531445641769]
Cross-lingual self-supervised visual representation learning has been a growing research topic in the last few years.
We use the recently-proposed Raw Audio-Visual Speech Encoders (RAVEn) framework to pre-train an audio-visual model with unlabelled data.
Our experiments show that (1) multilingual models trained on more data outperform monolingual ones, but, with the amount of data fixed, monolingual models tend to reach better performance.
arXiv Detail & Related papers (2023-03-14T17:05:08Z) - Language identification as improvement for lip-based biometric visual
systems [13.205817167773443]
We present a preliminary study in which we use linguistic information as a soft biometric trait to enhance the performance of a visual (auditory-free) identification system based on lip movement.
We report a significant improvement in the identification performance of the proposed visual system when this linguistic information is integrated.
arXiv Detail & Related papers (2023-02-27T15:44:24Z) - Self-Supervised Speech Representation Learning: A Review [105.1545308184483]
Self-supervised representation learning methods promise a single universal model that would benefit a wide variety of tasks and domains.
Speech representation learning is experiencing similar progress in three main categories: generative, contrastive, and predictive methods.
This review presents approaches for self-supervised speech representation learning and their connection to other research areas.
arXiv Detail & Related papers (2022-05-21T16:52:57Z) - Exploring Teacher-Student Learning Approach for Multi-lingual
Speech-to-Intent Classification [73.5497360800395]
We develop an end-to-end system that supports multiple languages.
We exploit knowledge from a pre-trained multi-lingual natural language processing model.
arXiv Detail & Related papers (2021-09-28T04:43:11Z) - Multilingual Transfer Learning for Code-Switched Language and Speech
Neural Modeling [12.497781134446898]
We address the data scarcity and limitations of linguistic theory by proposing language-agnostic multi-task training methods.
First, we introduce a meta-learning-based approach, meta-transfer learning, in which information is judiciously transferred from high-resource monolingual speech data to the code-switching domain.
Second, we propose a novel multilingual meta-embeddings approach to effectively represent code-switching data by acquiring useful knowledge learned in other languages.
Third, we use multi-task learning to transfer syntactic information into a language model, helping it learn where to code-switch.
arXiv Detail & Related papers (2021-04-13T14:49:26Z) - Bridging Linguistic Typology and Multilingual Machine Translation with
Multi-View Language Representations [83.27475281544868]
We use singular vector canonical correlation analysis (SVCCA) to study what kind of information is induced from each source; a minimal SVCCA sketch follows this list.
We observe that our representations embed typology and strengthen correlations with language relationships.
We then take advantage of our multi-view language vector space for multilingual machine translation, where we achieve competitive overall translation accuracy.
arXiv Detail & Related papers (2020-04-30T16:25:39Z) - Meta-Transfer Learning for Code-Switched Speech Recognition [72.84247387728999]
We propose a new learning method, meta-transfer learning, to transfer knowledge to a code-switched speech recognition system in a low-resource setting.
Our model learns to recognize individual languages and transfers that knowledge to better recognize mixed-language speech by conditioning the optimization on code-switching data.
arXiv Detail & Related papers (2020-04-29T14:27:19Z)
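As referenced in the multi-view language representation entry above, here is a minimal sketch of singular vector canonical correlation analysis (SVCCA): each view is SVD-reduced to the directions covering most of its variance, and canonical correlations are then computed between the reductions. The variance threshold, the QR-based CCA, and the mean-correlation summary are common implementation choices, not details taken from the cited paper.

```python
import numpy as np

def svcca(X, Y, var_kept=0.99):
    """SVCCA similarity between representation matrices X (n x d1) and
    Y (n x d2) over the same n examples. Sketch only: the threshold and
    summary statistic are conventional choices, not the paper's settings."""
    X = X - X.mean(axis=0)          # CCA assumes centered views
    Y = Y - Y.mean(axis=0)

    def svd_reduce(M):
        # Keep the smallest set of singular directions covering var_kept.
        U, s, _ = np.linalg.svd(M, full_matrices=False)
        k = int(np.searchsorted(np.cumsum(s**2) / np.sum(s**2), var_kept)) + 1
        return U[:, :k] * s[:k]

    Xr, Yr = svd_reduce(X), svd_reduce(Y)
    # Canonical correlations are the singular values of Qx^T Qy,
    # where Qx, Qy are orthonormal bases of the reduced views.
    Qx, _ = np.linalg.qr(Xr)
    Qy, _ = np.linalg.qr(Yr)
    rho = np.linalg.svd(Qx.T @ Qy, compute_uv=False)
    return float(rho.mean())        # mean canonical correlation in [0, 1]

# Example with random "language vectors" from two hypothetical sources:
rng = np.random.default_rng(0)
print(svcca(rng.normal(size=(50, 32)), rng.normal(size=(50, 16))))
```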
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences.