An Overview of Indian Spoken Language Recognition from Machine Learning
Perspective
- URL: http://arxiv.org/abs/2212.03812v1
- Date: Wed, 30 Nov 2022 11:03:51 GMT
- Authors: Spandan Dey, Md Sahidullah, Goutam Saha
- Abstract summary: This work is one of the first attempts to present a comprehensive review of the Indian spoken language recognition research field.
It presents an in-depth analysis emphasizing the unique challenges that low-resource conditions and mutual influences among languages pose for developing LID systems in the Indian context.
- Score: 7.27448284043116
- License: http://creativecommons.org/licenses/by-sa/4.0/
- Abstract: Automatic spoken language identification (LID) is a very important research
field in the era of multilingual voice-command-based human-computer interaction
(HCI). A front-end LID module helps to improve the performance of many
speech-based applications in the multilingual scenario. India is a populous
country with diverse cultures and languages. The majority of the Indian
population needs to use their respective native languages for verbal
interaction with machines. Therefore, the development of efficient Indian
spoken language recognition systems is useful for adapting smart technologies
in every section of Indian society. The field of Indian LID has started gaining
momentum in the last two decades, mainly due to the development of several
standard multilingual speech corpora for the Indian languages. Even though
significant research progress has already been made in this field, to the best
of our knowledge, few attempts have been made to review this work analytically
as a whole. In this work, we make one of the first attempts to present a
comprehensive review of the Indian spoken language recognition research field.
We present an in-depth analysis that emphasizes the unique challenges that
low-resource conditions and mutual influences among languages pose for
developing LID systems in the Indian context. We discuss several essential
aspects of Indian LID research: a detailed description of the available speech
corpora; the major research contributions, from the earlier attempts based on
statistical modeling to the recent approaches based on different neural network
architectures; and future research trends. This review will help active
researchers and enthusiasts from related fields assess the current state of
Indian LID research.
Related papers
- From Statistical Methods to Pre-Trained Models; A Survey on Automatic Speech Recognition for Resource Scarce Urdu Language [41.272055304311905]
This paper focuses on the resource-constrained Urdu language, which is widely spoken across South Asian nations.
It outlines current research trends, technological advancements, and potential directions for future studies in Urdu ASR.
(2024-11-20T17:39:56Z)
- How Do Multilingual Models Remember? Investigating Multilingual Factual Recall Mechanisms [50.13632788453612]
Large Language Models (LLMs) store and retrieve vast amounts of factual knowledge acquired during pre-training.
The question of how these processes generalize to other languages and multilingual LLMs remains unexplored.
We examine when language plays a role in the recall process, uncovering evidence of language-independent and language-dependent mechanisms.
(2024-10-18T11:39:34Z)
- LAHAJA: A Robust Multi-accent Benchmark for Evaluating Hindi ASR Systems [16.143694951047024]
We create a benchmark, LAHAJA, which contains read and extempore speech on a diverse set of topics and use cases.
We evaluate existing open-source and commercial models on LAHAJA and find their performance to be poor.
We train models using different datasets and find that our model trained on multilingual data with good speaker diversity outperforms existing models by a significant margin.
(2024-08-21T08:51:00Z)
- Decoding the Diversity: A Review of the Indic AI Research Landscape [0.7864304771129751]
Indic languages are those spoken in the Indian subcontinent, including India, Pakistan, Bangladesh, Sri Lanka, Nepal, and Bhutan.
This review paper provides a comprehensive overview of large language model (LLM) research directions within Indic languages.
(2024-06-13T19:55:20Z)
- A Survey on Large Language Models with Multilingualism: Recent Advances and New Frontiers [48.314619377988436]
The rapid development of Large Language Models (LLMs) demonstrates remarkable multilingual capabilities in natural language processing.
Despite the breakthroughs of LLMs, the investigation into the multilingual scenario remains insufficient.
This survey aims to help the research community address multilingual problems and provide a comprehensive understanding of the core concepts, key techniques, and latest developments in multilingual natural language processing based on LLMs.
(2024-05-17T17:47:39Z)
- Fine-tuning Pre-trained Named Entity Recognition Models For Indian Languages [6.7638050195383075]
We analyze the challenges and propose techniques that can be tailored for Multilingual Named Entity Recognition for Indian languages.
We present human-annotated named entity corpora of 40K sentences for 4 Indian languages from two of the major Indian language families.
We achieve comparable performance on completely unseen benchmark datasets for Indian languages, which affirms the usability of our model.
(2024-05-08T05:54:54Z)
- What Do Dialect Speakers Want? A Survey of Attitudes Towards Language Technology for German Dialects [60.8361859783634]
We survey speakers of dialects and regional languages related to German.
We find that respondents are especially in favour of potential NLP tools that work with dialectal input.
(2024-02-19T09:15:28Z)
- Quantifying the Dialect Gap and its Correlates Across Languages [69.18461982439031]
This work will lay the foundation for furthering the field of dialectal NLP by laying out evident disparities and identifying possible pathways for addressing them through mindful data collection.
(2023-10-23T17:42:01Z)
- Multimodal Modeling For Spoken Language Identification [57.94119986116947]
Spoken language identification refers to the task of automatically predicting the spoken language in a given utterance.
We propose MuSeLI, a Multimodal Spoken Language Identification method, which delves into the use of various metadata sources to enhance language identification.
(2023-09-19T12:21:39Z)
- Crossing the Conversational Chasm: A Primer on Multilingual Task-Oriented Dialogue Systems [51.328224222640614]
Current state-of-the-art ToD models based on large pretrained neural language models are data hungry.
Data acquisition for ToD use cases is expensive and tedious.
(2021-04-17T15:19:56Z)
- Exploiting Spectral Augmentation for Code-Switched Spoken Language Identification [2.064612766965483]
We perform spoken LID on three Indian languages code-mixed with English.
This task was organized by the Microsoft research team as a spoken LID challenge.
(2020-10-14T14:37:03Z)