Related papers: Proceedings of the ISCA/ITG Workshop on Diversity in Large Speech and Language Models

URL: http://arxiv.org/abs/2503.10298v2
Date: Fri, 14 Mar 2025 06:24:05 GMT
Title: Proceedings of the ISCA/ITG Workshop on Diversity in Large Speech and Language Models
Authors: Sebastian Möller, Pia Knoeferle, Britta Schulte, Nils Feldhus,
Abstract summary: Modern techniques rely on large models for representing general knowledge of one or several languages. When humans interact with such technologies, the effectiveness of the interaction will be influenced by how far humans make use of the same type of language.
Score: 11.46358189300007
License: http://creativecommons.org/licenses/by/4.0/
Abstract: Machine learning techniques have conquered many different tasks in speech and natural language processing, such as speech recognition, information extraction, text and speech generation, and human machine interaction using natural language or speech (chatbots). Modern techniques typically rely on large models for representing general knowledge of one or several languages (Large Language Models, LLMs), or for representing speech and general audio characteristics. These models have been trained with large amounts of speech and language data, typically including web content. When humans interact with such technologies, the effectiveness of the interaction will be influenced by how far humans make use of the same type of language the models have been trained on or, in other words, if the models are able to generalize to the language used by humans when interacting with the technology. This may lead to some gradual forms of adaptation in human speech and language production, and users who do not adapt may be excluded from efficient use of such technologies. On top of this, as commercial model development follows market needs, under-represented languages and dialects/sociolects may decrease in terms of priorities. Furthermore, for many lesser spoken languages the necessary data is not available, which will worsen a digital divide in speech and language technology usage. The workshop sets out to discuss this problem based on scientific contributions from the perspective of computer science and linguistics (including computational linguistics and NLP).

Related papers

GOAT-SLM: A Spoken Language Model with Paralinguistic and Speaker Characteristic Awareness [43.67571101152883]
We introduce GOAT-SLM, a novel spoken language model with paralinguistic and speaker characteristic awareness. GOAT-SLM adopts a dual-modality head architecture that decouples linguistic modeling from acoustic realization. We show that GOAT-SLM achieves well-balanced performance across both semantic and non-semantic tasks, and outperforms existing open-source models in handling emotion, dialectal variation, and age-sensitive interactions.
arXiv Detail & Related papers (2025-07-24T06:10:29Z)
Deep Learning and Machine Learning -- Natural Language Processing: From Theory to Application [17.367710635990083]
We focus on natural language processing (NLP) and the role of large language models (LLMs). This paper discusses advanced data preprocessing techniques and the use of frameworks like Hugging Face for implementing transformer-based models. It highlights challenges such as handling multilingual data, reducing bias, and ensuring model robustness.
arXiv Detail & Related papers (2024-10-30T09:35:35Z)
We're Calling an Intervention: Exploring the Fundamental Hurdles in Adapting Language Models to Nonstandard Text [8.956635443376527]
We present a suite of experiments that allow us to understand the underlying challenges of language model adaptation to nonstandard text. We do so by designing interventions that approximate several types of linguistic variation and their interactions with existing biases of language models. Applying our interventions during language model adaptation with varying size and nature of training data, we gain important insights into when knowledge transfer can be successful.
arXiv Detail & Related papers (2024-04-10T18:56:53Z)
Learning and communication pressures in neural networks: Lessons from emergent communication [5.371337604556311]
We look at three cases where mismatches between the emergent linguistic behavior of neural agents and humans were resolved. We identify key pressures at play for language learning and emergence: communicative success, production effort, learnability, and other psycho-/sociolinguistic factors.
arXiv Detail & Related papers (2024-03-21T14:33:34Z)
AudioPaLM: A Large Language Model That Can Speak and Listen [79.44757696533709]
We introduce AudioPaLM, a large language model for speech understanding and generation. AudioPaLM fuses text-based and speech-based language models. It can process and generate text and speech with applications including speech recognition and speech-to-speech translation.
arXiv Detail & Related papers (2023-06-22T14:37:54Z)
Scaling Speech Technology to 1,000+ Languages [66.31120979098483]
The Massively Multilingual Speech (MMS) project increases the number of supported languages by 10-40x, depending on the task. A main ingredient is a new dataset based on readings of publicly available religious texts. We built pre-trained wav2vec 2.0 models covering 1,406 languages, a single multilingual automatic speech recognition model for 1,107 languages, speech synthesis models for the same number of languages, and a language identification model for 4,017 languages.
arXiv Detail & Related papers (2023-05-22T22:09:41Z)
A Survey of Large Language Models [81.06947636926638]
Language modeling has been widely studied for language understanding and generation in the past two decades. Recently, pre-trained language models (PLMs) have been proposed by pre-training Transformer models over large-scale corpora. To discriminate the difference in parameter scale, the research community has coined the term large language models (LLM) for the PLMs of significant size.
arXiv Detail & Related papers (2023-03-31T17:28:46Z)
Adapting Multilingual Speech Representation Model for a New, Underresourced Language through Multilingual Fine-tuning and Continued Pretraining [2.3513645401551333]
We investigate the possibility of adapting an existing multilingual wav2vec 2.0 model for a new language. Our results show that continued pretraining is the most effective method to adapt a wav2vec 2.0 model for a new language. We find that if a model pretrained on a related speech variety or an unrelated language with similar phonological characteristics is available, multilingual fine-tuning using additional data from that language can have a positive impact on speech recognition performance.
arXiv Detail & Related papers (2023-01-18T03:57:53Z)
What Artificial Neural Networks Can Tell Us About Human Language Acquisition [47.761188531404066]
Rapid progress in machine learning for natural language processing has the potential to transform debates about how humans learn language. To increase the relevance of learnability results from computational models, we need to train model learners without significant advantages over humans.
arXiv Detail & Related papers (2022-08-17T00:12:37Z)
Towards Language Modelling in the Speech Domain Using Sub-word Linguistic Units [56.52704348773307]
We propose a novel LSTM-based generative speech LM based on linguistic units including syllables and phonemes. With a limited dataset, orders of magnitude smaller than that required by contemporary generative models, our model closely approximates babbling speech. We show the effect of training with auxiliary text LMs, multitask learning objectives, and auxiliary articulatory features.
arXiv Detail & Related papers (2021-10-31T22:48:30Z)
Towards Zero-shot Language Modeling [90.80124496312274]
We construct a neural model that is inductively biased towards learning human languages. We infer this distribution from a sample of typologically diverse training languages. We harness additional language-specific side information as distant supervision for held-out languages.
arXiv Detail & Related papers (2021-08-06T23:49:18Z)

This list is automatically generated from the titles and abstracts of the papers on this site.