Warped Language Models for Noise Robust Language Understanding
- URL: http://arxiv.org/abs/2011.01900v1
- Date: Tue, 3 Nov 2020 18:26:28 GMT
- Title: Warped Language Models for Noise Robust Language Understanding
- Authors: Mahdi Namazifar, Gokhan Tur, Dilek Hakkani-Tür
- Abstract summary: Masked Language Models (MLM) are self-supervised neural networks trained to fill in the blanks in a given sentence with masked tokens.
We show that natural language understanding systems built on top of WLMs perform better than those built on top of MLMs, especially in the presence of ASR errors.
- Score: 11.017026606760728
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Masked Language Models (MLM) are self-supervised neural networks trained to
fill in the blanks in a given sentence with masked tokens. Despite the
tremendous success of MLMs for various text based tasks, they are not robust
for spoken language understanding, especially against the noise introduced by
automatic recognition of spontaneous conversational speech. In this work we introduce Warped Language Models
(WLM) in which input sentences at training time go through the same
modifications as in MLM, plus two additional modifications, namely inserting
and dropping random tokens. These two modifications extend and contract the
sentence in addition to the modifications in MLMs, hence the word "warped" in
the name. The insertion and drop modifications of the input text during training
of WLMs resemble the types of noise due to Automatic Speech Recognition (ASR)
errors, and as a result WLMs are likely to be more robust to ASR noise. Through
computational results we show that natural language understanding systems built
on top of WLMs perform better compared to those built based on MLMs, especially
in the presence of ASR errors.
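As a rough illustration of the warping described in the abstract, the sketch below applies MLM-style masking together with random token insertion and dropping to a tokenized sentence. The probabilities, the [MASK] symbol, and the function name warp_tokens are illustrative assumptions, not details taken from the paper.

```python
import random

# Minimal sketch of input "warping": MLM-style masking plus random token
# insertion and dropping. Probabilities and the [MASK] symbol are assumed
# for illustration only.
def warp_tokens(tokens, vocab, p_mask=0.15, p_insert=0.1, p_drop=0.1,
                mask_token="[MASK]"):
    warped = []
    for tok in tokens:
        r = random.random()
        if r < p_drop:
            # drop the token entirely, contracting the sentence
            continue
        if r < p_drop + p_insert:
            # insert a random vocabulary token, extending the sentence
            warped.append(random.choice(vocab))
        # MLM-style masking of the original token
        warped.append(mask_token if random.random() < p_mask else tok)
    return warped

# Example: warp a toy sentence with a toy vocabulary
vocab = ["the", "a", "cat", "dog", "sat", "on", "mat"]
print(warp_tokens("the cat sat on the mat".split(), vocab))
```

As with masked language modeling, the training objective would presumably ask the model to recover the original tokens from such a warped input, which is what exposes it to ASR-like insertions and deletions.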
Related papers
- DC-Spin: A Speaker-invariant Speech Tokenizer for Spoken Language Models [45.791472119671916]
Spoken language models (SLMs) process text and speech, enabling simultaneous speech understanding and generation.
DC-Spin aims to improve speech tokenization by bridging audio signals and SLM tokens.
We propose a chunk-wise approach to enable streamable DC-Spin without retraining or degradation.
arXiv Detail & Related papers (2024-10-31T17:43:13Z) - Developing Instruction-Following Speech Language Model Without Speech Instruction-Tuning Data [84.01401439030265]
Recent end-to-end speech language models (SLMs) have expanded upon the capabilities of large language models (LLMs).
We present a simple yet effective automatic process for creating speech-text pair data.
Our model demonstrates general capabilities for speech-related tasks without the need for speech instruction-tuning data.
arXiv Detail & Related papers (2024-09-30T07:01:21Z) - Large Language Model Can Transcribe Speech in Multi-Talker Scenarios with Versatile Instructions [68.98811048970963]
We present a pioneering effort to investigate the capability of large language models (LLMs) in transcribing speech in multi-talker environments.
Our approach utilizes WavLM and Whisper encoder to extract multi-faceted speech representations that are sensitive to speaker characteristics and semantic context.
Comprehensive experiments reveal the promising performance of our proposed system, MT-LLM, in cocktail party scenarios.
arXiv Detail & Related papers (2024-09-13T07:28:28Z) - SpeechPrompt: Prompting Speech Language Models for Speech Processing Tasks [94.10497337235083]
We are first to explore the potential of prompting speech LMs in the domain of speech processing.
We reformulate speech processing tasks into speech-to-unit generation tasks.
We show that the prompting method can achieve competitive performance compared to the strong fine-tuning method.
arXiv Detail & Related papers (2024-08-23T13:00:10Z) - Which Syntactic Capabilities Are Statistically Learned by Masked
Language Models for Code? [51.29970742152668]
We highlight that relying on accuracy-based measurements may lead to an overestimation of models' capabilities.
To address these issues, we introduce SyntaxEval, a technique for evaluating models' syntactic capabilities.
arXiv Detail & Related papers (2024-01-03T02:44:02Z) - Assessing Phrase Break of ESL Speech with Pre-trained Language Models
and Large Language Models [7.782346535009883]
This work introduces approaches to assessing phrase breaks in ESL learners' speech using pre-trained language models (PLMs) and large language models (LLMs).
arXiv Detail & Related papers (2023-06-08T07:10:39Z) - SpeechGen: Unlocking the Generative Power of Speech Language Models with
Prompts [108.04306136086807]
We present research that explores the application of prompt tuning to stimulate speech LMs for various generation tasks, within a unified framework called SpeechGen.
The proposed unified framework holds great promise for efficiency and effectiveness, particularly with the imminent arrival of advanced speech LMs.
arXiv Detail & Related papers (2023-06-03T22:35:27Z) - How Does Pretraining Improve Discourse-Aware Translation? [41.20896077662125]
We introduce a probing task to interpret the ability of pretrained language models to capture discourse relation knowledge.
We validate three state-of-the-art PLMs across encoder-, decoder-, and encoder-decoder-based models.
Our findings are instructive to understand how and when discourse knowledge in PLMs should work for downstream tasks.
arXiv Detail & Related papers (2023-05-31T13:36:51Z) - Masked and Permuted Implicit Context Learning for Scene Text Recognition [8.742571493814326]
Scene Text Recognition (STR) is difficult because of variations in text styles, shapes, and backgrounds.
We propose a masked and permuted implicit context learning network for STR, within a single decoder.
arXiv Detail & Related papers (2023-05-25T15:31:02Z) - Towards Language Modelling in the Speech Domain Using Sub-word
Linguistic Units [56.52704348773307]
We propose a novel LSTM-based generative speech LM based on linguistic units including syllables and phonemes.
With a limited dataset, orders of magnitude smaller than that required by contemporary generative models, our model closely approximates babbling speech.
We show the effect of training with auxiliary text LMs, multitask learning objectives, and auxiliary articulatory features.
arXiv Detail & Related papers (2021-10-31T22:48:30Z)
This list is automatically generated from the titles and abstracts of the papers on this site.