Warped Language Models for Noise Robust Language Understanding
- URL: http://arxiv.org/abs/2011.01900v1
- Date: Tue, 3 Nov 2020 18:26:28 GMT
- Title: Warped Language Models for Noise Robust Language Understanding
- Authors: Mahdi Namazifar, Gokhan Tur, Dilek Hakkani-Tür
- Abstract summary: Masked Language Models (MLM) are self-supervised neural networks trained to fill in the blanks in a given sentence with masked tokens.
We show that natural language understanding systems built on top of WLMs perform better compared to those built on top of MLMs.
- Score: 11.017026606760728
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Masked Language Models (MLM) are self-supervised neural networks trained to
fill in the blanks in a given sentence with masked tokens. Despite the
tremendous success of MLMs for various text-based tasks, they are not robust
for spoken language understanding, especially for spontaneous conversational
speech recognition noise. In this work we introduce Warped Language Models
(WLM) in which input sentences at training time go through the same
modifications as in MLM, plus two additional modifications, namely inserting
and dropping random tokens. These two modifications extend and contract the
sentence in addition to the modifications in MLMs, hence the word "warped" in
the name. The insertion and drop modifications of the input text during training
of WLM resemble the types of noise due to Automatic Speech Recognition (ASR)
errors, and as a result WLMs are likely to be more robust to ASR noise. Through
computational results we show that natural language understanding systems built
on top of WLMs perform better compared to those built based on MLMs, especially
in the presence of ASR errors.
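As a concrete illustration of the warping described above, the sketch below corrupts a token sequence with MLM-style masking plus random insertions and drops. The function name, the corruption probabilities, and the [MASK] convention are illustrative assumptions, not the paper's exact implementation.

```python
import random

def warp_tokens(tokens, vocab, p_mask=0.10, p_drop=0.05, p_insert=0.05,
                mask_token="[MASK]"):
    """Apply MLM-style masking plus the two WLM warps (random insert / drop).

    Returns (warped_tokens, targets); targets holds the original token to
    reconstruct, or None for positions created by insertion.
    Probabilities here are illustrative assumptions.
    """
    warped, targets = [], []
    for tok in tokens:
        r = random.random()
        if r < p_drop:
            # Drop: remove the token, contracting the sentence
            # (mimics ASR deletion errors).
            continue
        elif r < p_drop + p_insert:
            # Insert: add a random vocabulary token before the kept token,
            # extending the sentence (mimics ASR insertion errors).
            warped.append(random.choice(vocab))
            targets.append(None)
            warped.append(tok)
            targets.append(tok)
        elif r < p_drop + p_insert + p_mask:
            # Mask: standard MLM corruption; the model predicts the original.
            warped.append(mask_token)
            targets.append(tok)
        else:
            # Keep the token unchanged.
            warped.append(tok)
            targets.append(tok)
    return warped, targets

# Example: corrupt a transcript-like sentence the way WLM training would.
sentence = "book a table for two at seven".split()
print(warp_tokens(sentence, vocab=["uh", "the", "to", "a", "for"]))
```

Because the drop and insert operations change the sequence length, a warped sentence resembles a noisy ASR hypothesis of the original, which is the intuition behind the claimed robustness.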
Related papers
- Which Syntactic Capabilities Are Statistically Learned by Masked
Language Models for Code? [51.29970742152668]
We highlight that relying on accuracy-based measurements may lead to an overestimation of models' capabilities.
To address these issues, we introduce a technique called SyntaxEval for evaluating syntactic capabilities.
arXiv Detail & Related papers (2024-01-03T02:44:02Z) - Loss Masking Is Not Needed in Decoder-only Transformer for
Discrete-token-based ASR [58.136778669618096]
Unified speech-text models have achieved remarkable performance on various speech tasks.
We propose to model speech tokens in an autoregressive way, similar to text.
We find that applying the conventional cross-entropy loss on input speech tokens does not consistently improve the ASR performance.
arXiv Detail & Related papers (2023-11-08T08:45:14Z) - SALM: Speech-augmented Language Model with In-context Learning for
Speech Recognition and Translation [26.778332992311043]
We present a novel Speech Augmented Language Model (SALM) with multitask and in-context learning capabilities.
The unified SALM achieves performance on par with task-specific Conformer baselines for Automatic Speech Recognition (ASR) and Speech Translation (AST).
arXiv Detail & Related papers (2023-10-13T22:07:33Z) - Let Models Speak Ciphers: Multiagent Debate through Embeddings [84.20336971784495]
We introduce CIPHER (Communicative Inter-Model Protocol Through Embedding Representation) to address this issue.
By deviating from natural language, CIPHER offers an advantage of encoding a broader spectrum of information without any modification to the model weights.
This showcases the superiority and robustness of embeddings as an alternative "language" for communication among LLMs.
arXiv Detail & Related papers (2023-10-10T03:06:38Z) - Assessing Phrase Break of ESL Speech with Pre-trained Language Models
and Large Language Models [7.782346535009883]
This work introduces approaches to assessing phrase breaks in ESL learners' speech using pre-trained language models (PLMs) and large language models (LLMs).
arXiv Detail & Related papers (2023-06-08T07:10:39Z) - SpeechGen: Unlocking the Generative Power of Speech Language Models with
Prompts [108.04306136086807]
We present research that explores the application of prompt tuning to stimulate speech LMs for various generation tasks, within a unified framework called SpeechGen.
The proposed unified framework holds great promise for efficiency and effectiveness, particularly with the imminent arrival of advanced speech LMs.
arXiv Detail & Related papers (2023-06-03T22:35:27Z) - How Does Pretraining Improve Discourse-Aware Translation? [41.20896077662125]
We introduce a probing task to interpret the ability of pretrained language models to capture discourse relation knowledge.
We validate three state-of-the-art PLMs across encoder-, decoder-, and encoder-decoder-based models.
Our findings are instructive to understand how and when discourse knowledge in PLMs should work for downstream tasks.
arXiv Detail & Related papers (2023-05-31T13:36:51Z) - Masked and Permuted Implicit Context Learning for Scene Text Recognition [8.742571493814326]
Scene Text Recognition (STR) is difficult because of variations in text styles, shapes, and backgrounds.
We propose a masked and permuted implicit context learning network for STR, within a single decoder.
arXiv Detail & Related papers (2023-05-25T15:31:02Z) - Towards Language Modelling in the Speech Domain Using Sub-word
Linguistic Units [56.52704348773307]
We propose a novel LSTM-based generative speech LM based on linguistic units including syllables and phonemes.
With a limited dataset, orders of magnitude smaller than that required by contemporary generative models, our model closely approximates babbling speech.
We show the effect of training with auxiliary text LMs, multitask learning objectives, and auxiliary articulatory features.
arXiv Detail & Related papers (2021-10-31T22:48:30Z) - Fast, Effective and Self-Supervised: Transforming Masked Language Models
into Universal Lexical and Sentence Encoders [66.76141128555099]
We show that it is possible to turn Masked Language Models (MLMs) into universal lexical and sentence encoders even without any additional data and without supervision.
We propose an extremely simple, fast and effective contrastive learning technique, termed Mirror-BERT.
Mirror-BERT relies on fully identical or slightly modified string pairs as positive (i.e., synonymous) fine-tuning examples.
We report huge gains over off-the-shelf MLMs with Mirror-BERT in both lexical-level and sentence-level tasks, across different domains and different languages.
arXiv Detail & Related papers (2021-04-16T10:49:56Z)