BERT-LID: Leveraging BERT to Improve Spoken Language Identification
- URL: http://arxiv.org/abs/2203.00328v1
- Date: Tue, 1 Mar 2022 10:01:25 GMT
- Title: BERT-LID: Leveraging BERT to Improve Spoken Language Identification
- Authors: Yuting Nie, Junhong Zhao, Wei-Qiang Zhang, Jinfeng Bai, Zhongqin Wu
- Abstract summary: Language identification is the task of automatically determining the identity of a language conveyed by a spoken segment.
Although language identification attains high accuracy on medium and long utterances, performance on short utterances remains far from satisfactory.
We propose an effective BERT-based language identification system (BERT-LID) to improve language identification performance.
- Score: 12.179375898668614
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Language identification is the task of automatically determining the identity
of a language conveyed by a spoken segment. It has a profound impact on the
multilingual interoperability of an intelligent speech system. Although language
identification attains high accuracy on medium and long utterances (>3s),
performance on short utterances (<=1s) remains far from satisfactory. We
propose an effective BERT-based language identification system (BERT-LID) to
improve language identification performance, especially on short-duration
speech segments. To adapt BERT to the LID pipeline, we insert a conjunction
network before BERT to accommodate the frame-level Phonetic
Posteriorgrams (PPGs) derived from the front-end phone recognizer, and then
fine-tune the conjunction network and the pre-trained BERT model together. We
evaluate several variations within this pipelined framework, including combining
BERT with CNN, LSTM, DPCNN, and RCNN. The experimental results demonstrate that
the best-performing model is RCNN-BERT. Compared with prior work, our
RCNN-BERT model can improve the accuracy by about 5% on long-segment
identification and 18% on short-segment identification. The strong performance of
our model, especially on the short-segment task, demonstrates the applicability
of our proposed BERT-based approach to language identification.
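As a rough sketch of the pipeline the abstract describes (not the authors' released implementation), the PyTorch code below wires a hypothetical RCNN-style conjunction network in front of BERT by feeding its outputs through `inputs_embeds`. The PPG dimensionality, hidden sizes, mean pooling, and the `bert-base-uncased` checkpoint are all assumptions.

```python
import torch
import torch.nn as nn
from transformers import BertModel

class ConjunctionRCNN(nn.Module):
    """Hypothetical conjunction network: maps frame-level Phonetic
    Posteriorgrams (PPGs) from the front-end phone recognizer into
    BERT-sized embeddings. All sizes are illustrative assumptions."""
    def __init__(self, num_phones=128, hidden=384, bert_dim=768):
        super().__init__()
        # Recurrent half of the RCNN variant
        self.rnn = nn.LSTM(num_phones, hidden,
                           batch_first=True, bidirectional=True)
        # Convolutional half, applied over the recurrent outputs
        self.conv = nn.Conv1d(2 * hidden, bert_dim,
                              kernel_size=3, padding=1)

    def forward(self, ppg):                  # ppg: (B, T, num_phones)
        h, _ = self.rnn(ppg)                 # (B, T, 2*hidden)
        h = self.conv(h.transpose(1, 2))     # (B, bert_dim, T)
        return h.transpose(1, 2)             # (B, T, bert_dim)

class BertLID(nn.Module):
    def __init__(self, num_langs=10):
        super().__init__()
        self.conjunction = ConjunctionRCNN()
        self.bert = BertModel.from_pretrained("bert-base-uncased")
        self.classifier = nn.Linear(self.bert.config.hidden_size, num_langs)

    def forward(self, ppg):
        # Feed conjunction-network outputs to BERT as input embeddings,
        # bypassing its token-embedding lookup.
        out = self.bert(inputs_embeds=self.conjunction(ppg))
        pooled = out.last_hidden_state.mean(dim=1)  # simple mean pooling
        return self.classifier(pooled)

# The conjunction network and BERT would be fine-tuned together:
model = BertLID(num_langs=10)
logits = model(torch.randn(2, 100, 128))  # 2 utterances, 100 PPG frames
```

Passing continuous features via `inputs_embeds` is one plausible way to let frame-level acoustic representations reach a text-pretrained encoder; the exact interface and pooling used in the paper may differ.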
Related papers
- Adapted Multimodal BERT with Layer-wise Fusion for Sentiment Analysis [84.12658971655253]
We propose Adapted Multimodal BERT, a BERT-based architecture for multimodal tasks.
The adapter adjusts the pretrained language model for the task at hand, while the fusion layers perform task-specific, layer-wise fusion of audio-visual information with textual BERT representations.
In our ablations, we see that this approach leads to efficient models that can outperform their fine-tuned counterparts and are robust to input noise.
arXiv Detail & Related papers (2022-12-01T17:31:42Z) - XDBERT: Distilling Visual Information to BERT from Cross-Modal Systems to Improve Language Understanding [73.24847320536813]
This study explores distilling visual information from pretrained multimodal transformers to pretrained language encoders.
Our framework is inspired by the success of cross-modal encoders in visual-language tasks, while we alter the learning objective to cater to the language-heavy characteristics of NLU.
arXiv Detail & Related papers (2022-04-15T03:44:00Z) - Is Attention always needed? A Case Study on Language Identification from Speech [1.162918464251504]
The present study introduces a convolutional recurrent neural network (CRNN) based LID system.
The CRNN-based LID system operates on the Mel-frequency Cepstral Coefficient (MFCC) features of audio samples.
The LID model attains accuracy levels ranging from 97% to 100%, even for linguistically similar languages; a hedged sketch of such a CRNN appears after this list.
arXiv Detail & Related papers (2021-10-05T16:38:57Z) - FBERT: A Neural Transformer for Identifying Offensive Content [67.12838911384024]
fBERT is a BERT model retrained on SOLID, the largest English offensive language identification corpus available, with over 1.4 million offensive instances.
We evaluate fBERT's performance on identifying offensive content on multiple English datasets and we test several thresholds for selecting instances from SOLID.
The fBERT model will be made freely available to the community.
arXiv Detail & Related papers (2021-09-10T19:19:26Z) - Evaluation of BERT and ALBERT Sentence Embedding Performance on Downstream NLP Tasks [4.955649816620742]
This paper explores sentence embedding models for BERT and ALBERT.
We take a modified BERT network with siamese and triplet network structures, called Sentence-BERT (SBERT), and replace BERT with ALBERT to create Sentence-ALBERT (SALBERT).
arXiv Detail & Related papers (2021-01-26T09:14:06Z) - EstBERT: A Pretrained Language-Specific BERT for Estonian [0.3674863913115431]
This paper presents EstBERT, a large pretrained transformer-based language-specific BERT model for Estonian.
Recent work has evaluated multilingual BERT models on Estonian tasks and found them to outperform the baselines.
We show that the models based on EstBERT outperform multilingual BERT models on five tasks out of six.
arXiv Detail & Related papers (2020-11-09T21:33:53Z) - Explicit Alignment Objectives for Multilingual Bidirectional Encoders [111.65322283420805]
We present a new method for learning multilingual encoders, AMBER (Aligned Multilingual Bi-directional EncodeR).
AMBER is trained on additional parallel data using two explicit alignment objectives that align the multilingual representations at different granularities.
Experimental results show that AMBER obtains gains of up to 1.1 average F1 score on sequence tagging and up to 27.3 average accuracy on retrieval over the XLMR-large model.
arXiv Detail & Related papers (2020-10-15T18:34:13Z) - Cross-lingual Information Retrieval with BERT [8.052497255948046]
We explore the use of the popular bidirectional language model, BERT, to model and learn the relevance between English queries and foreign-language documents.
A deep relevance matching model based on BERT is introduced and trained by finetuning a pretrained multilingual BERT model with weak supervision.
Experimental results of the retrieval of Lithuanian documents against short English queries show that our model is effective and outperforms the competitive baseline approaches.
arXiv Detail & Related papers (2020-04-24T23:32:13Z) - A Study of Cross-Lingual Ability and Language-specific Information in Multilingual BERT [60.9051207862378]
Multilingual BERT works remarkably well on cross-lingual transfer tasks.
Data size and context window size are crucial factors for transferability.
There is a computationally cheap but effective approach to improve the cross-lingual ability of multilingual BERT.
arXiv Detail & Related papers (2020-04-20T11:13:16Z) - Rnn-transducer with language bias for end-to-end Mandarin-English code-switching speech recognition [58.105818353866354]
We propose an improved recurrent neural network transducer (RNN-T) model with language bias to alleviate the code-switching (CS) problem.
We use the language identities to bias the model to predict the CS points.
This promotes the model to learn the language identity information directly from transcription, and no additional LID model is needed.
arXiv Detail & Related papers (2020-02-19T12:01:33Z) - RobBERT: a Dutch RoBERTa-based Language Model [9.797319790710711]
We use RoBERTa to train a Dutch language model called RobBERT.
We measure its performance on various tasks as well as the importance of the fine-tuning dataset size.
RobBERT improves state-of-the-art results on various tasks and, in particular, significantly outperforms other models on smaller datasets.
arXiv Detail & Related papers (2020-01-17T13:25:44Z)
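As referenced in the "Is Attention always needed?" entry above, the following is a minimal sketch of a CRNN-style LID model over MFCC features. It is an illustration only, not the cited paper's implementation: the layer sizes, filter counts, pooling, and the `num_langs=4` output are all assumptions.

```python
import torch
import torch.nn as nn

class CRNNLid(nn.Module):
    """Illustrative CRNN for spoken LID over MFCC inputs.
    All sizes are assumptions, not the cited paper's configuration."""
    def __init__(self, n_mfcc=40, num_langs=4):
        super().__init__()
        # Convolutional front end over (time, cepstral) feature maps
        self.conv = nn.Sequential(
            nn.Conv2d(1, 32, kernel_size=3, padding=1),
            nn.ReLU(),
            nn.MaxPool2d(2),  # halves both the time and cepstral axes
        )
        # Recurrent back end over the (downsampled) time axis
        self.gru = nn.GRU(32 * (n_mfcc // 2), 128, batch_first=True)
        self.fc = nn.Linear(128, num_langs)

    def forward(self, mfcc):               # mfcc: (B, 1, T, n_mfcc)
        f = self.conv(mfcc)                # (B, 32, T/2, n_mfcc/2)
        b, c, t, m = f.shape
        f = f.permute(0, 2, 1, 3).reshape(b, t, c * m)
        h, _ = self.gru(f)                 # (B, T/2, 128)
        return self.fc(h[:, -1])           # classify from last time step

# Two dummy utterances, 200 frames of 40-dim MFCCs each
logits = CRNNLid()(torch.randn(2, 1, 200, 40))
```

A model in this family trades attention for a cheaper convolution-plus-recurrence stack, which is consistent with that paper's question of whether attention is always needed for LID.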