Named Entity Recognition for Address Extraction in Speech-to-Text
Transcriptions Using Synthetic Data
- URL: http://arxiv.org/abs/2402.05545v1
- Date: Thu, 8 Feb 2024 10:29:11 GMT
- Title: Named Entity Recognition for Address Extraction in Speech-to-Text
Transcriptions Using Synthetic Data
- Authors: Bibiána Lajčinová, Patrik Valábek and Michal Spišiak
- Abstract summary: This paper introduces an approach for building a Named Entity Recognition (NER) model on top of a Bidirectional Encoder Representations from Transformers (BERT) architecture.
This NER model extracts address parts from data acquired from speech-to-text transcriptions.
The performance of our NER model, trained solely on synthetic data, is evaluated on a small real test dataset.
- Score: 0.0
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: This paper introduces an approach for building a Named Entity Recognition
(NER) model on top of the Bidirectional Encoder Representations from
Transformers (BERT) architecture, specifically utilizing the SlovakBERT model.
This NER model extracts address parts from data acquired from speech-to-text
transcriptions. Due to the scarcity of real data, a synthetic dataset was
generated using the GPT API. The importance of mimicking spoken-language
variability in this artificial data is emphasized. The performance of our NER
model, trained solely on synthetic data, is evaluated on a small real test
dataset.
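As a rough illustration of the synthetic-data step, the sketch below asks a GPT chat-completions endpoint for spoken-style Slovak address utterances. The abstract only states that a GPT API was used, so the model name, prompt wording, and output format here are assumptions for illustration, not the authors' actual setup.

```python
# Hypothetical sketch of the synthetic-data step (OpenAI Python SDK >= 1.0).
# Model, prompt, and output format are illustrative assumptions; the paper
# does not specify them in the abstract.
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

prompt = (
    "Generate 10 Slovak sentences in which a caller dictates their address "
    "the way it would be spoken over the phone (numbers spelled out as words, "
    "no punctuation, occasional fillers and hesitations). "
    "For each sentence, mark the address parts: street, house number, "
    "municipality, and postal code."
)

response = client.chat.completions.create(
    model="gpt-4o-mini",   # placeholder; the paper does not name the model
    messages=[{"role": "user", "content": prompt}],
    temperature=1.0,       # encourage the spoken-language variability the paper emphasizes
)

print(response.choices[0].message.content)
```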
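And a minimal sketch of the NER side, assuming a Hugging Face token-classification setup over the publicly released gerulata/slovakbert checkpoint with a BIO tag set for address parts. The tag names, the toy training example, and the hyperparameters are illustrative assumptions, not taken from the paper.

```python
# Hypothetical fine-tuning sketch: SlovakBERT as a token-classification (NER)
# model for address parts, trained on synthetic examples. Label set, example,
# and hyperparameters are illustrative, not the paper's.
from datasets import Dataset
from transformers import (AutoTokenizer, AutoModelForTokenClassification,
                          DataCollatorForTokenClassification,
                          TrainingArguments, Trainer)

# Assumed BIO tag set; the paper's exact address entity types may differ.
labels = ["O", "B-STREET", "I-STREET", "B-NUMBER",
          "B-MUNICIPALITY", "I-MUNICIPALITY", "B-POSTCODE", "I-POSTCODE"]
label2id = {l: i for i, l in enumerate(labels)}
id2label = {i: l for l, i in label2id.items()}

# add_prefix_space is required for pre-tokenized input with RoBERTa-style tokenizers.
tokenizer = AutoTokenizer.from_pretrained("gerulata/slovakbert", add_prefix_space=True)
model = AutoModelForTokenClassification.from_pretrained(
    "gerulata/slovakbert", num_labels=len(labels),
    id2label=id2label, label2id=label2id)

# Toy spoken-style example; real training data would come from the GPT-generated set.
examples = {
    "tokens": [["bývam", "na", "ulici", "Hlavná", "dvadsaťpäť", "Košice"]],
    "ner_tags": [[0, 0, 0, 1, 3, 4]],
}

def tokenize_and_align(batch):
    enc = tokenizer(batch["tokens"], is_split_into_words=True, truncation=True)
    all_labels = []
    for i, word_labels in enumerate(batch["ner_tags"]):
        word_ids = enc.word_ids(batch_index=i)
        prev, aligned = None, []
        for wid in word_ids:
            if wid is None:
                aligned.append(-100)            # ignore special tokens in the loss
            elif wid != prev:
                aligned.append(word_labels[wid])
            else:
                aligned.append(-100)            # label only the first subword of a word
            prev = wid
        all_labels.append(aligned)
    enc["labels"] = all_labels
    return enc

ds = Dataset.from_dict(examples).map(
    tokenize_and_align, batched=True, remove_columns=["tokens", "ner_tags"])

trainer = Trainer(
    model=model,
    args=TrainingArguments(output_dir="slovakbert-address-ner",
                           num_train_epochs=3, per_device_train_batch_size=16),
    train_dataset=ds,
    data_collator=DataCollatorForTokenClassification(tokenizer=tokenizer),
)
trainer.train()
```

At inference time, the fine-tuned checkpoint could be applied to raw transcription text, for example via transformers.pipeline("token-classification", ...), to recover the address spans.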
Related papers
- Learning Robust Named Entity Recognizers From Noisy Data With Retrieval Augmentation [67.89838237013078]
Named entity recognition (NER) models often struggle with noisy inputs.
We propose a more realistic setting in which only noisy text and its NER labels are available.
We employ a multi-view training framework that improves robust NER without retrieving text during inference.
arXiv Detail & Related papers (2024-07-26T07:30:41Z)
- Speech Emotion Recognition under Resource Constraints with Data Distillation [64.36799373890916]
Speech emotion recognition (SER) plays a crucial role in human-computer interaction.
The emergence of edge devices in the Internet of Things presents challenges in constructing intricate deep learning models.
We propose a data distillation framework to facilitate efficient development of SER models in IoT applications.
arXiv Detail & Related papers (2024-06-21T13:10:46Z)
- Code-Switching Text Generation and Injection in Mandarin-English ASR [57.57570417273262]
We investigate text generation and injection for improving the performance of a widely used industry streaming model, the Transformer-Transducer (T-T).
We first propose a strategy to generate code-switching text data and then investigate injecting generated text into T-T model explicitly by Text-To-Speech (TTS) conversion or implicitly by tying speech and text latent spaces.
Experimental results on the T-T model trained with a dataset containing 1,800 hours of real Mandarin-English code-switched speech show that our approaches to inject generated code-switching text significantly boost the performance of T-T models.
arXiv Detail & Related papers (2023-03-20T09:13:27Z)
- Advancing Semi-Supervised Learning for Automatic Post-Editing: Data-Synthesis by Mask-Infilling with Erroneous Terms [5.366354612549173]
We focus on data-synthesis methods to create high-quality synthetic data.
We present a data-synthesis method by which the resulting synthetic data mimic the translation errors found in actual data.
Experimental results show that using the synthetic data created by our approach results in significantly better APE performance than other synthetic data created by existing methods.
arXiv Detail & Related papers (2022-04-08T07:48:57Z)
- End-to-end model for named entity recognition from speech without paired training data [12.66131972249388]
We propose an approach to build an end-to-end neural model to extract semantic information.
Our approach is based on the use of an external model trained to generate a sequence of vectorial representations from text.
Experiments on named entity recognition, carried out on the QUAERO corpus, show that this approach is very promising.
arXiv Detail & Related papers (2022-04-02T08:14:27Z)
- Hierarchical Transformer Model for Scientific Named Entity Recognition [0.20646127669654832]
We present a simple and effective approach for Named Entity Recognition.
The main idea of our approach is to encode the input subword sequence with a pre-trained transformer such as BERT.
We evaluate our approach on three benchmark datasets for scientific NER.
arXiv Detail & Related papers (2022-03-28T12:59:06Z)
- Label Semantics for Few Shot Named Entity Recognition [68.01364012546402]
We study the problem of few shot learning for named entity recognition.
We leverage the semantic information in the names of the labels as a way of giving the model additional signal and enriched priors.
Our model learns to match the representations of named entities computed by the first encoder with label representations computed by the second encoder.
arXiv Detail & Related papers (2022-03-16T23:21:05Z)
- Synt++: Utilizing Imperfect Synthetic Data to Improve Speech Recognition [18.924716098922683]
Machine learning with synthetic data is not trivial due to the gap between the synthetic and the real data distributions.
We propose two novel techniques during training to mitigate the problems due to the distribution gap.
We show that these methods significantly improve the training of speech recognition models using synthetic data.
arXiv Detail & Related papers (2021-10-21T21:11:42Z)
- Learning to Synthesize Data for Semantic Parsing [57.190817162674875]
We propose a generative model which models the composition of programs and maps a program to an utterance.
Due to the simplicity of PCFG and pre-trained BART, our generative model can be efficiently learned from existing data at hand.
We evaluate our method in both in-domain and out-of-domain settings of text-to-SQL parsing on the standard benchmarks of GeoQuery and Spider.
arXiv Detail & Related papers (2021-04-12T21:24:02Z)
- Relative Positional Encoding for Speech Recognition and Direct Translation [72.64499573561922]
We adapt the relative position encoding scheme to the Speech Transformer.
As a result, the network can better adapt to the variable distributions present in speech data.
arXiv Detail & Related papers (2020-05-20T09:53:06Z)
- Have Your Text and Use It Too! End-to-End Neural Data-to-Text Generation with Semantic Fidelity [3.8673630752805432]
We present DataTuner, a neural, end-to-end data-to-text generation system.
We take a two-stage generation-reranking approach, combining a fine-tuned language model with a semantic fidelity classifier.
We show that DataTuner achieves state of the art results on the automated metrics across four major D2T datasets.
arXiv Detail & Related papers (2020-04-08T11:16:53Z)