Disfluency Detection with Unlabeled Data and Small BERT Models
- URL: http://arxiv.org/abs/2104.10769v1
- Date: Wed, 21 Apr 2021 21:24:32 GMT
- Title: Disfluency Detection with Unlabeled Data and Small BERT Models
- Authors: Johann C. Rocholl, Vicky Zayats, Daniel D. Walker, Noah B. Murad,
Aaron Schneider, Daniel J. Liebling
- Abstract summary: We address the disfluency detection task, focusing on small, fast, on-device models based on the BERT architecture.
We demonstrate it is possible to train disfluency detection models as small as 1.3 MiB, while retaining high performance.
- Score: 3.04133054437883
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Disfluency detection models now approach high accuracy on English text.
However, little exploration has been done in improving the size and inference
time of the model. At the same time, automatic speech recognition (ASR) models
are moving from server-side inference to local, on-device inference. Supporting
models in the transcription pipeline (like disfluency detection) must follow
suit. In this work we concentrate on the disfluency detection task, focusing on
small, fast, on-device models based on the BERT architecture. We demonstrate it
is possible to train disfluency detection models as small as 1.3 MiB, while
retaining high performance. We build on previous work that showed the benefit
of data augmentation approaches such as self-training. Then, we evaluate the
effect of domain mismatch between conversational and written text on model
performance. We find that domain adaptation and data augmentation strategies
have a more pronounced effect on these smaller models, as compared to
conventional BERT models.
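To make the task concrete: disfluency detection is typically framed as token-level sequence tagging. The sketch below is a minimal illustration using the Hugging Face transformers library and the small public 2-layer BERT checkpoint google/bert_uncased_L-2_H-128_A-2; the binary label set and checkpoint choice are our assumptions, not the authors' exact configuration.
```python
# Minimal sketch: disfluency detection as token-level tagging with a
# small BERT. The binary label set is an illustrative assumption; the
# classification head is randomly initialized until fine-tuned on
# labeled disfluency data (e.g., Switchboard).
import torch
from transformers import AutoTokenizer, AutoModelForTokenClassification

CKPT = "google/bert_uncased_L-2_H-128_A-2"  # 2-layer, 128-dim "BERT-Tiny"
LABELS = ["FLUENT", "DISFLUENT"]

tokenizer = AutoTokenizer.from_pretrained(CKPT)
model = AutoModelForTokenClassification.from_pretrained(
    CKPT, num_labels=len(LABELS)
)

text = "i want a flight to boston uh i mean to denver"
inputs = tokenizer(text, return_tensors="pt")
with torch.no_grad():
    logits = model(**inputs).logits          # (1, seq_len, num_labels)
preds = logits.argmax(dim=-1)[0]

for tok, p in zip(tokenizer.convert_ids_to_tokens(inputs["input_ids"][0]), preds):
    print(f"{tok:12s} {LABELS[p]}")
```
The checkpoint here is simply a small public BERT variant; reaching sizes like the paper's 1.3 MiB presumably involves further compression beyond picking a small architecture, which this sketch does not attempt.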
Related papers
- SMILE: Zero-Shot Sparse Mixture of Low-Rank Experts Construction From Pre-Trained Foundation Models [85.67096251281191]
We present an innovative approach to model fusion called zero-shot Sparse MIxture of Low-rank Experts (SMILE) construction.
SMILE allows for the upscaling of source models into an MoE model without extra data or further training.
We conduct extensive experiments across diverse scenarios, such as image classification and text generation tasks, using full fine-tuning and LoRA fine-tuning.
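As a rough sketch of the general idea (sparse routing over low-rank, LoRA-style expert deltas on top of a frozen base layer), the module below is our illustrative assumption of what such a mixture can look like; it is not SMILE's zero-shot construction, which builds the experts from pre-trained models without further training.
```python
# Illustrative sketch only: sparse top-k routing over low-rank
# (LoRA-style) expert deltas on top of a frozen base linear layer.
# The router, shapes, and routing rule are our assumptions.
import torch
import torch.nn as nn

class SparseLowRankMoE(nn.Module):
    def __init__(self, d_in, d_out, n_experts=4, rank=8, k=1):
        super().__init__()
        self.base = nn.Linear(d_in, d_out)   # stands in for a pre-trained layer
        for p in self.base.parameters():
            p.requires_grad_(False)          # keep the source weights frozen
        self.router = nn.Linear(d_in, n_experts)
        # Each expert is a low-rank update B @ A, like a LoRA delta.
        self.A = nn.Parameter(0.01 * torch.randn(n_experts, rank, d_in))
        self.B = nn.Parameter(torch.zeros(n_experts, d_out, rank))
        self.k = k

    def forward(self, x):                    # x: (batch, d_in)
        gate = self.router(x).softmax(-1)    # (batch, n_experts)
        topv, topi = gate.topk(self.k, -1)   # route each input to k experts
        out = self.base(x)
        for j in range(self.k):
            A, B = self.A[topi[:, j]], self.B[topi[:, j]]
            delta = torch.einsum("bor,bri,bi->bo", B, A, x)
            out = out + topv[:, j:j+1] * delta
        return out

print(SparseLowRankMoE(16, 32)(torch.randn(4, 16)).shape)  # torch.Size([4, 32])
```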
arXiv Detail & Related papers (2024-08-19T17:32:15Z)
- Source-Free Test-Time Adaptation For Online Surface-Defect Detection [29.69030283193086]
We propose a novel test-time adaptation surface-defect detection approach.
It adapts pre-trained models to new domains and classes during inference.
Experiments demonstrate it outperforms state-of-the-art techniques.
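For flavor, a well-known source-free test-time adaptation baseline (named plainly as such; not necessarily this paper's method) is Tent-style entropy minimization, which updates only normalization-layer affine parameters on unlabeled test batches:
```python
# Sketch of a standard source-free TTA baseline (Tent-style entropy
# minimization); not necessarily this paper's method.
import torch
import torch.nn as nn

def collect_norm_params(model):
    """Freeze everything except the affine params of norm layers."""
    params = []
    for m in model.modules():
        is_norm = isinstance(m, (nn.BatchNorm1d, nn.BatchNorm2d, nn.LayerNorm))
        for p in m.parameters(recurse=False):
            p.requires_grad_(is_norm)
            if is_norm:
                params.append(p)
    return params

def adapt_step(model, x, optimizer):
    """One adaptation step: minimize prediction entropy on a test batch."""
    logits = model(x)
    probs = logits.softmax(dim=-1)
    entropy = -(probs * probs.clamp_min(1e-8).log()).sum(-1).mean()
    optimizer.zero_grad()
    entropy.backward()
    optimizer.step()
    return logits.detach()

model = nn.Sequential(nn.Linear(8, 16), nn.LayerNorm(16), nn.ReLU(), nn.Linear(16, 3))
opt = torch.optim.SGD(collect_norm_params(model), lr=1e-3)
adapt_step(model, torch.randn(32, 8), opt)
```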
arXiv Detail & Related papers (2024-08-18T14:24:05Z)
- A Plug-and-Play Method for Rare Human-Object Interactions Detection by Bridging Domain Gap [50.079224604394]
We present a novel model-agnostic framework called Context-Enhanced Feature Alignment (CEFA).
CEFA consists of a feature alignment module and a context enhancement module.
Our method can serve as a plug-and-play module to improve the detection performance of HOI models on rare categories.
arXiv Detail & Related papers (2024-07-31T08:42:48Z)
- Beyond Under-Alignment: Atomic Preference Enhanced Factuality Tuning for Large Language Models [19.015202590038996]
We evaluate the factuality of different models tuned by various preference learning algorithms.
We propose APEFT (Atomic Preference Enhanced Factuality Tuning) to enhance the model's awareness of factuality.
arXiv Detail & Related papers (2024-06-18T09:07:30Z)
- Unified Visual Relationship Detection with Vision and Language Models [89.77838890788638]
This work focuses on training a single visual relationship detector predicting over the union of label spaces from multiple datasets.
We propose UniVRD, a novel bottom-up method for Unified Visual Relationship Detection by leveraging vision and language models.
Empirical results on both human-object interaction detection and scene-graph generation demonstrate the competitive performance of our model.
arXiv Detail & Related papers (2023-03-16T00:06:28Z)
- Inference from Real-World Sparse Measurements [21.194357028394226]
Real-world problems often involve complex and unstructured sets of measurements, which occur when sensors are sparsely placed in either space or time.
Designing deep learning architectures that can process sets of measurements whose positions vary from set to set, and that can extract readouts at arbitrary locations, is methodologically difficult.
We propose an attention-based model focused on applicability and practical robustness, with two key design contributions.
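One generic way to realize this (our rendering, not necessarily the paper's exact architecture) is cross-attention from encoded query positions to an unordered set of encoded (position, value) measurements:
```python
# Sketch: attention-based readout from an unordered set of sparse
# (position, value) measurements; a generic rendering, not the paper's
# exact architecture.
import torch
import torch.nn as nn

class SparseSetReadout(nn.Module):
    def __init__(self, pos_dim=2, val_dim=1, d_model=64, heads=4):
        super().__init__()
        self.enc_meas = nn.Linear(pos_dim + val_dim, d_model)
        self.enc_query = nn.Linear(pos_dim, d_model)
        self.attn = nn.MultiheadAttention(d_model, heads, batch_first=True)
        self.head = nn.Linear(d_model, val_dim)

    def forward(self, meas_pos, meas_val, query_pos):
        # meas_pos: (B, N, pos_dim), meas_val: (B, N, val_dim),
        # query_pos: (B, M, pos_dim); N may differ from set to set.
        kv = self.enc_meas(torch.cat([meas_pos, meas_val], dim=-1))
        q = self.enc_query(query_pos)
        out, _ = self.attn(q, kv, kv)     # read out anywhere in space/time
        return self.head(out)             # predicted value at each query

model = SparseSetReadout()
pred = model(torch.randn(2, 50, 2), torch.randn(2, 50, 1), torch.randn(2, 7, 2))
print(pred.shape)  # torch.Size([2, 7, 1])
```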
arXiv Detail & Related papers (2022-10-20T13:42:20Z)
- Teaching BERT to Wait: Balancing Accuracy and Latency for Streaming Disfluency Detection [3.884530687475798]
A streaming BERT-based sequence tagging model is capable of detecting disfluencies in real time.
The model attains state-of-the-art latency and stability scores compared with recent work on incremental disfluency detection.
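The accuracy/latency balance can be made concrete with a simple wait-k commit policy: a token's label is emitted only once k further tokens have arrived, giving the tagger some right context. The tag_prefix function below is a hypothetical stand-in for the streaming tagger; the commit policy, not the model, is the point.
```python
# Sketch of a "wait-k" commit policy for streaming tagging. tag_prefix is
# a hypothetical stand-in for an incremental BERT tagger.
def tag_prefix(tokens):
    """Hypothetical tagger: relabels the whole prefix on each new token."""
    return ["DISFLUENT" if t in {"uh", "um"} else "FLUENT" for t in tokens]

def stream_tags(token_stream, k=2):
    """Commit a token's label only once k further tokens have arrived,
    so later context can still revise recent, uncommitted guesses."""
    history, committed = [], 0
    for tok in token_stream:
        history.append(tok)
        labels = tag_prefix(history)
        while committed < len(history) - k:
            yield history[committed], labels[committed]
            committed += 1
    labels = tag_prefix(history)             # flush at end of stream
    for i in range(committed, len(history)):
        yield history[i], labels[i]

for tok, label in stream_tags("i want uh i mean a coffee".split()):
    print(tok, label)
```
Larger k trades latency for stability: fewer flickering label revisions reach the consumer.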
arXiv Detail & Related papers (2022-05-02T02:13:24Z)
- How to Learn when Data Gradually Reacts to Your Model [10.074466859579571]
We propose a new algorithm, Stateful Performative Gradient Descent (Stateful PerfGD), for minimizing the performative loss even in the presence of these gradual feedback effects.
Our experiments confirm that Stateful PerfGD substantially outperforms previous state-of-the-art methods.
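To illustrate the setting (not the Stateful PerfGD algorithm itself): the data distribution reacts gradually, with inertia, to the deployed parameter, and naive retraining ignores that feedback. The toy dynamics below are purely our assumptions.
```python
# Toy illustration of the *setting* (not of Stateful PerfGD itself):
# the distribution drifts toward a function of the deployed parameter,
# while the naive update treats the data as static.
import numpy as np

rng = np.random.default_rng(0)
theta, mu = 0.0, 1.0                 # deployed parameter; current data mean
for step in range(21):
    mu = 0.9 * mu + 0.1 * (0.5 * theta)     # data reacts gradually to theta
    data = mu + 0.1 * rng.standard_normal(256)
    grad = 2.0 * (theta - data.mean())      # naive gradient of E[(theta - x)^2],
    theta -= 0.1 * grad                     # ignoring d(mu)/d(theta)
    if step % 5 == 0:
        print(f"step {step:2d}  theta={theta:+.3f}  data mean={mu:+.3f}")
```
Per the abstract, Stateful PerfGD additionally accounts for how the distribution responds to the model over time; the naive loop above is the kind of baseline it improves on.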
arXiv Detail & Related papers (2021-12-13T22:05:26Z)
- Enhancing the Generalization for Intent Classification and Out-of-Domain Detection in SLU [70.44344060176952]
Intent classification is a major task in spoken language understanding (SLU).
Recent works have shown that using extra data and labels can improve out-of-domain (OOD) detection performance.
This paper proposes to train a model with only in-domain (IND) data while supporting both IND intent classification and OOD detection.
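A standard way to get OOD detection from IND-only training (a common baseline named plainly as such, not necessarily this paper's mechanism) is to threshold the classifier's maximum softmax probability:
```python
# Maximum-softmax-probability (MSP) OOD detection: a standard IND-only
# baseline; not necessarily the method this particular paper proposes.
import torch

def classify_or_reject(model, x, threshold=0.7):
    """Return (intent_id, is_ood) per utterance encoding in x."""
    with torch.no_grad():
        probs = model(x).softmax(dim=-1)
    confidence, intent = probs.max(dim=-1)
    return intent, confidence < threshold   # low confidence => flag as OOD
```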
arXiv Detail & Related papers (2021-06-28T08:27:38Z)
- Bridging the Gap Between Clean Data Training and Real-World Inference for Spoken Language Understanding [76.89426311082927]
Existing models are trained on clean data, which causes a gap between clean-data training and real-world inference.
We propose a method from the perspective of domain adaptation, by which both high- and low-quality samples are embedded into a similar vector space.
Experiments on the widely used Snips dataset and a large-scale in-house dataset (10 million training examples) demonstrate that this method not only outperforms baseline models on a real-world (noisy) corpus but also enhances robustness, producing high-quality results in noisy environments.
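The core idea of embedding high- and low-quality versions of an utterance near each other can be sketched with a paired alignment loss; the clean/noisy pairing and the mean-squared penalty here are illustrative assumptions, not the paper's exact objective:
```python
# Sketch: pull paired clean and noisy (e.g., ASR-corrupted) encodings of
# the same utterance together, so the classifier sees a similar vector
# space at training and inference time.
import torch
import torch.nn.functional as F

def slu_loss(encoder, classifier, clean_x, noisy_x, labels, alpha=1.0):
    z_clean = encoder(clean_x)
    z_noisy = encoder(noisy_x)
    task = F.cross_entropy(classifier(z_clean), labels)   # clean supervision
    align = F.mse_loss(z_noisy, z_clean.detach())         # align noisy -> clean
    return task + alpha * align
```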
arXiv Detail & Related papers (2021-04-13T17:54:33Z)
- ALT-MAS: A Data-Efficient Framework for Active Testing of Machine Learning Algorithms [58.684954492439424]
We propose a novel framework to efficiently test a machine learning model using only a small amount of labeled test data.
The idea is to estimate the metrics of interest for a model-under-test using a Bayesian neural network (BNN).
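As a hedged sketch of the general idea (scoring a model-under-test on unlabeled inputs via a Bayesian surrogate), MC dropout is one cheap BNN approximation; this is not the ALT-MAS procedure itself.
```python
# Hedged sketch: estimate a test metric using a BNN surrogate, here
# approximated with MC dropout; a generic illustration, not ALT-MAS.
import torch

def estimate_accuracy(bnn, model_under_test, x_unlabeled, n_samples=32):
    bnn.train()                        # keep dropout active to sample weights
    with torch.no_grad():
        votes = torch.stack(
            [bnn(x_unlabeled).argmax(-1) for _ in range(n_samples)]
        )                              # (n_samples, N) sampled surrogate labels
        preds = model_under_test(x_unlabeled).argmax(-1)   # (N,)
    agreement = (votes == preds).float().mean(dim=0)       # per-example score
    return agreement.mean().item()     # expected accuracy under the surrogate
```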
arXiv Detail & Related papers (2021-04-11T12:14:04Z)