Disfluency Detection with Unlabeled Data and Small BERT Models
- URL: http://arxiv.org/abs/2104.10769v1
- Date: Wed, 21 Apr 2021 21:24:32 GMT
- Title: Disfluency Detection with Unlabeled Data and Small BERT Models
- Authors: Johann C. Rocholl, Vicky Zayats, Daniel D. Walker, Noah B. Murad,
Aaron Schneider, Daniel J. Liebling
- Abstract summary: We address the disfluency detection task, focusing on small, fast, on-device models based on the BERT architecture.
We demonstrate it is possible to train disfluency detection models as small as 1.3 MiB, while retaining high performance.
- Score: 3.04133054437883
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Disfluency detection models now approach high accuracy on English text.
However, little exploration has been done in improving the size and inference
time of the model. At the same time, automatic speech recognition (ASR) models
are moving from server-side inference to local, on-device inference. Supporting
models in the transcription pipeline (like disfluency detection) must follow
suit. In this work we concentrate on the disfluency detection task, focusing on
small, fast, on-device models based on the BERT architecture. We demonstrate it
is possible to train disfluency detection models as small as 1.3 MiB, while
retaining high performance. We build on previous work that showed the benefit
of data augmentation approaches such as self-training. Then, we evaluate the
effect of domain mismatch between conversational and written text on model
performance. We find that domain adaptation and data augmentation strategies
have a more pronounced effect on these smaller models, as compared to
conventional BERT models.
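To make the task concrete: disfluency detection is typically framed as token-level sequence tagging. The sketch below is a minimal illustration using the Hugging Face transformers library and the small public 2-layer BERT checkpoint google/bert_uncased_L-2_H-128_A-2; the binary label set and checkpoint choice are our assumptions, not the authors' exact configuration.
```python
# Minimal sketch: disfluency detection as token-level tagging with a
# small BERT. The binary label set is an illustrative assumption; the
# classification head is randomly initialized until fine-tuned on
# labeled disfluency data (e.g., Switchboard).
import torch
from transformers import AutoTokenizer, AutoModelForTokenClassification

CKPT = "google/bert_uncased_L-2_H-128_A-2"  # 2-layer, 128-dim "BERT-Tiny"
LABELS = ["FLUENT", "DISFLUENT"]

tokenizer = AutoTokenizer.from_pretrained(CKPT)
model = AutoModelForTokenClassification.from_pretrained(
    CKPT, num_labels=len(LABELS)
)

text = "i want a flight to boston uh i mean to denver"
inputs = tokenizer(text, return_tensors="pt")
with torch.no_grad():
    logits = model(**inputs).logits          # (1, seq_len, num_labels)
preds = logits.argmax(dim=-1)[0]

for tok, p in zip(tokenizer.convert_ids_to_tokens(inputs["input_ids"][0]), preds):
    print(f"{tok:12s} {LABELS[p]}")
```
The checkpoint here is simply a small public BERT variant; reaching sizes like the paper's 1.3 MiB presumably involves further compression beyond picking a small architecture, which this sketch does not attempt.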
Related papers
- SMILE: Zero-Shot Sparse Mixture of Low-Rank Experts Construction From Pre-Trained Foundation Models [85.67096251281191]
We present an innovative approach to model fusion called zero-shot Sparse MIxture of Low-rank Experts (SMILE) construction.
SMILE allows for the upscaling of source models into an MoE model without extra data or further training.
We conduct extensive experiments across diverse scenarios, such as image classification and text generation tasks, using full fine-tuning and LoRA fine-tuning.
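As a rough sketch of the general idea (sparse routing over low-rank, LoRA-style expert deltas on top of a frozen base layer), the module below is our illustrative assumption of what such a mixture can look like; it is not SMILE's zero-shot construction, which builds the experts from pre-trained models without further training.
```python
# Illustrative sketch only: sparse top-k routing over low-rank
# (LoRA-style) expert deltas on top of a frozen base linear layer.
# The router, shapes, and routing rule are our assumptions.
import torch
import torch.nn as nn

class SparseLowRankMoE(nn.Module):
    def __init__(self, d_in, d_out, n_experts=4, rank=8, k=1):
        super().__init__()
        self.base = nn.Linear(d_in, d_out)   # stands in for a pre-trained layer
        for p in self.base.parameters():
            p.requires_grad_(False)          # keep the source weights frozen
        self.router = nn.Linear(d_in, n_experts)
        # Each expert is a low-rank update B @ A, like a LoRA delta.
        self.A = nn.Parameter(0.01 * torch.randn(n_experts, rank, d_in))
        self.B = nn.Parameter(torch.zeros(n_experts, d_out, rank))
        self.k = k

    def forward(self, x):                    # x: (batch, d_in)
        gate = self.router(x).softmax(-1)    # (batch, n_experts)
        topv, topi = gate.topk(self.k, -1)   # route each input to k experts
        out = self.base(x)
        for j in range(self.k):
            A, B = self.A[topi[:, j]], self.B[topi[:, j]]
            delta = torch.einsum("bor,bri,bi->bo", B, A, x)
            out = out + topv[:, j:j+1] * delta
        return out

print(SparseLowRankMoE(16, 32)(torch.randn(4, 16)).shape)  # torch.Size([4, 32])
```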
arXiv Detail & Related papers (2024-08-19T17:32:15Z)
- Source-Free Test-Time Adaptation For Online Surface-Defect Detection [29.69030283193086]
We propose a novel test-time adaptation surface-defect detection approach.
It adapts pre-trained models to new domains and classes during inference.
Experiments demonstrate it outperforms state-of-the-art techniques.
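For flavor, a well-known source-free test-time adaptation baseline (named plainly as such; not necessarily this paper's method) is Tent-style entropy minimization, which updates only normalization-layer affine parameters on unlabeled test batches:
```python
# Sketch of a standard source-free TTA baseline (Tent-style entropy
# minimization); not necessarily this paper's method.
import torch
import torch.nn as nn

def collect_norm_params(model):
    """Freeze everything except the affine params of norm layers."""
    params = []
    for m in model.modules():
        is_norm = isinstance(m, (nn.BatchNorm1d, nn.BatchNorm2d, nn.LayerNorm))
        for p in m.parameters(recurse=False):
            p.requires_grad_(is_norm)
            if is_norm:
                params.append(p)
    return params

def adapt_step(model, x, optimizer):
    """One adaptation step: minimize prediction entropy on a test batch."""
    logits = model(x)
    probs = logits.softmax(dim=-1)
    entropy = -(probs * probs.clamp_min(1e-8).log()).sum(-1).mean()
    optimizer.zero_grad()
    entropy.backward()
    optimizer.step()
    return logits.detach()

model = nn.Sequential(nn.Linear(8, 16), nn.LayerNorm(16), nn.ReLU(), nn.Linear(16, 3))
opt = torch.optim.SGD(collect_norm_params(model), lr=1e-3)
adapt_step(model, torch.randn(32, 8), opt)
```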
arXiv Detail & Related papers (2024-08-18T14:24:05Z)
- A Plug-and-Play Method for Rare Human-Object Interactions Detection by Bridging Domain Gap [50.079224604394]
We present a novel model-agnostic framework called Context-Enhanced Feature Alignment (CEFA).
CEFA consists of a feature alignment module and a context enhancement module.
Our method can serve as a plug-and-play module to improve the detection performance of HOI models on rare categories.
arXiv Detail & Related papers (2024-07-31T08:42:48Z)
- Beyond Under-Alignment: Atomic Preference Enhanced Factuality Tuning for Large Language Models [19.015202590038996]
We evaluate the factuality of different models tuned by various preference learning algorithms.
We propose APEFT (Atomic Preference Enhanced Factuality Tuning) to enhance the model's awareness of factuality.
arXiv Detail & Related papers (2024-06-18T09:07:30Z)
- Unified Visual Relationship Detection with Vision and Language Models [89.77838890788638]
This work focuses on training a single visual relationship detector predicting over the union of label spaces from multiple datasets.
We propose UniVRD, a novel bottom-up method for Unified Visual Relationship Detection by leveraging vision and language models.
Empirical results on both human-object interaction detection and scene-graph generation demonstrate the competitive performance of our model.
arXiv Detail & Related papers (2023-03-16T00:06:28Z)
- Inference from Real-World Sparse Measurements [21.194357028394226]
Real-world problems often involve complex and unstructured sets of measurements, which occur when sensors are sparsely placed in either space or time.
Designing deep learning architectures that can process sets of measurements whose positions vary from set to set, and that can extract readouts at arbitrary locations, is methodologically difficult.
We propose an attention-based model focused on applicability and practical robustness, with two key design contributions.
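One generic way to realize this (our rendering, not necessarily the paper's exact architecture) is cross-attention from encoded query positions to an unordered set of encoded (position, value) measurements:
```python
# Sketch: attention-based readout from an unordered set of sparse
# (position, value) measurements; a generic rendering, not the paper's
# exact architecture.
import torch
import torch.nn as nn

class SparseSetReadout(nn.Module):
    def __init__(self, pos_dim=2, val_dim=1, d_model=64, heads=4):
        super().__init__()
        self.enc_meas = nn.Linear(pos_dim + val_dim, d_model)
        self.enc_query = nn.Linear(pos_dim, d_model)
        self.attn = nn.MultiheadAttention(d_model, heads, batch_first=True)
        self.head = nn.Linear(d_model, val_dim)

    def forward(self, meas_pos, meas_val, query_pos):
        # meas_pos: (B, N, pos_dim), meas_val: (B, N, val_dim),
        # query_pos: (B, M, pos_dim); N may differ from set to set.
        kv = self.enc_meas(torch.cat([meas_pos, meas_val], dim=-1))
        q = self.enc_query(query_pos)
        out, _ = self.attn(q, kv, kv)     # read out anywhere in space/time
        return self.head(out)             # predicted value at each query

model = SparseSetReadout()
pred = model(torch.randn(2, 50, 2), torch.randn(2, 50, 1), torch.randn(2, 7, 2))
print(pred.shape)  # torch.Size([2, 7, 1])
```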
arXiv Detail & Related papers (2022-10-20T13:42:20Z)
- Teaching BERT to Wait: Balancing Accuracy and Latency for Streaming Disfluency Detection [3.884530687475798]
A streaming BERT-based sequence tagging model is capable of detecting disfluencies in real time.
The model attains state-of-the-art latency and stability scores compared with recent work on incremental disfluency detection.
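The accuracy/latency balance can be made concrete with a simple wait-k commit policy: a token's label is emitted only once k further tokens have arrived, giving the tagger some right context. The tag_prefix function below is a hypothetical stand-in for the streaming tagger; the commit policy, not the model, is the point.
```python
# Sketch of a "wait-k" commit policy for streaming tagging. tag_prefix is
# a hypothetical stand-in for an incremental BERT tagger.
def tag_prefix(tokens):
    """Hypothetical tagger: relabels the whole prefix on each new token."""
    return ["DISFLUENT" if t in {"uh", "um"} else "FLUENT" for t in tokens]

def stream_tags(token_stream, k=2):
    """Commit a token's label only once k further tokens have arrived,
    so later context can still revise recent, uncommitted guesses."""
    history, committed = [], 0
    for tok in token_stream:
        history.append(tok)
        labels = tag_prefix(history)
        while committed < len(history) - k:
            yield history[committed], labels[committed]
            committed += 1
    labels = tag_prefix(history)             # flush at end of stream
    for i in range(committed, len(history)):
        yield history[i], labels[i]

for tok, label in stream_tags("i want uh i mean a coffee".split()):
    print(tok, label)
```
Larger k trades latency for stability: fewer flickering label revisions reach the consumer.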
arXiv Detail & Related papers (2022-05-02T02:13:24Z)
- How to Learn when Data Gradually Reacts to Your Model [10.074466859579571]
We propose a new algorithm, Stateful Performative Gradient Descent (Stateful PerfGD), for minimizing the performative loss even in the presence of these gradual feedback effects.
Our experiments confirm that Stateful PerfGD substantially outperforms previous state-of-the-art methods.
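To illustrate the setting (not the Stateful PerfGD algorithm itself): the data distribution reacts gradually, with inertia, to the deployed parameter, and naive retraining ignores that feedback. The toy dynamics below are purely our assumptions.
```python
# Toy illustration of the *setting* (not of Stateful PerfGD itself):
# the distribution drifts toward a function of the deployed parameter,
# while the naive update treats the data as static.
import numpy as np

rng = np.random.default_rng(0)
theta, mu = 0.0, 1.0                 # deployed parameter; current data mean
for step in range(21):
    mu = 0.9 * mu + 0.1 * (0.5 * theta)     # data reacts gradually to theta
    data = mu + 0.1 * rng.standard_normal(256)
    grad = 2.0 * (theta - data.mean())      # naive gradient of E[(theta - x)^2],
    theta -= 0.1 * grad                     # ignoring d(mu)/d(theta)
    if step % 5 == 0:
        print(f"step {step:2d}  theta={theta:+.3f}  data mean={mu:+.3f}")
```
Per the abstract, Stateful PerfGD additionally accounts for how the distribution responds to the model over time; the naive loop above is the kind of baseline it improves on.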
arXiv Detail & Related papers (2021-12-13T22:05:26Z)
- Enhancing the Generalization for Intent Classification and Out-of-Domain Detection in SLU [70.44344060176952]
Intent classification is a major task in spoken language understanding (SLU).
Recent works have shown that using extra data and labels can improve out-of-domain (OOD) detection performance.
This paper proposes to train a model with only in-domain (IND) data while supporting both IND intent classification and OOD detection.
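A standard way to get OOD detection from IND-only training (a common baseline named plainly as such, not necessarily this paper's mechanism) is to threshold the classifier's maximum softmax probability:
```python
# Maximum-softmax-probability (MSP) OOD detection: a standard IND-only
# baseline; not necessarily the method this particular paper proposes.
import torch

def classify_or_reject(model, x, threshold=0.7):
    """Return (intent_id, is_ood) per utterance encoding in x."""
    with torch.no_grad():
        probs = model(x).softmax(dim=-1)
    confidence, intent = probs.max(dim=-1)
    return intent, confidence < threshold   # low confidence => flag as OOD
```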
arXiv Detail & Related papers (2021-06-28T08:27:38Z)
- Bridging the Gap Between Clean Data Training and Real-World Inference for Spoken Language Understanding [76.89426311082927]
Existing models are trained on clean data, which causes a gap between clean-data training and real-world inference.
We propose a method from the perspective of domain adaptation, by which both high- and low-quality samples are embedded into a similar vector space.
Experiments on the widely used Snips dataset and a large-scale in-house dataset (10 million training examples) demonstrate that this method not only outperforms baseline models on a real-world (noisy) corpus but also enhances robustness, producing high-quality results in noisy environments.
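The core idea of embedding high- and low-quality versions of an utterance near each other can be sketched with a paired alignment loss; the clean/noisy pairing and the mean-squared penalty here are illustrative assumptions, not the paper's exact objective:
```python
# Sketch: pull paired clean and noisy (e.g., ASR-corrupted) encodings of
# the same utterance together, so the classifier sees a similar vector
# space at training and inference time.
import torch
import torch.nn.functional as F

def slu_loss(encoder, classifier, clean_x, noisy_x, labels, alpha=1.0):
    z_clean = encoder(clean_x)
    z_noisy = encoder(noisy_x)
    task = F.cross_entropy(classifier(z_clean), labels)   # clean supervision
    align = F.mse_loss(z_noisy, z_clean.detach())         # align noisy -> clean
    return task + alpha * align
```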
arXiv Detail & Related papers (2021-04-13T17:54:33Z)
- ALT-MAS: A Data-Efficient Framework for Active Testing of Machine Learning Algorithms [58.684954492439424]
We propose a novel framework to efficiently test a machine learning model using only a small amount of labeled test data.
The idea is to estimate the metrics of interest for a model-under-test using a Bayesian neural network (BNN).
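As a hedged sketch of the general idea (scoring a model-under-test on unlabeled inputs via a Bayesian surrogate), MC dropout is one cheap BNN approximation; this is not the ALT-MAS procedure itself.
```python
# Hedged sketch: estimate a test metric using a BNN surrogate, here
# approximated with MC dropout; a generic illustration, not ALT-MAS.
import torch

def estimate_accuracy(bnn, model_under_test, x_unlabeled, n_samples=32):
    bnn.train()                        # keep dropout active to sample weights
    with torch.no_grad():
        votes = torch.stack(
            [bnn(x_unlabeled).argmax(-1) for _ in range(n_samples)]
        )                              # (n_samples, N) sampled surrogate labels
        preds = model_under_test(x_unlabeled).argmax(-1)   # (N,)
    agreement = (votes == preds).float().mean(dim=0)       # per-example score
    return agreement.mean().item()     # expected accuracy under the surrogate
```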
arXiv Detail & Related papers (2021-04-11T12:14:04Z)