Adapting the NICT-JLE Corpus for Disfluency Detection Models
- URL: http://arxiv.org/abs/2308.02482v1
- Date: Fri, 4 Aug 2023 17:54:52 GMT
- Title: Adapting the NICT-JLE Corpus for Disfluency Detection Models
- Authors: Lucy Skidmore and Roger K. Moore
- Abstract summary: This paper describes the adaptation of the NICT-JLE corpus to a format suitable for disfluency detection model training and evaluation.
Points of difference between the NICT-JLE and Switchboard corpora are explored, followed by a detailed overview of adaptations to the tag set and meta-features.
The result of this work provides a standardised train, heldout and test set for use in future research on disfluency detection for learner speech.
- Score: 9.90780328490921
- License: http://creativecommons.org/licenses/by-nc-nd/4.0/
- Abstract: The detection of disfluencies such as hesitations, repetitions and false
starts commonly found in speech is a widely studied area of research. With a
standardised process for evaluation using the Switchboard Corpus, model
performance can be easily compared across approaches. This is not the case for
disfluency detection research on learner speech, however, where such datasets
have restricted access policies, making comparison and subsequent development
of improved models more challenging. To address this issue, this paper
describes the adaptation of the NICT-JLE corpus, containing approximately 300
hours of English learners' oral proficiency tests, to a format that is suitable
for disfluency detection model training and evaluation. Points of difference
between the NICT-JLE and Switchboard corpora are explored, followed by a
detailed overview of adaptations to the tag set and meta-features of the
NICT-JLE corpus. The result of this work provides a standardised train, heldout
and test set for use in future research on disfluency detection for learner
speech.
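To make the intended use concrete, below is a minimal illustrative sketch of how a disfluency-detection training set derived from learner transcripts might be represented and parsed. This is an assumption-laden example, not the authors' released format: the one-token-per-line layout, the binary "D"/"F" labels, and the speaker/level meta lines are all hypothetical stand-ins for the adapted NICT-JLE tag set and meta-features described in the paper.

```python
# Minimal sketch (hypothetical format, not the paper's actual tag set):
# each utterance is a "# key=value" meta line followed by token<TAB>label
# lines, with "D" marking disfluent tokens (fillers, reparanda) and "F" fluent ones.
from dataclasses import dataclass
from typing import List

@dataclass
class Utterance:
    speaker_id: str          # hypothetical meta-feature, e.g. a learner ID
    proficiency_level: str   # hypothetical meta-feature, e.g. an oral test level
    tokens: List[str]
    labels: List[str]

def parse_conll_like(lines: List[str]) -> List[Utterance]:
    """Parse a hypothetical CoNLL-style block into Utterance objects."""
    utterances, meta, toks, labs = [], {}, [], []
    for line in lines + [""]:                    # trailing "" flushes the last utterance
        line = line.strip()
        if line.startswith("#"):                 # meta line, e.g. "# speaker=E001 level=SST4"
            for kv in line.lstrip("#").split():
                key, value = kv.split("=")
                meta[key] = value
        elif line:                               # token line: "token<TAB>label"
            tok, lab = line.split("\t")
            toks.append(tok)
            labs.append(lab)
        elif toks:                               # blank line ends the current utterance
            utterances.append(Utterance(meta.get("speaker", "?"),
                                        meta.get("level", "?"), toks, labs))
            meta, toks, labs = {}, [], []
    return utterances

# Toy learner utterance with a repetition ("I I") and a filled pause ("erm").
sample = [
    "# speaker=E001 level=SST4",
    "I\tD", "I\tF", "erm\tD", "went\tF", "to\tF", "school\tF",
    "",
]
for utt in parse_conll_like(sample):
    print(utt.speaker_id, utt.proficiency_level, list(zip(utt.tokens, utt.labels)))
```

A format along these lines would let the standardised train, heldout, and test splits be loaded with the same code path, with the meta-features carried alongside each utterance for later analysis.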
Related papers
- Pretraining Data Detection for Large Language Models: A Divergence-based Calibration Method [108.56493934296687]
We introduce a divergence-based calibration method, inspired by the divergence-from-randomness concept, to calibrate token probabilities for pretraining data detection.
We have developed a Chinese-language benchmark, PatentMIA, to assess the performance of detection approaches for LLMs on Chinese text.
arXiv Detail & Related papers (2024-09-23T07:55:35Z)
- Contextual Spelling Correction with Language Model for Low-resource Setting [0.0]
A small-scale word-based transformer LM is trained to provide the SC model with contextual understanding.
The probability of an error occurring (the error model) is extracted from the corpus.
The LM and the error model are combined to build the SC model through the well-known noisy channel framework.
arXiv Detail & Related papers (2024-04-28T05:29:35Z)
- Auditing the Use of Language Models to Guide Hiring Decisions [2.949890760187898]
Regulatory efforts to protect against algorithmic bias have taken on increased urgency with rapid advances in large language models.
Current regulations -- as well as the scientific literature -- provide little guidance on how to conduct these assessments.
Here we propose and investigate one approach for auditing algorithms: correspondence experiments.
arXiv Detail & Related papers (2024-04-03T22:01:26Z)
- Probing Critical Learning Dynamics of PLMs for Hate Speech Detection [39.970726250810635]
Despite widespread adoption, there is a lack of research into how various critical aspects of pretrained language models affect their performance in hate speech detection.
We take a deep dive into comparing different pretrained models, evaluating their seed robustness, finetuning settings, and the impact of pretraining data collection time.
Our analysis reveals early peaks for downstream tasks during pretraining, the limited benefit of employing a more recent pretraining corpus, and the significance of specific layers during finetuning.
arXiv Detail & Related papers (2024-02-03T13:23:51Z)
- The devil is in the fine-grained details: Evaluating open-vocabulary object detectors for fine-grained understanding [8.448399308205266]
We introduce an evaluation protocol based on dynamic vocabulary generation to test whether models detect, discern, and assign the correct fine-grained description to objects.
We further enhance our investigation by evaluating several state-of-the-art open-vocabulary object detectors using the proposed protocol.
arXiv Detail & Related papers (2023-11-29T10:40:52Z)
- HyPoradise: An Open Baseline for Generative Speech Recognition with Large Language Models [81.56455625624041]
We introduce the first open-source benchmark to utilize external large language models (LLMs) for ASR error correction.
The proposed benchmark contains a novel dataset, HyPoradise (HP), encompassing more than 334,000 pairs of N-best hypotheses.
With a reasonable prompt and their generative capability, LLMs can even correct tokens that are missing from the N-best list.
arXiv Detail & Related papers (2023-09-27T14:44:10Z)
- Deep Learning for Hate Speech Detection: A Comparative Study [54.42226495344908]
We present here a large-scale empirical comparison of deep and shallow hate-speech detection methods.
Our goal is to illuminate progress in the area, and identify strengths and weaknesses in the current state-of-the-art.
In doing so we aim to provide guidance as to the use of hate-speech detection in practice, quantify the state-of-the-art, and identify future research directions.
arXiv Detail & Related papers (2022-02-19T03:48:20Z)
- AES Systems Are Both Overstable And Oversensitive: Explaining Why And Proposing Defenses [66.49753193098356]
We investigate the reason behind the surprising adversarial brittleness of scoring models.
Our results indicate that autoscoring models, despite getting trained as "end-to-end" models, behave like bag-of-words models.
We propose detection-based protection models that can detect samples causing oversensitivity and overstability with high accuracy.
arXiv Detail & Related papers (2021-09-24T03:49:38Z)
- Active Learning for Sequence Tagging with Deep Pre-trained Models and Bayesian Uncertainty Estimates [52.164757178369804]
Recent advances in transfer learning for natural language processing in conjunction with active learning open the possibility to significantly reduce the necessary annotation budget.
We conduct an empirical study of various Bayesian uncertainty estimation methods and Monte Carlo dropout options for deep pre-trained models in the active learning framework.
We also demonstrate that to acquire instances during active learning, a full-size Transformer can be substituted with a distilled version, which yields better computational performance.
arXiv Detail & Related papers (2021-01-20T13:59:25Z)
- Unsupervised neural adaptation model based on optimal transport for spoken language identification [54.96267179988487]
Due to the mismatch of statistical distributions of acoustic speech between training and testing sets, the performance of spoken language identification (SLID) could be drastically degraded.
We propose an unsupervised neural adaptation model to deal with the distribution mismatch problem for SLID.
arXiv Detail & Related papers (2020-12-24T07:37:19Z)
- End-to-End Speech Recognition and Disfluency Removal [15.910282983166024]
This paper investigates the task of end-to-end speech recognition and disfluency removal.
We show that end-to-end models do learn to directly generate fluent transcripts.
We propose two new metrics that can be used for evaluating integrated ASR and disfluency models.
arXiv Detail & Related papers (2020-09-22T03:11:37Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the information it provides and is not responsible for any consequences of its use.