Auxiliary Sequence Labeling Tasks for Disfluency Detection
- URL: http://arxiv.org/abs/2011.04512v2
- Date: Mon, 5 Apr 2021 13:09:23 GMT
- Title: Auxiliary Sequence Labeling Tasks for Disfluency Detection
- Authors: Dongyub Lee, Byeongil Ko, Myeong Cheol Shin, Taesun Whang, Daniel Lee,
Eun Hwa Kim, EungGyun Kim, and Jaechoon Jo
- Abstract summary: We propose a method utilizing named entity recognition (NER) and part-of-speech (POS) as auxiliary sequence labeling (SL) tasks for disfluency detection.
We show that training a disfluency detection model with auxiliary SL tasks can improve its F-score in disfluency detection.
Experimental results on the widely used English Switchboard dataset show that our method outperforms the previous state-of-the-art in disfluency detection.
- Score: 6.460424516393765
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Detecting disfluencies in spontaneous speech is an important preprocessing
step in natural language processing and speech recognition applications.
Existing work on disfluency detection has focused on designing a single
objective only for disfluency detection, although auxiliary objectives that
exploit word-level linguistic information, such as named entities or
part-of-speech tags, can be effective. In this paper, we focus on detecting disfluencies
on spoken transcripts and propose a method utilizing named entity recognition
(NER) and part-of-speech (POS) as auxiliary sequence labeling (SL) tasks for
disfluency detection. First, we investigate cases in which a word's linguistic
information can prevent important words from being mispredicted and can be
helpful for correctly detecting disfluencies. Second, we show that
training a disfluency detection model with auxiliary SL tasks can improve its
F-score in disfluency detection. Then, we analyze which auxiliary SL tasks are
influential depending on baseline models. Experimental results on the widely
used English Switchboard dataset show that our method outperforms the previous
state-of-the-art in disfluency detection.
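The core idea above, training one disfluency-detection model jointly with NER and POS as auxiliary sequence labeling tasks, can be sketched as a shared encoder feeding one label head per task, with a weighted sum of per-task token-level cross-entropies as the training signal. The following is a minimal illustrative sketch, not the paper's implementation: the toy encoder, label-set sizes, and task weights are all assumptions.

```python
# Minimal sketch of multi-task sequence labeling: a shared token
# representation feeds three heads (disfluency, NER, POS), and the joint
# loss is a weighted sum of per-task cross-entropies.
# All names, sizes, and weights are illustrative, not from the paper.
import math
import random

random.seed(0)

TASKS = {"disfluency": 2, "ner": 5, "pos": 10}  # assumed label-set sizes
HIDDEN = 8

def encode(tokens):
    # Stand-in for a shared encoder: one random vector per token.
    return [[random.random() for _ in range(HIDDEN)] for _ in tokens]

# One linear head (rows of weights, one row per label) per task.
heads = {
    task: [[random.gauss(0, 0.1) for _ in range(HIDDEN)] for _ in range(n)]
    for task, n in TASKS.items()
}

def log_softmax(logits):
    m = max(logits)
    z = math.log(sum(math.exp(x - m) for x in logits)) + m
    return [x - z for x in logits]

def token_loss(hidden, head, gold):
    # Cross-entropy of one token's gold label under one head.
    logits = [sum(w * h for w, h in zip(row, hidden)) for row in head]
    return -log_softmax(logits)[gold]

def joint_loss(tokens, gold_labels, weights):
    """Weighted sum of per-task mean cross-entropies over the sequence."""
    hiddens = encode(tokens)
    total = 0.0
    for task, golds in gold_labels.items():
        ce = sum(token_loss(h, heads[task], g) for h, g in zip(hiddens, golds))
        total += weights[task] * ce / len(tokens)
    return total

tokens = ["i", "uh", "i", "want", "a", "flight"]
gold = {
    "disfluency": [1, 1, 0, 0, 0, 0],  # 1 = disfluent token
    "ner":        [0, 0, 0, 0, 0, 0],
    "pos":        [3, 9, 3, 7, 2, 4],
}
loss = joint_loss(tokens, gold, {"disfluency": 1.0, "ner": 0.5, "pos": 0.5})
print(round(loss, 4))
```

In practice the heads would sit on top of a pretrained encoder and be trained by gradient descent; the sketch only shows how auxiliary labels enter the objective alongside the main disfluency labels.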
Related papers
- Large Language Models for Dysfluency Detection in Stuttered Speech [16.812800649507302]
Accurately detecting dysfluencies in spoken language can help to improve the performance of automatic speech and language processing components.
Inspired by the recent trend towards the deployment of large language models (LLMs) as universal learners and processors of non-lexical inputs, we approach the task of multi-label dysfluency detection as a language modeling problem.
We present hypotheses candidates generated with an automatic speech recognition system and acoustic representations extracted from an audio encoder model to an LLM, and finetune the system to predict dysfluency labels on three datasets containing English and German stuttered speech.
arXiv Detail & Related papers (2024-06-16T17:51:22Z)
- Automatic Disfluency Detection from Untranscribed Speech [25.534535098405602]
Stuttering is a speech disorder characterized by a high rate of disfluencies.
Automatic disfluency detection may help in treatment planning for individuals who stutter.
We investigate language, acoustic, and multimodal methods for frame-level automatic disfluency detection and categorization.
arXiv Detail & Related papers (2023-11-01T21:36:39Z)
- DisfluencyFixer: A tool to enhance Language Learning through Speech To Speech Disfluency Correction [50.51901599433536]
DisfluencyFixer is a tool that performs speech-to-speech disfluency correction in English and Hindi.
Our proposed system removes disfluencies from input speech and returns fluent speech as output.
arXiv Detail & Related papers (2023-05-26T14:13:38Z)
- Towards preserving word order importance through Forced Invalidation [80.33036864442182]
We show that pre-trained language models are insensitive to word order.
We propose Forced Invalidation to help preserve the importance of word order.
Our experiments demonstrate that Forced Invalidation significantly improves the sensitivity of the models to word order.
arXiv Detail & Related papers (2023-04-11T13:42:10Z)
- Leveraging Pretrained Representations with Task-related Keywords for Alzheimer's Disease Detection [69.53626024091076]
Alzheimer's disease (AD) is particularly prominent in older adults.
Recent advances in pre-trained models motivate AD detection modeling to shift from low-level features to high-level representations.
This paper presents several efficient methods to extract better AD-related cues from high-level acoustic and linguistic features.
arXiv Detail & Related papers (2023-03-14T16:03:28Z)
- From Disfluency Detection to Intent Detection and Slot Filling [12.289620439224839]
We extend the fluent Vietnamese intent detection and slot filling dataset PhoATIS by manually adding contextual disfluencies and annotating them.
We conduct experiments using strong baselines for disfluency detection and joint intent detection and slot filling, which are based on pre-trained language models.
We find that: (i) disfluencies produce negative effects on the performances of the downstream intent detection and slot filling tasks, and (ii) in the disfluency context, the pre-trained multilingual language model XLM-R helps produce better intent detection and slot filling performances than the pre-trained monolingual language model Pho
arXiv Detail & Related papers (2022-09-17T16:03:57Z)
- Span Classification with Structured Information for Disfluency Detection in Spoken Utterances [47.05113261111054]
We propose a novel architecture for detecting disfluencies in transcripts from spoken utterances.
Our proposed model achieves state-of-the-art results on the widely used English Switchboard for disfluency detection.
arXiv Detail & Related papers (2022-03-30T03:22:29Z)
- Deep Learning for Hate Speech Detection: A Comparative Study [54.42226495344908]
We present here a large-scale empirical comparison of deep and shallow hate-speech detection methods.
Our goal is to illuminate progress in the area, and identify strengths and weaknesses in the current state-of-the-art.
In doing so we aim to provide guidance as to the use of hate-speech detection in practice, quantify the state-of-the-art, and identify future research directions.
arXiv Detail & Related papers (2022-02-19T03:48:20Z)
- End-to-End Speech Recognition and Disfluency Removal [15.910282983166024]
This paper investigates the task of end-to-end speech recognition and disfluency removal.
We show that end-to-end models do learn to directly generate fluent transcripts.
We propose two new metrics that can be used for evaluating integrated ASR and disfluency models.
arXiv Detail & Related papers (2020-09-22T03:11:37Z)
- On the Importance of Word Order Information in Cross-lingual Sequence Labeling [80.65425412067464]
Cross-lingual models that fit into the word order of the source language might fail to handle target languages.
We investigate whether making models insensitive to the word order of the source language can improve the adaptation performance in target languages.
arXiv Detail & Related papers (2020-01-30T03:35:44Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the information it contains and is not responsible for any consequences of its use.