Speaker-change Aware CRF for Dialogue Act Classification
- URL: http://arxiv.org/abs/2004.02913v3
- Date: Sat, 24 Jun 2023 22:04:37 GMT
- Title: Speaker-change Aware CRF for Dialogue Act Classification
- Authors: Guokan Shang (1 and 2), Antoine Jean-Pierre Tixier (1), Michalis
Vazirgiannis (1 and 3), Jean-Pierre Lorré (2) ((1) École Polytechnique,
(2) Linagora, (3) AUEB)
- Abstract summary: Recent work in Dialogue Act (DA) classification approaches the task as a sequence labeling problem.
This paper proposes a simple modification of the CRF layer that takes speaker-change into account.
- Score: 0.0
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Recent work in Dialogue Act (DA) classification approaches the task as a
sequence labeling problem, using neural network models coupled with a
Conditional Random Field (CRF) as the last layer. CRF models the conditional
probability of the target DA label sequence given the input utterance sequence.
However, the task involves another important input sequence, that of speakers,
which is ignored by previous work. To address this limitation, this paper
proposes a simple modification of the CRF layer that takes speaker-change into
account. Experiments on the SwDA corpus show that our modified CRF layer
outperforms the original one, with very wide margins for some DA labels.
Further, visualizations demonstrate that our CRF layer can learn meaningful,
sophisticated transition patterns between DA label pairs conditioned on
speaker-change in an end-to-end way. Code is publicly available.
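As a minimal sketch of the idea described in the abstract (not the paper's exact formulation), one way to make a CRF layer speaker-change aware is to keep two transition weight matrices, one used when the speaker changes between consecutive utterances and one when it stays the same. All names and values below are hypothetical:

```python
def crf_sequence_score(emissions, labels, speaker_change,
                       trans_same, trans_change):
    """Unnormalized CRF score of a DA label sequence, where the
    transition weights at each step are selected according to
    whether the speaker changed between consecutive utterances."""
    score = emissions[0][labels[0]]  # emission score of first utterance
    for t in range(1, len(labels)):
        # choose the transition matrix conditioned on speaker change
        trans = trans_change if speaker_change[t] else trans_same
        score += trans[labels[t - 1]][labels[t]] + emissions[t][labels[t]]
    return score
```

A complete implementation would also have the forward algorithm and Viterbi decoding select the appropriate matrix at each step, so the model remains trainable end to end.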
Related papers
- Regular-pattern-sensitive CRFs for Distant Label Interactions [10.64258723923874]
We present regular-pattern-sensitive CRFs (RPCRFs), a method of enriching standard linear-chain CRFs with the ability to learn long-distance label interactions. We detail how an RPCRF can be automatically constructed from a set of user-specified patterns, and demonstrate the model's effectiveness on three synthetic sequence-modeling datasets.
arXiv Detail & Related papers (2024-11-19T13:08:03Z) - BA-SOT: Boundary-Aware Serialized Output Training for Multi-Talker ASR [54.23941663326509]
Frequent speaker changes can make speaker change prediction difficult.
We propose boundary-aware serialized output training (BA-SOT).
Compared to the original SOT, BA-SOT reduces CER/UD-CER by 5.1%/14.0%.
arXiv Detail & Related papers (2023-05-23T06:08:13Z) - Neural Diarization with Non-autoregressive Intermediate Attractors [37.49735004139322]
We propose a novel EEND model that introduces the label dependency between frames.
The experiments with the two-speaker CALLHOME dataset show that the intermediate labels with the proposed non-autoregressive intermediate attractors boost the diarization performance.
arXiv Detail & Related papers (2023-03-13T01:28:55Z) - Latent Class-Conditional Noise Model [54.56899309997246]
We introduce a Latent Class-Conditional Noise model (LCCN) to parameterize the noise transition under a Bayesian framework.
We then deduce a dynamic label regression method for LCCN, whose Gibbs sampler allows us to efficiently infer the latent true labels.
Our approach safeguards the stable update of the noise transition, avoiding the arbitrary tuning from a mini-batch of samples found in previous work.
arXiv Detail & Related papers (2023-02-19T15:24:37Z) - Joint Span Segmentation and Rhetorical Role Labeling with Data
Augmentation for Legal Documents [1.4072904523937537]
Rhetorical Role Labeling of legal judgments plays a crucial role in retrieval and adjacent tasks.
We reformulate the task at span level as identifying spans of multiple consecutive sentences that share the same rhetorical role label.
We employ semi-Markov Conditional Random Fields (CRF) to jointly learn span segmentation and span label assignment.
arXiv Detail & Related papers (2023-02-13T15:28:02Z) - Speaker Embedding-aware Neural Diarization: a Novel Framework for
Overlapped Speech Diarization in the Meeting Scenario [51.5031673695118]
We reformulate overlapped speech diarization as a single-label prediction problem.
We propose the speaker embedding-aware neural diarization (SEND) system.
arXiv Detail & Related papers (2022-03-18T06:40:39Z) - Extended Graph Temporal Classification for Multi-Speaker End-to-End ASR [77.82653227783447]
We propose an extension of GTC to model the posteriors of both labels and label transitions by a neural network.
As an example application, we use the extended GTC (GTC-e) for the multi-speaker speech recognition task.
arXiv Detail & Related papers (2022-03-01T05:02:02Z) - Masked Conditional Random Fields for Sequence Labeling [2.982218441172364]
Conditional Random Field (CRF) based neural models are among the most performant methods for solving sequence labeling problems.
We propose Masked Conditional Random Field (MCRF), an easy-to-implement variant of CRF that imposes restrictions on candidate paths during both the training and decoding phases.
We show that the proposed method thoroughly resolves the issue of illegal candidate paths and brings consistent improvement over existing CRF-based models at near-zero additional cost.
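To illustrate the general idea of restricting candidate paths during training (a sketch, not the paper's exact formulation): if illegal transitions receive a score of negative infinity in the forward algorithm, forbidden paths contribute zero probability mass to the partition function. The mask layout here is hypothetical:

```python
import math

def logsumexp(xs):
    """Numerically stable log-sum-exp; handles the all -inf case."""
    m = max(xs)
    if m == -math.inf:
        return -math.inf
    return m + math.log(sum(math.exp(x - m) for x in xs))

def log_partition(emissions, transitions, allowed):
    """Forward algorithm over masked transitions: any (i, j) with
    allowed[i][j] == False scores -inf, so paths that use it carry
    zero probability mass."""
    n = len(emissions[0])
    alpha = list(emissions[0])
    for t in range(1, len(emissions)):
        alpha = [
            logsumexp([
                alpha[i] + (transitions[i][j] if allowed[i][j] else -math.inf)
                for i in range(n)
            ]) + emissions[t][j]
            for j in range(n)
        ]
    return logsumexp(alpha)
```

With uniform zero scores, masking one of the four possible two-step label paths shrinks the partition function from 4 to 3, which is the intended effect of path restriction.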
arXiv Detail & Related papers (2021-03-19T08:23:24Z) - Integrating end-to-end neural and clustering-based diarization: Getting
the best of both worlds [71.36164750147827]
Clustering-based approaches assign speaker labels to speech regions by clustering speaker embeddings such as x-vectors.
End-to-end neural diarization (EEND) directly predicts diarization labels using a neural network.
We propose a simple but effective hybrid diarization framework that works with overlapped speech and for long recordings containing an arbitrary number of speakers.
arXiv Detail & Related papers (2020-10-26T06:33:02Z) - Constrained Decoding for Computationally Efficient Named Entity
Recognition Taggers [15.279850826041066]
Current work eschews prior knowledge of the span encoding scheme and instead relies on the conditional random field (CRF) to learn which transitions are illegal, in order to ensure globally coherent output.
We find that by constraining the output to suppress illegal transitions, we can train a tagger with a cross-entropy loss twice as fast as a CRF, with statistically insignificant differences in F1.
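The constraint can be sketched as follows (labels, scores, and the illegal-transition set here are hypothetical): per-token scores come from a tagger trained with cross-entropy, and decoding runs Viterbi over a transition matrix containing only 0 or negative infinity, so no decoded path can contain an illegal transition:

```python
import math

LABELS = ["O", "B", "I"]
ILLEGAL = {("O", "I")}  # e.g. in BIO, Inside cannot follow Outside

def masked(transitions):
    """Copy the transition matrix and set illegal entries to -inf."""
    out = [row[:] for row in transitions]
    for a, b in ILLEGAL:
        out[LABELS.index(a)][LABELS.index(b)] = -math.inf
    return out

def viterbi(emissions, transitions):
    """Standard Viterbi decode; -inf transitions are never chosen."""
    n = len(LABELS)
    score = list(emissions[0])
    back = []
    for t in range(1, len(emissions)):
        new, ptr = [], []
        for j in range(n):
            best_i = max(range(n), key=lambda i: score[i] + transitions[i][j])
            new.append(score[best_i] + transitions[best_i][j] + emissions[t][j])
            ptr.append(best_i)
        score = new
        back.append(ptr)
    j = max(range(n), key=lambda k: score[k])
    path = [j]
    for ptr in reversed(back):
        j = ptr[j]
        path.append(j)
    return [LABELS[k] for k in reversed(path)]
```

With zero transition scores the decode follows the per-token scores alone; applying `masked` forces the path to reach "I" through a legal predecessor such as "B".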
arXiv Detail & Related papers (2020-10-09T04:07:52Z) - Cross-Thought for Sentence Encoder Pre-training [89.32270059777025]
Cross-Thought is a novel approach to pre-training sequence encoders.
We train a Transformer-based sequence encoder over a large set of short sequences.
Experiments on question answering and textual entailment tasks demonstrate that our pre-trained encoder can outperform state-of-the-art encoders.
arXiv Detail & Related papers (2020-10-07T21:02:41Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the information presented and is not responsible for any consequences of its use.