Speaker-change Aware CRF for Dialogue Act Classification
- URL: http://arxiv.org/abs/2004.02913v3
- Date: Sat, 24 Jun 2023 22:04:37 GMT
- Title: Speaker-change Aware CRF for Dialogue Act Classification
- Authors: Guokan Shang (1 and 2), Antoine Jean-Pierre Tixier (1), Michalis
Vazirgiannis (1 and 3), Jean-Pierre Lorré (2) ((1) École Polytechnique,
(2) Linagora, (3) AUEB)
- Abstract summary: Recent work in Dialogue Act (DA) classification approaches the task as a sequence labeling problem.
This paper proposes a simple modification of the CRF layer that takes speaker-change into account.
- Score: 0.0
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Recent work in Dialogue Act (DA) classification approaches the task as a
sequence labeling problem, using neural network models coupled with a
Conditional Random Field (CRF) as the last layer. CRF models the conditional
probability of the target DA label sequence given the input utterance sequence.
However, the task involves another important input sequence, that of speakers,
which is ignored by previous work. To address this limitation, this paper
proposes a simple modification of the CRF layer that takes speaker-change into
account. Experiments on the SwDA corpus show that our modified CRF layer
outperforms the original one, with very wide margins for some DA labels.
Further, visualizations demonstrate that our CRF layer can learn meaningful,
sophisticated transition patterns between DA label pairs conditioned on
speaker-change in an end-to-end way. Code is publicly available.
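As a minimal sketch of the idea described in the abstract (not the paper's exact formulation), one way to make a CRF layer speaker-change aware is to keep two transition weight matrices, one used when the speaker changes between consecutive utterances and one when it stays the same. All names and values below are hypothetical:

```python
def crf_sequence_score(emissions, labels, speaker_change,
                       trans_same, trans_change):
    """Unnormalized CRF score of a DA label sequence, where the
    transition weights at each step are selected according to
    whether the speaker changed between consecutive utterances."""
    score = emissions[0][labels[0]]  # emission score of first utterance
    for t in range(1, len(labels)):
        # choose the transition matrix conditioned on speaker change
        trans = trans_change if speaker_change[t] else trans_same
        score += trans[labels[t - 1]][labels[t]] + emissions[t][labels[t]]
    return score
```

A complete implementation would also have the forward algorithm and Viterbi decoding select the appropriate matrix at each step, so the model remains trainable end to end.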
Related papers
- Regular-pattern-sensitive CRFs for Distant Label Interactions [10.64258723923874]
We present regular-pattern-sensitive CRFs (RPCRFs), a method of enriching standard linear-chain CRFs with the ability to learn long-distance label interactions. We detail how an RPCRF can be automatically constructed from a set of user-specified patterns, and demonstrate the model's effectiveness on three synthetic sequence-modeling datasets.
arXiv Detail & Related papers (2024-11-19T13:08:03Z) - BA-SOT: Boundary-Aware Serialized Output Training for Multi-Talker ASR [54.23941663326509]
Frequent speaker changes can make speaker change prediction difficult.
We propose boundary-aware serialized output training (BA-SOT).
Compared to the original SOT, BA-SOT reduces CER/UD-CER by 5.1%/14.0%.
arXiv Detail & Related papers (2023-05-23T06:08:13Z) - Neural Diarization with Non-autoregressive Intermediate Attractors [37.49735004139322]
We propose a novel EEND model that introduces the label dependency between frames.
The experiments with the two-speaker CALLHOME dataset show that the intermediate labels with the proposed non-autoregressive intermediate attractors boost the diarization performance.
arXiv Detail & Related papers (2023-03-13T01:28:55Z) - Latent Class-Conditional Noise Model [54.56899309997246]
We introduce a Latent Class-Conditional Noise model (LCCN) to parameterize the noise transition under a Bayesian framework.
We then deduce a dynamic label regression method for LCCN, whose Gibbs sampler allows us to efficiently infer the latent true labels.
Our approach safeguards the stable update of the noise transition, avoiding the arbitrary tuning from a mini-batch of samples found in previous work.
arXiv Detail & Related papers (2023-02-19T15:24:37Z) - Joint Span Segmentation and Rhetorical Role Labeling with Data
Augmentation for Legal Documents [1.4072904523937537]
Rhetorical Role Labeling of legal judgments plays a crucial role in retrieval and adjacent tasks.
We reformulate the task at span level as identifying spans of multiple consecutive sentences that share the same rhetorical role label.
We employ semi-Markov Conditional Random Fields (CRF) to jointly learn span segmentation and span label assignment.
arXiv Detail & Related papers (2023-02-13T15:28:02Z) - Speaker Embedding-aware Neural Diarization: a Novel Framework for
Overlapped Speech Diarization in the Meeting Scenario [51.5031673695118]
We reformulate overlapped speech diarization as a single-label prediction problem.
We propose the speaker embedding-aware neural diarization (SEND) system.
arXiv Detail & Related papers (2022-03-18T06:40:39Z) - Extended Graph Temporal Classification for Multi-Speaker End-to-End ASR [77.82653227783447]
We propose an extension of GTC to model the posteriors of both labels and label transitions by a neural network.
As an example application, we use the extended GTC (GTC-e) for the multi-speaker speech recognition task.
arXiv Detail & Related papers (2022-03-01T05:02:02Z) - Masked Conditional Random Fields for Sequence Labeling [2.982218441172364]
Conditional Random Field (CRF) based neural models are among the most performant methods for solving sequence labeling problems.
We propose Masked Conditional Random Field (MCRF), an easy-to-implement variant of CRF that imposes restrictions on candidate paths during both the training and decoding phases.
We show that the proposed method thoroughly resolves the issue of illegal candidate paths and brings consistent improvement over existing CRF-based models at near-zero additional cost.
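To illustrate the general idea of restricting candidate paths during training (a sketch, not the paper's exact formulation): if illegal transitions receive a score of negative infinity in the forward algorithm, forbidden paths contribute zero probability mass to the partition function. The mask layout here is hypothetical:

```python
import math

def logsumexp(xs):
    """Numerically stable log-sum-exp; handles the all -inf case."""
    m = max(xs)
    if m == -math.inf:
        return -math.inf
    return m + math.log(sum(math.exp(x - m) for x in xs))

def log_partition(emissions, transitions, allowed):
    """Forward algorithm over masked transitions: any (i, j) with
    allowed[i][j] == False scores -inf, so paths that use it carry
    zero probability mass."""
    n = len(emissions[0])
    alpha = list(emissions[0])
    for t in range(1, len(emissions)):
        alpha = [
            logsumexp([
                alpha[i] + (transitions[i][j] if allowed[i][j] else -math.inf)
                for i in range(n)
            ]) + emissions[t][j]
            for j in range(n)
        ]
    return logsumexp(alpha)
```

With uniform zero scores, masking one of the four possible two-step label paths shrinks the partition function from 4 to 3, which is the intended effect of path restriction.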
arXiv Detail & Related papers (2021-03-19T08:23:24Z) - Integrating end-to-end neural and clustering-based diarization: Getting
the best of both worlds [71.36164750147827]
Clustering-based approaches assign speaker labels to speech regions by clustering speaker embeddings such as x-vectors.
End-to-end neural diarization (EEND) directly predicts diarization labels using a neural network.
We propose a simple but effective hybrid diarization framework that works with overlapped speech and for long recordings containing an arbitrary number of speakers.
arXiv Detail & Related papers (2020-10-26T06:33:02Z) - Constrained Decoding for Computationally Efficient Named Entity
Recognition Taggers [15.279850826041066]
Current work eschews prior knowledge of the span encoding scheme and instead relies on the conditional random field (CRF) to learn which transitions are illegal, in order to ensure globally coherent output.
We find that by constraining the output to suppress illegal transitions, we can train a tagger with a cross-entropy loss twice as fast as a CRF, with statistically insignificant differences in F1.
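The constraint can be sketched as follows (labels, scores, and the illegal-transition set here are hypothetical): per-token scores come from a tagger trained with cross-entropy, and decoding runs Viterbi over a transition matrix containing only 0 or negative infinity, so no decoded path can contain an illegal transition:

```python
import math

LABELS = ["O", "B", "I"]
ILLEGAL = {("O", "I")}  # e.g. in BIO, Inside cannot follow Outside

def masked(transitions):
    """Copy the transition matrix and set illegal entries to -inf."""
    out = [row[:] for row in transitions]
    for a, b in ILLEGAL:
        out[LABELS.index(a)][LABELS.index(b)] = -math.inf
    return out

def viterbi(emissions, transitions):
    """Standard Viterbi decode; -inf transitions are never chosen."""
    n = len(LABELS)
    score = list(emissions[0])
    back = []
    for t in range(1, len(emissions)):
        new, ptr = [], []
        for j in range(n):
            best_i = max(range(n), key=lambda i: score[i] + transitions[i][j])
            new.append(score[best_i] + transitions[best_i][j] + emissions[t][j])
            ptr.append(best_i)
        score = new
        back.append(ptr)
    j = max(range(n), key=lambda k: score[k])
    path = [j]
    for ptr in reversed(back):
        j = ptr[j]
        path.append(j)
    return [LABELS[k] for k in reversed(path)]
```

With zero transition scores the decode follows the per-token scores alone; applying `masked` forces the path to reach "I" through a legal predecessor such as "B".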
arXiv Detail & Related papers (2020-10-09T04:07:52Z) - Cross-Thought for Sentence Encoder Pre-training [89.32270059777025]
Cross-Thought is a novel approach to pre-training sequence encoders.
We train a Transformer-based sequence encoder over a large set of short sequences.
Experiments on question answering and textual entailment tasks demonstrate that our pre-trained encoder can outperform state-of-the-art encoders.
arXiv Detail & Related papers (2020-10-07T21:02:41Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the information presented and is not responsible for any consequences of its use.