Aligning Speakers: Evaluating and Visualizing Text-based Diarization
Using Efficient Multiple Sequence Alignment (Extended Version)
- URL: http://arxiv.org/abs/2309.07677v1
- Date: Thu, 14 Sep 2023 12:43:26 GMT
- Title: Aligning Speakers: Evaluating and Visualizing Text-based Diarization
Using Efficient Multiple Sequence Alignment (Extended Version)
- Authors: Chen Gong, Peilin Wu, Jinho D. Choi
- Abstract summary: Two new metrics are proposed, Text-based Diarization Error Rate and Diarization F1, which perform utterance- and word-level evaluations.
Our metrics encompass more types of errors compared to existing ones, allowing us to make a more comprehensive analysis in speaker diarization.
- Score: 21.325463387256807
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: This paper presents a novel evaluation approach to text-based speaker
diarization (SD), tackling the limitations of traditional metrics that do not
account for any contextual information in text. Two new metrics are proposed,
Text-based Diarization Error Rate and Diarization F1, which perform utterance-
and word-level evaluations by aligning tokens in reference and hypothesis
transcripts. Our metrics encompass more types of errors compared to existing
ones, allowing us to make a more comprehensive analysis in SD. To align tokens,
a multiple sequence alignment algorithm is introduced that supports multiple
sequences in the reference while handling high-dimensional alignment to the
hypothesis using dynamic programming. Our work is packaged into two tools,
align4d providing an API for our alignment algorithm and TranscribeView for
visualizing and evaluating SD errors, which can greatly aid in the creation of
high-quality data, fostering the advancement of dialogue systems.
Related papers
- Beyond Coarse-Grained Matching in Video-Text Retrieval [50.799697216533914]
We introduce a new approach for fine-grained evaluation.
Our approach can be applied to existing datasets by automatically generating hard negative test captions.
Experiments on our fine-grained evaluations demonstrate that this approach enhances a model's ability to understand fine-grained differences.
arXiv Detail & Related papers (2024-10-16T09:42:29Z) - Localizing Factual Inconsistencies in Attributable Text Generation [91.981439746404]
We introduce QASemConsistency, a new formalism for localizing factual inconsistencies in attributable text generation.
We first demonstrate the effectiveness of the QASemConsistency methodology for human annotation.
We then implement several methods for automatically detecting localized factual inconsistencies.
arXiv Detail & Related papers (2024-10-09T22:53:48Z) - General Detection-based Text Line Recognition [15.761142324480165]
We introduce a general detection-based approach to text line recognition, be it printed (OCR) or handwritten (HTR)
Our approach builds on a completely different paradigm than state-of-the-art HTR methods, which rely on autoregressive decoding.
We improve state-of-the-art performances for Chinese script recognition on the CASIA v2 dataset, and for cipher recognition on the Borg and Copiale datasets.
arXiv Detail & Related papers (2024-09-25T17:05:55Z) - MISMATCH: Fine-grained Evaluation of Machine-generated Text with
Mismatch Error Types [68.76742370525234]
We propose a new evaluation scheme to model human judgments in 7 NLP tasks, based on the fine-grained mismatches between a pair of texts.
Inspired by the recent efforts in several NLP tasks for fine-grained evaluation, we introduce a set of 13 mismatch error types.
We show that the mismatch errors between the sentence pairs on the held-out datasets from 7 NLP tasks align well with the human evaluation.
arXiv Detail & Related papers (2023-06-18T01:38:53Z) - D2CSE: Difference-aware Deep continuous prompts for Contrastive Sentence
Embeddings [3.04585143845864]
This paper describes Difference-aware Deep continuous prompt for Contrastive Sentence Embeddings (D2CSE) that learns sentence embeddings.
Compared to state-of-the-art approaches, D2CSE computes sentence vectors that are exceptional to distinguish a subtle difference in similar sentences.
arXiv Detail & Related papers (2023-04-18T13:45:07Z) - Analysis of Joint Speech-Text Embeddings for Semantic Matching [3.6423306784901235]
We study a joint speech-text embedding space trained for semantic matching by minimizing the distance between paired utterance and transcription inputs.
We extend our method to incorporate automatic speech recognition through both pretraining and multitask scenarios.
arXiv Detail & Related papers (2022-04-04T04:50:32Z) - Speaker Embedding-aware Neural Diarization: a Novel Framework for
Overlapped Speech Diarization in the Meeting Scenario [51.5031673695118]
We reformulate overlapped speech diarization as a single-label prediction problem.
We propose the speaker embedding-aware neural diarization (SEND) system.
arXiv Detail & Related papers (2022-03-18T06:40:39Z) - Speaker Embedding-aware Neural Diarization for Flexible Number of
Speakers with Textual Information [55.75018546938499]
We propose the speaker embedding-aware neural diarization (SEND) method, which predicts the power set encoded labels.
Our method achieves lower diarization error rate than the target-speaker voice activity detection.
arXiv Detail & Related papers (2021-11-28T12:51:04Z) - Multilingual Alignment of Contextual Word Representations [49.42244463346612]
BERT exhibits significantly improved zero-shot performance on XNLI compared to the base model.
We introduce a contextual version of word retrieval and show that it correlates well with downstream zero-shot transfer.
These results support contextual alignment as a useful concept for understanding large multilingual pre-trained models.
arXiv Detail & Related papers (2020-02-10T03:27:21Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.