Aligning Speakers: Evaluating and Visualizing Text-based Diarization
Using Efficient Multiple Sequence Alignment (Extended Version)
- URL: http://arxiv.org/abs/2309.07677v1
- Date: Thu, 14 Sep 2023 12:43:26 GMT
- Title: Aligning Speakers: Evaluating and Visualizing Text-based Diarization
Using Efficient Multiple Sequence Alignment (Extended Version)
- Authors: Chen Gong, Peilin Wu, Jinho D. Choi
- Abstract summary: Two new metrics are proposed, Text-based Diarization Error Rate and Diarization F1, which perform utterance- and word-level evaluations.
Our metrics encompass more types of errors compared to existing ones, allowing us to make a more comprehensive analysis in speaker diarization.
- Score: 21.325463387256807
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: This paper presents a novel evaluation approach to text-based speaker
diarization (SD), tackling the limitations of traditional metrics that do not
account for any contextual information in text. Two new metrics are proposed,
Text-based Diarization Error Rate and Diarization F1, which perform utterance-
and word-level evaluations by aligning tokens in reference and hypothesis
transcripts. Our metrics encompass more types of errors compared to existing
ones, allowing us to make a more comprehensive analysis in SD. To align tokens,
a multiple sequence alignment algorithm is introduced that supports multiple
sequences in the reference while handling high-dimensional alignment to the
hypothesis using dynamic programming. Our work is packaged into two tools,
align4d providing an API for our alignment algorithm and TranscribeView for
visualizing and evaluating SD errors, which can greatly aid in the creation of
high-quality data, fostering the advancement of dialogue systems.
Related papers
- PLATTER: A Page-Level Handwritten Text Recognition System for Indic Scripts [20.394597266150534]
We present an end-to-end framework for Page-Level hAndwriTTen TExt Recognition (PLATTER)
Secondly, we demonstrate the usage of PLATTER to measure the performance of our language-agnostic HTD model.
Finally, we release a Corpus of Handwritten Indic Scripts (CHIPS), a meticulously curated, page-level Indic handwritten OCR dataset.
arXiv Detail & Related papers (2025-02-10T05:50:26Z) - Detecting Document-level Paraphrased Machine Generated Content: Mimicking Human Writing Style and Involving Discourse Features [57.34477506004105]
Machine-generated content poses challenges such as academic plagiarism and the spread of misinformation.
We introduce novel methodologies and datasets to overcome these challenges.
We propose MhBART, an encoder-decoder model designed to emulate human writing style.
We also propose DTransformer, a model that integrates discourse analysis through PDTB preprocessing to encode structural features.
arXiv Detail & Related papers (2024-12-17T08:47:41Z) - BEST-STD: Bidirectional Mamba-Enhanced Speech Tokenization for Spoken Term Detection [8.303512060791736]
Spoken term detection is often hindered by reliance on frame-level features and the computationally intensive DTW-based template matching.
We propose a novel approach that encodes speech into discrete, speaker-agnostic semantic tokens.
This facilitates fast retrieval using text-based search algorithms and effectively handles out-of-vocabulary terms.
arXiv Detail & Related papers (2024-11-21T13:05:18Z) - Beyond Coarse-Grained Matching in Video-Text Retrieval [50.799697216533914]
We introduce a new approach for fine-grained evaluation.
Our approach can be applied to existing datasets by automatically generating hard negative test captions.
Experiments on our fine-grained evaluations demonstrate that this approach enhances a model's ability to understand fine-grained differences.
arXiv Detail & Related papers (2024-10-16T09:42:29Z) - Localizing Factual Inconsistencies in Attributable Text Generation [91.981439746404]
We introduce QASemConsistency, a new formalism for localizing factual inconsistencies in attributable text generation.
We first demonstrate the effectiveness of the QASemConsistency methodology for human annotation.
We then implement several methods for automatically detecting localized factual inconsistencies.
arXiv Detail & Related papers (2024-10-09T22:53:48Z) - MISMATCH: Fine-grained Evaluation of Machine-generated Text with
Mismatch Error Types [68.76742370525234]
We propose a new evaluation scheme to model human judgments in 7 NLP tasks, based on the fine-grained mismatches between a pair of texts.
Inspired by the recent efforts in several NLP tasks for fine-grained evaluation, we introduce a set of 13 mismatch error types.
We show that the mismatch errors between the sentence pairs on the held-out datasets from 7 NLP tasks align well with the human evaluation.
arXiv Detail & Related papers (2023-06-18T01:38:53Z) - Analysis of Joint Speech-Text Embeddings for Semantic Matching [3.6423306784901235]
We study a joint speech-text embedding space trained for semantic matching by minimizing the distance between paired utterance and transcription inputs.
We extend our method to incorporate automatic speech recognition through both pretraining and multitask scenarios.
arXiv Detail & Related papers (2022-04-04T04:50:32Z) - Speaker Embedding-aware Neural Diarization: a Novel Framework for
Overlapped Speech Diarization in the Meeting Scenario [51.5031673695118]
We reformulate overlapped speech diarization as a single-label prediction problem.
We propose the speaker embedding-aware neural diarization (SEND) system.
arXiv Detail & Related papers (2022-03-18T06:40:39Z) - Speaker Embedding-aware Neural Diarization for Flexible Number of
Speakers with Textual Information [55.75018546938499]
We propose the speaker embedding-aware neural diarization (SEND) method, which predicts the power set encoded labels.
Our method achieves lower diarization error rate than the target-speaker voice activity detection.
arXiv Detail & Related papers (2021-11-28T12:51:04Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.