Neural Diarization with Non-autoregressive Intermediate Attractors
- URL: http://arxiv.org/abs/2303.06806v1
- Date: Mon, 13 Mar 2023 01:28:55 GMT
- Title: Neural Diarization with Non-autoregressive Intermediate Attractors
- Authors: Yusuke Fujita, Tatsuya Komatsu, Robin Scheibler, Yusuke Kida, Tetsuji
Ogawa
- Abstract summary: We propose a novel EEND model that introduces the label dependency between frames.
The experiments with the two-speaker CALLHOME dataset show that the intermediate labels with the proposed non-autoregressive intermediate attractors boost the diarization performance.
- Score: 37.49735004139322
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: End-to-end neural diarization (EEND) with encoder-decoder-based attractors
(EDA) is a promising method to handle the whole speaker diarization problem
simultaneously with a single neural network. While the EEND model can produce
all frame-level speaker labels simultaneously, it disregards output label
dependency. In this work, we propose a novel EEND model that introduces the
label dependency between frames. The proposed method generates
non-autoregressive intermediate attractors to produce speaker labels at the
lower layers and conditions the subsequent layers with these labels. While the
proposed model works in a non-autoregressive manner, the speaker labels are
refined by referring to the whole sequence of intermediate labels. The
experiments with the two-speaker CALLHOME dataset show that the intermediate
labels with the proposed non-autoregressive intermediate attractors boost the
diarization performance. The proposed method with the deeper network benefits
more from the intermediate labels, resulting in better performance and training
throughput than EEND-EDA.
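The mechanism described in the abstract can be sketched numerically. The toy code below uses random matrices as stand-ins for trained encoder layers and a single matrix product as a stand-in for the attractor module; the layer sizes, the weighted-average attractor computation, and the linear conditioning projection `Wc` are all illustrative assumptions, not the paper's actual architecture. It only shows the data flow: intermediate attractors yield intermediate frame-level speaker labels in one non-autoregressive pass, and those labels condition the subsequent layer.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def softmax(x, axis):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def encoder_layer(x, W):
    # Stand-in for a trained Transformer encoder layer.
    return np.tanh(x @ W)

def intermediate_attractors(emb, Wa):
    # Non-autoregressive attractor calculation: one matrix product over the
    # whole sequence instead of an autoregressive LSTM decoder.
    weights = softmax(emb @ Wa, axis=0)   # (T, S) per-frame weights
    return weights.T @ emb                # (S, D) attractors

rng = np.random.default_rng(0)
T, D, S = 50, 16, 2   # frames, embedding dim, speakers (illustrative sizes)

x = rng.standard_normal((T, D))
W1 = rng.standard_normal((D, D))
W2 = rng.standard_normal((D, D))
Wa = rng.standard_normal((D, S))
Wc = rng.standard_normal((S, D))   # hypothetical label-conditioning projection

emb = encoder_layer(x, W1)                  # lower encoder layer
att = intermediate_attractors(emb, Wa)      # (S, D) intermediate attractors
inter_labels = sigmoid(emb @ att.T)         # (T, S) intermediate speaker labels

# Condition the subsequent layer on the whole sequence of intermediate labels.
emb = encoder_layer(emb + inter_labels @ Wc, W2)
att = intermediate_attractors(emb, Wa)
final_labels = sigmoid(emb @ att.T)         # refined frame-level labels
print(final_labels.shape)                   # (50, 2)
```

Because the intermediate labels are produced for the whole sequence at once, later layers can refine every frame's decision in parallel, which is what distinguishes this conditioning from an autoregressive label dependency.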
Related papers
- BiRQ: Bi-Level Self-Labeling Random Quantization for Self-Supervised Speech Recognition [63.45645200463539]
BiRQ is a bi-level SSL framework that combines the efficiency of BEST-RQ with the refinement benefits of HuBERT-style label enhancement.
We validate our method on various datasets, including 960-hour LibriSpeech, 150-hour AMI meetings, and 5,000-hour YODAS.
arXiv Detail & Related papers (2025-09-18T21:09:29Z)
- Adaptive Integration of Partial Label Learning and Negative Learning for Enhanced Noisy Label Learning [23.847160480176697]
We propose a simple yet powerful idea called NPN, which revolutionizes noisy label learning.
We generate reliable complementary labels using all non-candidate labels for NL to enhance model robustness through indirect supervision.
Experiments conducted on both synthetically corrupted and real-world noisy datasets demonstrate the superiority of NPN compared to other state-of-the-art (SOTA) methods.
arXiv Detail & Related papers (2023-12-15T03:06:19Z)
- Transductive CLIP with Class-Conditional Contrastive Learning [68.51078382124331]
We propose Transductive CLIP, a novel framework for learning a classification network with noisy labels from scratch.
A class-conditional contrastive learning mechanism is proposed to mitigate the reliance on pseudo labels.
Ensemble labels are adopted as a pseudo-label updating strategy to stabilize the training of deep neural networks with noisy labels.
arXiv Detail & Related papers (2022-06-13T14:04:57Z)
- Label-Enhanced Graph Neural Network for Semi-supervised Node Classification [32.64730237473914]
We present a label-enhanced learning framework for Graph Neural Networks (GNNs)
It first models each label as a virtual center for intra-class nodes and then jointly learns the representations of both nodes and labels.
Our approach could not only smooth the representations of nodes belonging to the same class, but also explicitly encode the label semantics into the learning process of GNNs.
arXiv Detail & Related papers (2022-05-31T09:48:47Z)
- Speaker Embedding-aware Neural Diarization: a Novel Framework for Overlapped Speech Diarization in the Meeting Scenario [51.5031673695118]
We reformulate overlapped speech diarization as a single-label prediction problem.
We propose the speaker embedding-aware neural diarization (SEND) system.
arXiv Detail & Related papers (2022-03-18T06:40:39Z)
- A Label Dependence-aware Sequence Generation Model for Multi-level Implicit Discourse Relation Recognition [31.179555215952306]
Implicit discourse relation recognition is a challenging but crucial task in discourse analysis.
We propose a Label Dependence-aware Sequence Generation Model (LDSGM) for it.
We develop a mutual-learning-enhanced training method to exploit the label dependence in a bottom-up direction.
arXiv Detail & Related papers (2021-12-22T09:14:03Z)
- Dual-Refinement: Joint Label and Feature Refinement for Unsupervised Domain Adaptive Person Re-Identification [51.98150752331922]
Unsupervised domain adaptive (UDA) person re-identification (re-ID) is a challenging task because labels are missing for the target-domain data.
We propose a novel approach, called Dual-Refinement, that jointly refines pseudo labels at the off-line clustering phase and features at the on-line training phase.
Our method outperforms the state-of-the-art methods by a large margin.
arXiv Detail & Related papers (2020-12-26T07:35:35Z)
- Delving Deep into Label Smoothing [112.24527926373084]
Label smoothing is an effective regularization tool for deep neural networks (DNNs).
We present an Online Label Smoothing (OLS) strategy, which generates soft labels based on the statistics of the model prediction for the target category.
arXiv Detail & Related papers (2020-11-25T08:03:11Z)
- Integrating end-to-end neural and clustering-based diarization: Getting the best of both worlds [71.36164750147827]
Clustering-based approaches assign speaker labels to speech regions by clustering speaker embeddings such as x-vectors.
End-to-end neural diarization (EEND) directly predicts diarization labels using a neural network.
We propose a simple but effective hybrid diarization framework that works with overlapped speech and for long recordings containing an arbitrary number of speakers.
arXiv Detail & Related papers (2020-10-26T06:33:02Z)
- Speaker-change Aware CRF for Dialogue Act Classification [0.0]
Recent work in Dialogue Act (DA) classification approaches the task as a sequence labeling problem.
This paper proposes a simple modification of the CRF layer that takes speaker-change into account.
arXiv Detail & Related papers (2020-04-06T18:03:06Z)
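One plausible reading of the speaker-change aware CRF above (an assumption based only on the abstract; the paper's actual layer modification may differ) is that the transition score between consecutive dialogue-act labels depends on whether the speaker changed at that step. The sketch below implements Viterbi decoding with two separate transition matrices, one for within-turn steps and one for speaker-change steps; the matrices here are random placeholders for trained parameters.

```python
import numpy as np

L = 4  # number of dialogue-act labels (illustrative)
rng = np.random.default_rng(1)
T_same = rng.standard_normal((L, L))     # transitions within a speaker turn
T_change = rng.standard_normal((L, L))   # transitions across a speaker change

def viterbi(emissions, speaker_change):
    """emissions: (N, L) label scores; speaker_change: (N,) bools (entry 0 unused).

    Returns the highest-scoring label sequence, switching transition matrices
    at positions where the speaker changes.
    """
    N = len(emissions)
    score = emissions[0].copy()
    back = np.zeros((N, L), dtype=int)
    for t in range(1, N):
        trans = T_change if speaker_change[t] else T_same
        cand = score[:, None] + trans        # (prev label, current label)
        back[t] = cand.argmax(axis=0)        # best predecessor per label
        score = cand.max(axis=0) + emissions[t]
    path = [int(score.argmax())]
    for t in range(N - 1, 0, -1):            # backtrack
        path.append(int(back[t][path[-1]]))
    return path[::-1]

em = rng.standard_normal((5, L))
changes = np.array([False, True, False, True, False])
print(viterbi(em, changes))                  # e.g. a length-5 label sequence
```

The design choice is minimal: the emission model and decoding loop are a standard linear-chain CRF, and only the transition parameters become a function of the speaker-change indicator.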
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information listed here (including all generated summaries) and is not responsible for any consequences.