O-EENC-SD: Efficient Online End-to-End Neural Clustering for Speaker Diarization
- URL: http://arxiv.org/abs/2512.15229v1
- Date: Wed, 17 Dec 2025 09:27:23 GMT
- Title: O-EENC-SD: Efficient Online End-to-End Neural Clustering for Speaker Diarization
- Authors: Elio Gruttadauria, Mathieu Fontaine, Jonathan Le Roux, Slim Essid
- Abstract summary: O-EENC-SD is an end-to-end online speaker diarization system based on EEND-EDA. We develop a novel centroid refinement decoder whose usefulness is assessed through a rigorous ablation study.
- Score: 46.665605221223295
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: We introduce O-EENC-SD: an end-to-end online speaker diarization system based on EEND-EDA, featuring a novel RNN-based stitching mechanism for online prediction. In particular, we develop a novel centroid refinement decoder whose usefulness is assessed through a rigorous ablation study. Our system provides key advantages over existing methods: a hyperparameter-free solution compared to unsupervised clustering approaches, and a more efficient alternative to current online end-to-end methods, which are computationally costly. We demonstrate that O-EENC-SD is competitive with the state of the art in the two-speaker conversational telephone speech domain, as tested on the CallHome dataset. Our results show that O-EENC-SD provides a great trade-off between DER and complexity, even when working on independent chunks with no overlap, making the system extremely efficient.
Related papers
- Joint Optimization of Model Partitioning and Resource Allocation for Anti-Jamming Collaborative Inference Systems [52.842088497389746]
This letter focuses on an anti-jamming collaborative inference system in the presence of a malicious jammer. We first analyze the effects of jamming and DNN partitioning on inference accuracy via data regression. We propose an efficient alternating optimization-based algorithm, which decomposes the problem into three subproblems.
arXiv Detail & Related papers (2026-03-03T03:52:52Z)
- An Online Automatic Modulation Classification Scheme Based on Isolation Distributional Kernel [10.102343518449118]
This paper introduces a new online AMC scheme based on Isolation Distributional Kernel.
Firstly, it is the first proposal to represent baseband signals using a distributional kernel.
Secondly, it introduces a pioneering AMC technique that works well in online settings under realistic time-varying channel conditions.
arXiv Detail & Related papers (2024-10-03T17:57:50Z)
- HUWSOD: Holistic Self-training for Unified Weakly Supervised Object Detection [66.42229859018775]
We introduce a unified, high-capacity weakly supervised object detection (WSOD) network called HUWSOD.
HUWSOD incorporates a self-supervised proposal generator and an autoencoder proposal generator with a multi-rate re-supervised pyramid to replace traditional object proposals.
Our findings indicate that random boxes, although significantly different from well-designed offline object proposals, are effective for WSOD training.
arXiv Detail & Related papers (2024-06-27T17:59:49Z)
- Speakers Unembedded: Embedding-free Approach to Long-form Neural Diarization [8.737789422658247]
We propose a novel framework applying EEND both locally and globally for long-form audio without separate speaker embeddings.
This approach achieves significant relative DER reductions of 13% and 10% over the conventional 1-pass EEND on the Callhome American English and RT03-CTS datasets, respectively.
arXiv Detail & Related papers (2024-06-26T18:32:16Z)
- End-to-End Supervised Hierarchical Graph Clustering for Speaker Diarization [34.90908110610951]
We propose an end-to-end supervised hierarchical clustering algorithm based on graph neural networks (GNN). The proposed E-SHARC framework provides competitive diarization results using graph-based clustering methods.
arXiv Detail & Related papers (2024-01-23T15:35:44Z)
- Age of Semantics in Cooperative Communications: To Expedite Simulation Towards Real via Offline Reinforcement Learning [53.18060442931179]
We propose the age of semantics (AoS) for measuring semantics freshness of status updates in a cooperative relay communication system.
We derive an online deep actor-critic (DAC) learning scheme under the on-policy temporal difference learning framework.
We then put forward a novel offline DAC scheme, which estimates the optimal control policy from a previously collected dataset.
arXiv Detail & Related papers (2022-09-19T11:55:28Z)
- End-to-End Active Speaker Detection [58.7097258722291]
We propose an end-to-end training network where feature learning and contextual predictions are jointly learned.
We also introduce intertemporal graph neural network (iGNN) blocks, which split the message passing according to the main sources of context in the ASD problem.
Experiments show that the aggregated features from the iGNN blocks are more suitable for ASD, resulting in state-of-the-art performance.
arXiv Detail & Related papers (2022-03-27T08:55:28Z)
- Tight integration of neural- and clustering-based diarization through deep unfolding of infinite Gaussian mixture model [84.57667267657382]
This paper introduces a trainable clustering algorithm into the integration framework.
Speaker embeddings are optimized during training such that they better fit iGMM clustering.
Experimental results show that the proposed approach outperforms the conventional approach in terms of diarization error rate.
arXiv Detail & Related papers (2022-02-14T07:45:21Z)
- Improving Character Error Rate Is Not Equal to Having Clean Speech: Speech Enhancement for ASR Systems with Black-box Acoustic Models [1.6328866317851185]
A deep neural network (DNN)-based speech enhancement (SE) method is proposed in this paper.
Our method uses two DNNs: one for speech processing and one for mimicking the output CERs derived through an acoustic model (AM).
Experimental results show that our method achieved a 7.3% relative improvement in the CER derived through a black-box AM, although certain noise levels remain.
arXiv Detail & Related papers (2021-10-12T12:51:53Z)
- Autodidactic Neurosurgeon: Collaborative Deep Inference for Mobile Edge Intelligence via Online Learning [19.013102763434794]
This paper builds a collaborative deep inference system between a resource-constrained mobile device and a powerful edge server.
Our system has a built-in online learning module, called Autodidactic Neurosurgeon (ANS), to automatically learn the optimal partition point.
ANS significantly outperforms state-of-the-art benchmarks in terms of tracking system changes and reducing the end-to-end inference delay.
arXiv Detail & Related papers (2021-02-02T18:50:06Z)
- Deep Multi-Task Learning for Cooperative NOMA: System Design and Principles [52.79089414630366]
We develop a novel deep cooperative NOMA scheme, drawing upon the recent advances in deep learning (DL).
We develop a novel hybrid-cascaded deep neural network (DNN) architecture such that the entire system can be optimized in a holistic manner.
arXiv Detail & Related papers (2020-07-27T12:38:37Z)
This list is automatically generated from the titles and abstracts of the papers on this site.