The SpeakIn System Description for CNSRC2022
- URL: http://arxiv.org/abs/2209.10846v1
- Date: Thu, 22 Sep 2022 08:17:47 GMT
- Title: The SpeakIn System Description for CNSRC2022
- Authors: Yu Zheng, Yihao Chen, Jinghan Peng, Yajun Zhang, Min Liu, Minqiang Xu
- Abstract summary: This report describes our speaker verification systems for the tasks of the CN-Celeb Speaker Recognition Challenge 2022 (CNSRC 2022).
The challenge includes two tasks, namely speaker verification (SV) and speaker retrieval (SR).
- Score: 14.173172568687413
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: This report describes our speaker verification systems for the tasks of the
CN-Celeb Speaker Recognition Challenge 2022 (CNSRC 2022). This challenge
includes two tasks, namely speaker verification (SV) and speaker retrieval (SR).
The SV task involves two tracks: a fixed track and an open track. In the fixed
track, we used only CN-Celeb.T as the training set. For the open track of the
SV task and for the SR task, we added our open-source audio data. ResNet-based,
RepVGG-based, and TDNN-based architectures were developed for this challenge.
A global statistics pooling structure and an MQMHA pooling structure were used to
aggregate the frame-level features across time into utterance-level
representations. We adopted AM-Softmax and AAM-Softmax combined with the
Sub-Center method to classify the resulting embeddings, and used the
Large-Margin Fine-Tuning strategy to further improve model performance. In
the backend, Sub-Mean and AS-Norm were used. In the SV task fixed track, our
system was a fusion of five models; two models were fused in the SV task
open track, and a single system was used in the SR task. Our approach leads to
superior performance, achieving 1st place in the open track of the SV task,
2nd place in the fixed track of the SV task, and 3rd place in the SR task.
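Two of the building blocks named in the abstract, global statistics pooling and the AS-Norm backend, can be sketched compactly. The following is a minimal NumPy illustration, not the authors' implementation; function names, the `top_k` default, and the symmetric averaging of the two normalized scores are assumptions based on common practice.

```python
import numpy as np

def stats_pooling(frames):
    """Global statistics pooling: aggregate frame-level features (T x D)
    into one utterance-level vector by concatenating the per-dimension
    mean and standard deviation over time (output shape: 2*D)."""
    return np.concatenate([frames.mean(axis=0), frames.std(axis=0)])

def as_norm(raw_score, enroll_cohort, test_cohort, top_k=300):
    """Adaptive score normalization (AS-Norm): normalize a raw trial
    score using only the top-k most competitive cohort scores.
    enroll_cohort / test_cohort hold similarity scores of the enrollment
    / test embedding against a cohort of imposter speakers."""
    e_top = np.sort(enroll_cohort)[::-1][:top_k]
    t_top = np.sort(test_cohort)[::-1][:top_k]
    z = (raw_score - e_top.mean()) / (e_top.std() + 1e-8)  # enroll-side
    s = (raw_score - t_top.mean()) / (t_top.std() + 1e-8)  # test-side
    return 0.5 * (z + s)
```

In a verification backend of this shape, `as_norm` is applied to every trial score before thresholding, which compensates for speaker- and condition-dependent score shifts; restricting the statistics to the top-k cohort scores is what makes the normalization "adaptive".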
Related papers
- Unified Speech Recognition: A Single Model for Auditory, Visual, and Audiovisual Inputs [73.74375912785689]
This paper proposes unified training strategies for speech recognition systems.
We demonstrate that training a single model for all three tasks enhances VSR and AVSR performance.
We also introduce a greedy pseudo-labelling approach to more effectively leverage unlabelled samples.
arXiv Detail & Related papers (2024-11-04T16:46:53Z)
- Blending LLMs into Cascaded Speech Translation: KIT's Offline Speech Translation System for IWSLT 2024 [61.189875635090225]
Large Language Models (LLMs) are currently under exploration for various tasks, including Automatic Speech Recognition (ASR), Machine Translation (MT), and even End-to-End Speech Translation (ST).
arXiv Detail & Related papers (2024-06-24T16:38:17Z)
- DAMO-NLP at SemEval-2023 Task 2: A Unified Retrieval-augmented System for Multilingual Named Entity Recognition [94.90258603217008]
The MultiCoNER II shared task aims to tackle multilingual named entity recognition (NER) in fine-grained and noisy scenarios.
Previous top systems in MultiCoNER I incorporate either knowledge bases or gazetteers.
We propose a unified retrieval-augmented system (U-RaNER) for fine-grained multilingual NER.
arXiv Detail & Related papers (2023-05-05T16:59:26Z)
- A Study on the Integration of Pipeline and E2E SLU systems for Spoken Semantic Parsing toward STOP Quality Challenge [33.89616011003973]
We describe our proposed spoken semantic parsing system for the quality track (Track 1) of the Spoken Language Understanding Grand Challenge.
Strong automatic speech recognition (ASR) models like Whisper and pretrained language models (LMs) like BART are utilized inside our SLU framework to boost performance.
We also investigate output-level combination of various models to achieve an exact-match accuracy of 80.8, which won 1st place at the challenge.
arXiv Detail & Related papers (2023-05-02T17:25:19Z)
- The SpeakIn Speaker Verification System for Far-Field Speaker Verification Challenge 2022 [15.453882034529913]
This paper describes the speaker verification systems submitted to the Far-Field Speaker Verification Challenge 2022 (FFSVC2022).
The ResNet-based and RepVGG-based architectures were developed for this challenge.
Our approach leads to excellent performance and ranks 1st in both challenge tasks.
arXiv Detail & Related papers (2022-09-23T14:51:55Z)
- Anchor-Free Person Search [127.88668724345195]
Person search aims to simultaneously localize and identify a query person from realistic, uncropped images.
Most existing works employ two-stage detectors like Faster-RCNN, yielding encouraging accuracy but with high computational overhead.
We present the Feature-Aligned Person Search Network (AlignPS), the first anchor-free framework to efficiently tackle this challenging task.
arXiv Detail & Related papers (2021-03-22T07:04:29Z)
- Train your classifier first: Cascade Neural Networks Training from upper layers to lower layers [54.47911829539919]
We develop a novel top-down training method which can be viewed as an algorithm for searching for high-quality classifiers.
We tested this method on automatic speech recognition (ASR) tasks and language modelling tasks.
The proposed method consistently improves recurrent neural network ASR models on Wall Street Journal, self-attention ASR models on Switchboard, and AWD-LSTM language models on WikiText-2.
arXiv Detail & Related papers (2021-02-09T08:19:49Z)
- Multi-Task Learning for Interpretable Weakly Labelled Sound Event Detection [34.99472489405047]
This paper proposes a Multi-Task Learning framework for learning from Weakly Labelled Audio data.
We show that the chosen auxiliary task de-noises the internal T-F representation and improves SED performance on noisy recordings.
The proposed framework outperforms existing benchmark models across all SNRs.
arXiv Detail & Related papers (2020-08-17T04:46:25Z)
- Device-Robust Acoustic Scene Classification Based on Two-Stage Categorization and Data Augmentation [63.98724740606457]
We present a joint effort of four groups, namely GT, USTC, Tencent, and UKE, to tackle Task 1 - Acoustic Scene Classification (ASC) in the DCASE 2020 Challenge.
Task 1a focuses on ASC of audio signals recorded with multiple (real and simulated) devices into ten different fine-grained classes.
Task 1b concerns the classification of data into three higher-level classes using low-complexity solutions.
arXiv Detail & Related papers (2020-07-16T15:07:14Z)
- Yseop at SemEval-2020 Task 5: Cascaded BERT Language Model for Counterfactual Statement Analysis [0.0]
We use a BERT base model for the classification task and build a hybrid BERT Multi-Layer Perceptron system to handle the sequence identification task.
Our experiments show that while introducing syntactic and semantic features does little to improve the system on the classification task, using these features as cascaded linear inputs to fine-tune the model's sequence-delimiting ability ensures it outperforms other similar-purpose complex systems like BiLSTM-CRF on the second task.
arXiv Detail & Related papers (2020-05-18T08:19:18Z)
- Conditional Channel Gated Networks for Task-Aware Continual Learning [44.894710899300435]
Convolutional Neural Networks experience catastrophic forgetting when optimized on a sequence of learning problems.
We introduce a novel framework to tackle this problem with conditional computation.
We validate our proposal on four continual learning datasets.
arXiv Detail & Related papers (2020-03-31T19:35:07Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences arising from its use.