Asymmetric and trial-dependent modeling: the contribution of LIA to SdSV Challenge Task 2
- URL: http://arxiv.org/abs/2403.19634v1
- Date: Thu, 28 Mar 2024 17:49:31 GMT
- Title: Asymmetric and trial-dependent modeling: the contribution of LIA to SdSV Challenge Task 2
- Authors: Pierre-Michel Bousquet, Mickael Rouvier,
- Abstract summary: This paper describes the contributions of our laboratory to the speaker recognition field.
The proposed approaches experimentally show their relevance and efficiency on the SdSv evaluation.
- Score: 5.766624204597742
- License: http://creativecommons.org/publicdomain/zero/1.0/
- Abstract: The SdSv challenge Task 2 provided an opportunity to assess efficiency and robustness of modern text-independent speaker verification systems. But it also made it possible to test new approaches, capable of taking into account the main issues of this challenge (duration, language, ...). This paper describes the contributions of our laboratory to the speaker recognition field. These contributions highlight two other challenges in addition to short-duration and language: the mismatch between enrollment and test data and the one between subsets of the evaluation trial dataset. The proposed approaches experimentally show their relevance and efficiency on the SdSv evaluation, and could be of interest in many real-life applications.
Related papers
- Are Large Language Models the future crowd workers of Linguistics? [0.0]
This research aims to answer the question of whether Large Language Models (LLMs) may overcome obstacles if included in empirical linguistic pipelines.
The two forced elicitation tasks, originally designed for human participants, are reproduced with the help of OpenAI's GPT-4o-mini model.
arXiv Detail & Related papers (2025-02-14T16:23:39Z) - 2-Tier SimCSE: Elevating BERT for Robust Sentence Embeddings [0.0]
We apply SimCSE (Simple Contrastive Learning of Sentence Embeddings) to fine-tune the minBERT model for sentiment analysis, semantic textual similarity (STS), and paraphrase detection.
Our contributions include experimenting with three different dropout techniques, namely standard dropout, curriculum dropout, and adaptive dropout, to tackle overfitting.
Our findings demonstrate the effectiveness of SimCSE, with the 2-Tier model achieving superior performance on the STS task, with an average test score of 0.742.
arXiv Detail & Related papers (2025-01-23T15:36:35Z) - The VoxCeleb Speaker Recognition Challenge: A Retrospective [75.40776645175585]
The VoxCeleb Speaker Recognition Challenges (VoxSRC) were a series of challenges and workshops that ran annually from 2019 to 2023.
The challenges primarily evaluated the tasks of speaker recognition and diarisation under various settings.
We provide a review of these challenges that covers: what they explored; the methods developed by the challenge participants and how these evolved.
arXiv Detail & Related papers (2024-08-27T08:57:31Z) - Systematic Task Exploration with LLMs: A Study in Citation Text Generation [63.50597360948099]
Large language models (LLMs) bring unprecedented flexibility in defining and executing complex, creative natural language generation (NLG) tasks.
We propose a three-component research framework that consists of systematic input manipulation, reference data, and output measurement.
We use this framework to explore citation text generation -- a popular scholarly NLP task that lacks consensus on the task definition and evaluation metric.
arXiv Detail & Related papers (2024-07-04T16:41:08Z) - Narrative Action Evaluation with Prompt-Guided Multimodal Interaction [60.281405999483]
Narrative action evaluation (NAE) aims to generate professional commentary that evaluates the execution of an action.
NAE is a more challenging task because it requires both narrative flexibility and evaluation rigor.
We propose a prompt-guided multimodal interaction framework to facilitate the interaction between different modalities of information.
arXiv Detail & Related papers (2024-04-22T17:55:07Z) - Layer-Wise Analysis of Self-Supervised Acoustic Word Embeddings: A Study
on Speech Emotion Recognition [54.952250732643115]
We study Acoustic Word Embeddings (AWEs), a fixed-length feature derived from continuous representations, to explore their advantages in specific tasks.
AWEs have previously shown utility in capturing acoustic discriminability.
Our findings underscore the acoustic context conveyed by AWEs and showcase the highly competitive Speech Emotion Recognition accuracies.
arXiv Detail & Related papers (2024-02-04T21:24:54Z) - Towards leveraging LLMs for Conditional QA [1.9649272351760063]
This study delves into the capabilities and limitations of Large Language Models (LLMs) in the challenging domain of conditional question-answering.
Our findings reveal that fine-tuned LLMs can surpass the state-of-the-art (SOTA) performance in some cases, even without fully encoding all input context.
These models encounter challenges in extractive question answering, where they lag behind the SOTA by over 10 points, and in mitigating the risk of injecting false information.
arXiv Detail & Related papers (2023-12-02T14:02:52Z) - Little Giants: Exploring the Potential of Small LLMs as Evaluation
Metrics in Summarization in the Eval4NLP 2023 Shared Task [53.163534619649866]
This paper focuses on assessing the effectiveness of prompt-based techniques to empower Large Language Models to handle the task of quality estimation.
We conducted systematic experiments with various prompting techniques, including standard prompting, prompts informed by annotator instructions, and innovative chain-of-thought prompting.
Our work reveals that combining these approaches using a "small", open source model (orca_mini_v3_7B) yields competitive results.
arXiv Detail & Related papers (2023-11-01T17:44:35Z) - Towards single integrated spoofing-aware speaker verification embeddings [63.42889348690095]
This study aims to develop a single integrated spoofing-aware speaker verification embeddings.
We analyze that the inferior performance of single SASV embeddings comes from insufficient amount of training data.
Experiments show dramatic improvements, achieving a SASV-EER of 1.06% on the evaluation protocol of the SASV2022 challenge.
arXiv Detail & Related papers (2023-05-30T14:15:39Z) - Designing Multimodal Datasets for NLP Challenges [5.874143210792986]
We identify challenges and tasks that are reflective of linguistic and cognitive competencies that humans have when speaking and reasoning.
We describe a diagnostic dataset, Recipe-to-Video Questions (R2VQ), designed for testing competence-based comprehension over a multimodal recipe collection.
arXiv Detail & Related papers (2021-05-12T23:02:46Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.