Contrastive Learning-based Chaining-Cluster for Multilingual Voice-Face Association
- URL: http://arxiv.org/abs/2408.02025v2
- Date: Mon, 19 Aug 2024 05:14:53 GMT
- Title: Contrastive Learning-based Chaining-Cluster for Multilingual Voice-Face Association
- Authors: Wuyang Chen, Yanjie Sun, Kele Xu, Yong Dou
- Abstract summary: This paper introduces our novel solution to the Face-Voice Association in Multilingual Environments (FAME) 2024 challenge.
It focuses on a contrastive learning-based chaining-cluster method to enhance face-voice association.
We conducted extensive experiments to investigate the impact of language on face-voice association.
The results demonstrate the superior performance of our method, and we validate the robustness and effectiveness of our proposed approach.
- Score: 24.843733099049015
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: The innate correlation between a person's face and voice has recently emerged as a compelling area of study, especially within the context of multilingual environments. This paper introduces our novel solution to the Face-Voice Association in Multilingual Environments (FAME) 2024 challenge, focusing on a contrastive learning-based chaining-cluster method to enhance face-voice association. This task involves the challenges of building biometric relations between auditory and visual modality cues and modelling the prosody interdependence between different languages, while addressing both intrinsic and extrinsic variability present in the data. To handle these non-trivial challenges, our method employs supervised cross-contrastive (SCC) learning to establish robust associations between voices and faces in multi-language scenarios. Following this, we have specifically designed a chaining-cluster-based post-processing step to mitigate the impact of outliers often found in unconstrained, in-the-wild data. We conducted extensive experiments to investigate the impact of language on face-voice association. The overall results were evaluated on the FAME public evaluation platform, where we achieved 2nd place. The results demonstrate the superior performance of our method, and we validate the robustness and effectiveness of our proposed approach. Code is available at https://github.com/colaudiolab/FAME24_solution.
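The supervised cross-contrastive idea described in the abstract can be illustrated with a generic InfoNCE-style loss over paired face and voice embeddings: same-identity pairs are treated as positives, all other pairs as negatives. This is a minimal sketch of the general technique, not the authors' exact loss; the function name, arguments, and temperature value are hypothetical.

```python
import numpy as np

def supervised_cross_contrastive_loss(face_emb, voice_emb, labels, temperature=0.1):
    """Illustrative supervised cross-modal contrastive loss.

    face_emb, voice_emb: (N, D) arrays of embeddings from the two modalities.
    labels: (N,) identity labels; pairs sharing a label count as positives.
    """
    # L2-normalise each embedding so dot products are cosine similarities.
    f = face_emb / np.linalg.norm(face_emb, axis=1, keepdims=True)
    v = voice_emb / np.linalg.norm(voice_emb, axis=1, keepdims=True)
    sim = (f @ v.T) / temperature  # (N, N) face-to-voice similarity matrix

    # Positive mask: face/voice pairs that share an identity label.
    pos = (labels[:, None] == labels[None, :]).astype(float)

    # Row-wise log-softmax over all candidate voices (numerically stabilised).
    sim = sim - sim.max(axis=1, keepdims=True)
    log_prob = sim - np.log(np.exp(sim).sum(axis=1, keepdims=True))

    # Negative mean log-likelihood of the positive pairs.
    loss = -(pos * log_prob).sum(axis=1) / pos.sum(axis=1)
    return float(loss.mean())
```

With aligned embeddings (each face matching its own voice) the loss approaches zero, while mismatched pairs drive it up, which is the behaviour a cross-modal association objective relies on.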
Related papers
- CrossIn: An Efficient Instruction Tuning Approach for Cross-Lingual Knowledge Alignment [38.35458193262633]
English-centric models are usually suboptimal in other languages.
We propose a novel approach called CrossIn, which utilizes a mixed composition of cross-lingual instruction tuning data.
arXiv Detail & Related papers (2024-04-18T06:20:50Z) - Graph-based Clustering for Detecting Semantic Change Across Time and Languages [10.058655884092094]
We propose a graph-based clustering approach to capture nuanced changes in both high- and low-frequency word senses across time and languages.
Our approach substantially surpasses previous approaches in the SemEval 2020 binary classification task across four languages.
arXiv Detail & Related papers (2024-02-01T21:27:19Z) - DenoSent: A Denoising Objective for Self-Supervised Sentence Representation Learning [59.4644086610381]
We propose a novel denoising objective that inherits from another perspective, i.e., the intra-sentence perspective.
By introducing both discrete and continuous noise, we generate noisy sentences and then train our model to restore them to their original form.
Our empirical evaluations demonstrate that this approach delivers competitive results on both semantic textual similarity (STS) and a wide range of transfer tasks.
arXiv Detail & Related papers (2024-01-24T17:48:45Z) - Quantifying the Dialect Gap and its Correlates Across Languages [69.18461982439031]
This work lays the foundation for furthering the field of dialectal NLP by documenting evident disparities and identifying possible pathways for addressing them through mindful data collection.
arXiv Detail & Related papers (2023-10-23T17:42:01Z) - mCL-NER: Cross-Lingual Named Entity Recognition via Multi-view Contrastive Learning [54.523172171533645]
Cross-lingual named entity recognition (CrossNER) faces challenges stemming from uneven performance due to the scarcity of multilingual corpora.
We propose Multi-view Contrastive Learning for Cross-lingual Named Entity Recognition (mCL-NER)
Our experiments on the XTREME benchmark, spanning 40 languages, demonstrate the superiority of mCL-NER over prior data-driven and model-based approaches.
arXiv Detail & Related papers (2023-08-17T16:02:29Z) - ConNER: Consistency Training for Cross-lingual Named Entity Recognition [96.84391089120847]
Cross-lingual named entity recognition suffers from data scarcity in the target languages.
We propose ConNER as a novel consistency training framework for cross-lingual NER.
arXiv Detail & Related papers (2022-11-17T07:57:54Z) - Cross-Platform and Cross-Domain Abusive Language Detection with Supervised Contrastive Learning [14.93845721221461]
We design SCL-Fish, a supervised contrastive learning integrated meta-learning algorithm to detect abusive language on unseen platforms.
Our experimental analysis shows that SCL-Fish achieves better performance over ERM and the existing state-of-the-art models.
arXiv Detail & Related papers (2022-11-11T19:22:36Z) - Cross-Align: Modeling Deep Cross-lingual Interactions for Word Alignment [63.0407314271459]
Experiments show that the proposed Cross-Align achieves state-of-the-art (SOTA) performance on four out of five language pairs.
arXiv Detail & Related papers (2022-10-09T02:24:35Z) - Learning Branched Fusion and Orthogonal Projection for Face-Voice Association [20.973188176888865]
We propose a light-weight, plug-and-play mechanism that exploits the complementary cues in both modalities to form enriched fused embeddings.
Results reveal that our method performs favourably against the current state-of-the-art methods.
In addition, we leverage cross-modal verification and matching tasks to analyze the impact of multiple languages on face-voice association.
arXiv Detail & Related papers (2022-08-22T12:23:09Z) - A Multi-level Supervised Contrastive Learning Framework for Low-Resource Natural Language Inference [54.678516076366506]
Natural Language Inference (NLI) is an increasingly essential task in natural language understanding.
Here we propose a multi-level supervised contrastive learning framework named MultiSCL for low-resource natural language inference.
arXiv Detail & Related papers (2022-05-31T05:54:18Z) - Improving Neural Cross-Lingual Summarization via Employing Optimal Transport Distance for Knowledge Distillation [8.718749742587857]
Cross-lingual summarization models rely on the self-attention mechanism to attend among tokens in two languages.
We propose a novel Knowledge-Distillation-based framework for Cross-Lingual Summarization.
Our method outperforms state-of-the-art models under both high and low-resourced settings.
arXiv Detail & Related papers (2021-12-07T03:45:02Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed content (including all information) and is not responsible for any consequences.