RFOP: Rethinking Fusion and Orthogonal Projection for Face-Voice Association
- URL: http://arxiv.org/abs/2512.02860v1
- Date: Tue, 02 Dec 2025 15:21:21 GMT
- Title: RFOP: Rethinking Fusion and Orthogonal Projection for Face-Voice Association
- Authors: Abdul Hannan, Furqan Malik, Hina Jabbar, Syed Suleman Sadiq, Mubashir Noman,
- Abstract summary: The challenge introduces English-German face-voice pairs to be utilized in the evaluation phase. Our method performs favorably on the English-German data split and ranked 3rd in the FAME 2026 challenge, achieving an EER of 33.1.
- Score: 0.6024251635050109
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: The Face-Voice Association in Multilingual Environments (FAME) 2026 challenge aims to investigate the face-voice association task in a multilingual scenario. The challenge introduces English-German face-voice pairs to be utilized in the evaluation phase. To this end, we revisit fusion and orthogonal projection for face-voice association by effectively focusing on the relevant semantic information within the two modalities. Our method performs favorably on the English-German data split and ranked 3rd in the FAME 2026 challenge, achieving an EER of 33.1.
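The ranking metric quoted above, Equal Error Rate (EER), is the operating point at which the false-accept rate (impostor pairs accepted) equals the false-reject rate (genuine pairs rejected). The threshold sweep below is a minimal illustrative sketch of computing EER from raw verification scores, not the challenge's official scoring code:

```python
import numpy as np

def compute_eer(scores, labels):
    """Equal Error Rate: the point where false-accept and false-reject rates meet.

    scores: higher = more likely a genuine (matching) face-voice pair.
    labels: 1 for genuine pairs, 0 for impostor pairs.
    """
    scores = np.asarray(scores, dtype=float)
    labels = np.asarray(labels, dtype=int)
    pos = labels == 1
    neg = labels == 0
    best_gap, eer = np.inf, 1.0
    # Sweep every observed score as a candidate decision threshold.
    for t in np.sort(np.unique(scores)):
        far = np.mean(scores[neg] >= t)  # impostors wrongly accepted
        frr = np.mean(scores[pos] < t)   # genuine pairs wrongly rejected
        gap = abs(far - frr)
        if gap < best_gap:
            best_gap, eer = gap, (far + frr) / 2
    return eer
```

With perfectly separated scores the EER is 0; random scoring drives it toward 0.5 (50%), which is why EERs like 33.1 and 23.99 reported on this difficult cross-lingual task sit between those extremes.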
Related papers
- Linking Faces and Voices Across Languages: Insights from the FAME 2026 Challenge [27.73711803720755]
The Face-Voice Association in Multilingual Environments (FAME) 2026 Challenge, held at ICASSP 2026, focuses on developing methods for face-voice association. This report provides a brief summary of the challenge.
arXiv Detail & Related papers (2025-12-23T14:00:34Z)
- Shared Multi-modal Embedding Space for Face-Voice Association [21.92195248206171]
The FAME 2026 challenge comprises two demanding tasks: training face-voice associations and testing on languages on which the model was not trained. Our approach consists of separate uni-modal processing pipelines with general face and voice feature extraction, complemented by additional age-gender feature extraction to support prediction. Our approach achieved first place in the FAME 2026 challenge, with an average Equal Error Rate (EER) of 23.99%.
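The separate uni-modal pipelines feeding a shared embedding space, as described in this entry, can be sketched roughly as follows. The feature dimensions (512 for face, 192 for voice), the shared dimension of 128, and the random projection matrices are hypothetical stand-ins for trained extractors and projection heads, not the authors' actual architecture:

```python
import numpy as np

rng = np.random.default_rng(0)

def l2_normalize(x, eps=1e-12):
    """Unit-normalize so that a dot product becomes cosine similarity."""
    return x / (np.linalg.norm(x) + eps)

# Hypothetical projection heads mapping each modality into a shared space.
W_face = rng.standard_normal((512, 128))   # face feature dim -> shared dim
W_voice = rng.standard_normal((192, 128))  # voice feature dim -> shared dim

def to_shared(face_feat, voice_feat):
    """Project uni-modal features into the common embedding space."""
    f = l2_normalize(face_feat @ W_face)
    v = l2_normalize(voice_feat @ W_voice)
    return f, v

def match_score(f, v):
    """Cosine similarity in the shared space; higher = more likely same person."""
    return float(np.dot(f, v))
```

At test time such a system thresholds the cosine score to decide whether a face and a voice belong to the same identity, which is exactly what the EER then evaluates.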
arXiv Detail & Related papers (2025-12-04T14:04:15Z)
- The ML-SUPERB 2.0 Challenge: Towards Inclusive ASR Benchmarking for All Language Varieties [107.57160730151975]
We construct a new test suite that consists of data from 200+ languages, accents, and dialects to evaluate SOTA multilingual speech models. The best-performing submission achieved an absolute improvement in LID accuracy of 23% and a reduction in CER of 18%. On accented and dialectal data, the best submission obtained 30.2% lower CER and 15.7% higher LID accuracy.
arXiv Detail & Related papers (2025-09-08T18:42:36Z)
- SLRTP2025 Sign Language Production Challenge: Methodology, Results, and Future Work [87.9341538630949]
The first Sign Language Production Challenge was held as part of the third SLRTP Workshop at CVPR 2025. The competition aims to evaluate architectures that translate spoken language sentences into a sequence of skeleton poses. This paper presents the challenge design and the winning methodologies.
arXiv Detail & Related papers (2025-08-09T11:57:33Z)
- Face-voice Association in Multilingual Environments (FAME) 2026 Challenge Evaluation Plan [24.480174322626155]
The Face-voice Association in Multilingual Environments (FAME) 2026 Challenge focuses on exploring face-voice association under a multilingual scenario. This report provides the details of the challenge, dataset, baseline models, and task details for the FAME Challenge.
arXiv Detail & Related papers (2025-08-06T16:09:47Z)
- Contrastive Learning-based Chaining-Cluster for Multilingual Voice-Face Association [24.843733099049015]
This paper introduces our novel solution to the Face-Voice Association in Multilingual Environments (FAME) 2024 challenge.
It focuses on a contrastive learning-based chaining-cluster method to enhance face-voice association.
We conducted extensive experiments to investigate the impact of language on face-voice association.
The results demonstrate the superior performance of our method, and we validate the robustness and effectiveness of our proposed approach.
arXiv Detail & Related papers (2024-08-04T13:24:36Z)
- Face-voice Association in Multilingual Environments (FAME) Challenge 2024 Evaluation Plan [29.23176868272216]
The Face-voice Association in Multilingual Environments (FAME) Challenge 2024 focuses on exploring face-voice association under the unique condition of a multilingual scenario.
This report provides the details of the challenge, dataset, baselines and task details for the FAME Challenge.
arXiv Detail & Related papers (2024-04-14T19:51:32Z)
- Perception Test 2023: A Summary of the First Challenge And Outcome [67.0525378209708]
The First Perception Test challenge was held as a half-day workshop alongside the IEEE/CVF International Conference on Computer Vision (ICCV) 2023.
The goal was to benchmark state-of-the-art video models on the recently proposed Perception Test benchmark.
We summarise in this report the task descriptions, metrics, baselines, and results.
arXiv Detail & Related papers (2023-12-20T15:12:27Z)
- Findings of the 2023 ML-SUPERB Challenge: Pre-Training and Evaluation over More Languages and Beyond [87.4049283495551]
The 2023 Multilingual Speech Universal Performance Benchmark (ML-SUPERB) Challenge expands upon the acclaimed SUPERB framework. The challenge garnered 12 model submissions and 54 language corpora, resulting in a comprehensive benchmark encompassing 154 languages. The findings indicate that merely scaling models is not the definitive solution for multilingual speech tasks.
arXiv Detail & Related papers (2023-10-09T08:30:01Z)
- ComSL: A Composite Speech-Language Model for End-to-End Speech-to-Text Translation [79.66359274050885]
We present ComSL, a speech-language model built atop a composite architecture of public pretrained speech-only and language-only models.
Our approach has demonstrated effectiveness in end-to-end speech-to-text translation tasks.
arXiv Detail & Related papers (2023-05-24T07:42:15Z)
- VoxSRC 2022: The Fourth VoxCeleb Speaker Recognition Challenge [95.6159736804855]
The VoxCeleb Speaker Recognition Challenge 2022 (VoxSRC-22) was held in conjunction with INTERSPEECH 2022.
The goal of this challenge was to evaluate how well state-of-the-art speaker recognition systems can diarise and recognise speakers from speech obtained "in the wild".
arXiv Detail & Related papers (2023-02-20T19:27:14Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences of its use.