RFOP: Rethinking Fusion and Orthogonal Projection for Face-Voice Association
- URL: http://arxiv.org/abs/2512.02860v1
- Date: Tue, 02 Dec 2025 15:21:21 GMT
- Title: RFOP: Rethinking Fusion and Orthogonal Projection for Face-Voice Association
- Authors: Abdul Hannan, Furqan Malik, Hina Jabbar, Syed Suleman Sadiq, Mubashir Noman,
- Abstract summary: The challenge introduces English-German face-voice pairs to be utilized in the evaluation phase. Our method performs favorably on the English-German data split and ranked 3rd in the FAME 2026 challenge, achieving an EER of 33.1.
- Score: 0.6024251635050109
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: The Face-Voice Association in Multilingual Environments (FAME) 2026 challenge aims to investigate the face-voice association task in a multilingual scenario. The challenge introduces English-German face-voice pairs to be utilized in the evaluation phase. To this end, we revisit fusion and orthogonal projection for face-voice association by effectively focusing on the relevant semantic information within the two modalities. Our method performs favorably on the English-German data split and ranked 3rd in the FAME 2026 challenge, achieving an EER of 33.1.
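The ranking metric quoted above, Equal Error Rate (EER), is the operating point at which the false-accept rate (impostor pairs accepted) equals the false-reject rate (genuine pairs rejected). The threshold sweep below is a minimal illustrative sketch of computing EER from raw verification scores, not the challenge's official scoring code:

```python
import numpy as np

def compute_eer(scores, labels):
    """Equal Error Rate: the point where false-accept and false-reject rates meet.

    scores: higher = more likely a genuine (matching) face-voice pair.
    labels: 1 for genuine pairs, 0 for impostor pairs.
    """
    scores = np.asarray(scores, dtype=float)
    labels = np.asarray(labels, dtype=int)
    pos = labels == 1
    neg = labels == 0
    best_gap, eer = np.inf, 1.0
    # Sweep every observed score as a candidate decision threshold.
    for t in np.sort(np.unique(scores)):
        far = np.mean(scores[neg] >= t)  # impostors wrongly accepted
        frr = np.mean(scores[pos] < t)   # genuine pairs wrongly rejected
        gap = abs(far - frr)
        if gap < best_gap:
            best_gap, eer = gap, (far + frr) / 2
    return eer
```

With perfectly separated scores the EER is 0; random scoring drives it toward 0.5 (50%), which is why EERs like 33.1 and 23.99 reported on this difficult cross-lingual task sit between those extremes.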
Related papers
- Linking Faces and Voices Across Languages: Insights from the FAME 2026 Challenge [27.73711803720755]
The Face-Voice Association in Multilingual Environments (FAME) 2026 Challenge, held at ICASSP 2026, focuses on developing methods for face-voice association. This report provides a brief summary of the challenge.
arXiv Detail & Related papers (2025-12-23T14:00:34Z)
- Shared Multi-modal Embedding Space for Face-Voice Association [21.92195248206171]
The FAME 2026 challenge comprises two demanding tasks: training face-voice associations and testing on languages on which the model was not trained. Our approach consists of separate uni-modal processing pipelines with general face and voice feature extraction, complemented by additional age-gender feature extraction to support prediction. Our approach achieved first place in the FAME 2026 challenge, with an average Equal Error Rate (EER) of 23.99%.
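The separate uni-modal pipelines feeding a shared embedding space, as described in this entry, can be sketched roughly as follows. The feature dimensions (512 for face, 192 for voice), the shared dimension of 128, and the random projection matrices are hypothetical stand-ins for trained extractors and projection heads, not the authors' actual architecture:

```python
import numpy as np

rng = np.random.default_rng(0)

def l2_normalize(x, eps=1e-12):
    """Unit-normalize so that a dot product becomes cosine similarity."""
    return x / (np.linalg.norm(x) + eps)

# Hypothetical projection heads mapping each modality into a shared space.
W_face = rng.standard_normal((512, 128))   # face feature dim -> shared dim
W_voice = rng.standard_normal((192, 128))  # voice feature dim -> shared dim

def to_shared(face_feat, voice_feat):
    """Project uni-modal features into the common embedding space."""
    f = l2_normalize(face_feat @ W_face)
    v = l2_normalize(voice_feat @ W_voice)
    return f, v

def match_score(f, v):
    """Cosine similarity in the shared space; higher = more likely same person."""
    return float(np.dot(f, v))
```

At test time such a system thresholds the cosine score to decide whether a face and a voice belong to the same identity, which is exactly what the EER then evaluates.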
arXiv Detail & Related papers (2025-12-04T14:04:15Z)
- The ML-SUPERB 2.0 Challenge: Towards Inclusive ASR Benchmarking for All Language Varieties [107.57160730151975]
We construct a new test suite that consists of data from 200+ languages, accents, and dialects to evaluate SOTA multilingual speech models. The best-performing submission achieved an absolute improvement in LID accuracy of 23% and a reduction in CER of 18%. On accented and dialectal data, the best submission obtained 30.2% lower CER and 15.7% higher LID accuracy.
arXiv Detail & Related papers (2025-09-08T18:42:36Z)
- SLRTP2025 Sign Language Production Challenge: Methodology, Results, and Future Work [87.9341538630949]
The first Sign Language Production Challenge was held as part of the third SLRTP Workshop at CVPR 2025. The competition aims to evaluate architectures that translate spoken language sentences into a sequence of skeleton poses. This paper presents the challenge design and the winning methodologies.
arXiv Detail & Related papers (2025-08-09T11:57:33Z)
- Face-voice Association in Multilingual Environments (FAME) 2026 Challenge Evaluation Plan [24.480174322626155]
The Face-voice Association in Multilingual Environments (FAME) 2026 Challenge focuses on exploring face-voice association under a multilingual scenario. This report provides the details of the challenge, dataset, baseline models, and task details for the FAME Challenge.
arXiv Detail & Related papers (2025-08-06T16:09:47Z)
- Contrastive Learning-based Chaining-Cluster for Multilingual Voice-Face Association [24.843733099049015]
This paper introduces our novel solution to the Face-Voice Association in Multilingual Environments (FAME) 2024 challenge.
It focuses on a contrastive learning-based chaining-cluster method to enhance face-voice association.
We conducted extensive experiments to investigate the impact of language on face-voice association.
The results demonstrate the superior performance of our method, and we validate the robustness and effectiveness of our proposed approach.
arXiv Detail & Related papers (2024-08-04T13:24:36Z)
- Face-voice Association in Multilingual Environments (FAME) Challenge 2024 Evaluation Plan [29.23176868272216]
The Face-voice Association in Multilingual Environments (FAME) Challenge 2024 focuses on exploring face-voice association under the unique condition of a multilingual scenario.
This report provides the details of the challenge, dataset, baselines and task details for the FAME Challenge.
arXiv Detail & Related papers (2024-04-14T19:51:32Z)
- Perception Test 2023: A Summary of the First Challenge And Outcome [67.0525378209708]
The First Perception Test challenge was held as a half-day workshop alongside the IEEE/CVF International Conference on Computer Vision (ICCV) 2023.
The goal was to benchmark state-of-the-art video models on the recently proposed Perception Test benchmark.
We summarise in this report the task descriptions, metrics, baselines, and results.
arXiv Detail & Related papers (2023-12-20T15:12:27Z)
- Findings of the 2023 ML-SUPERB Challenge: Pre-Training and Evaluation over More Languages and Beyond [87.4049283495551]
The 2023 Multilingual Speech Universal Performance Benchmark (ML-SUPERB) Challenge expands upon the acclaimed SUPERB framework. The challenge garnered 12 model submissions and 54 language corpora, resulting in a comprehensive benchmark encompassing 154 languages. The findings indicate that merely scaling models is not the definitive solution for multilingual speech tasks.
arXiv Detail & Related papers (2023-10-09T08:30:01Z)
- ComSL: A Composite Speech-Language Model for End-to-End Speech-to-Text Translation [79.66359274050885]
We present ComSL, a speech-language model built atop a composite architecture of public pretrained speech-only and language-only models.
Our approach has demonstrated effectiveness in end-to-end speech-to-text translation tasks.
arXiv Detail & Related papers (2023-05-24T07:42:15Z)
- VoxSRC 2022: The Fourth VoxCeleb Speaker Recognition Challenge [95.6159736804855]
The VoxCeleb Speaker Recognition Challenge 2022 (VoxSRC-22) was held in conjunction with INTERSPEECH 2022.
The goal of this challenge was to evaluate how well state-of-the-art speaker recognition systems can diarise and recognise speakers from speech obtained "in the wild".
arXiv Detail & Related papers (2023-02-20T19:27:14Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences of its use.