Multilingual Audio-Visual Smartphone Dataset And Evaluation
- URL: http://arxiv.org/abs/2109.04138v1
- Date: Thu, 9 Sep 2021 09:52:37 GMT
- Title: Multilingual Audio-Visual Smartphone Dataset And Evaluation
- Authors: Hareesh Mandalapu, Aravinda Reddy P N, Raghavendra Ramachandra, K Sreenivasa Rao, Pabitra Mitra, S R Mahadeva Prasanna, Christoph Busch
- Abstract summary: We present an audio-visual smartphone dataset captured with five different recent smartphones.
The dataset covers three different languages to address the language dependency of speaker recognition systems.
We also report the performance of benchmarked biometric verification systems on our dataset.
- Score: 35.82191448400655
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Smartphones are increasingly paired with biometric verification systems to secure highly sensitive applications. Audio-visual biometrics are gaining popularity because they are convenient to use and, owing to their multi-modal nature, harder to spoof. In this work, we present an audio-visual smartphone dataset captured with five different recent smartphones. This new dataset contains 103 subjects recorded in three different sessions that reflect different real-world scenarios. The dataset covers three different languages to account for the language dependency of speaker recognition systems. These unique characteristics pave the way for novel state-of-the-art unimodal and audio-visual speaker recognition systems. We also report the performance of benchmarked biometric verification systems on our dataset. Extensive experiments evaluate the robustness of the biometric algorithms against multiple dependencies, such as signal noise, device, and language, as well as presentation attacks such as replay and synthesized signals. The results raise serious concerns about the generalization properties of state-of-the-art biometric methods on smartphones.
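The abstract reports benchmarked verification performance; such evaluations typically use error rates like the Equal Error Rate (EER), the operating point where the false accept rate (FAR) and false reject rate (FRR) coincide. As a rough, generic illustration (not the authors' actual protocol; the score distributions below are made up), this Python sketch estimates the EER from genuine and impostor similarity scores:

```python
import numpy as np

def compute_eer(genuine_scores: np.ndarray, impostor_scores: np.ndarray) -> float:
    """Estimate the Equal Error Rate by sweeping a decision threshold
    over all observed scores and finding where FAR and FRR cross."""
    thresholds = np.sort(np.concatenate([genuine_scores, impostor_scores]))
    best_gap, eer = np.inf, 1.0
    for t in thresholds:
        frr = np.mean(genuine_scores < t)    # genuine trials rejected
        far = np.mean(impostor_scores >= t)  # impostor trials accepted
        if abs(far - frr) < best_gap:
            best_gap, eer = abs(far - frr), (far + frr) / 2
    return float(eer)

# Hypothetical score distributions, for illustration only.
rng = np.random.default_rng(0)
genuine = rng.normal(loc=0.7, scale=0.1, size=1000)
impostor = rng.normal(loc=0.4, scale=0.1, size=1000)
print(f"EER: {compute_eer(genuine, impostor):.3f}")
```

A lower EER indicates better-separated genuine and impostor scores; robustness studies like the one above typically compare EERs across devices, languages, noise levels, and attack conditions.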
Related papers
- Nexus-O: An Omni-Perceptive And -Interactive Model for Language, Audio, And Vision [50.23246260804145]
We introduce Nexus-O, an industry-level omni-perceptive and -interactive model capable of efficiently processing Audio, Image, Video, and Text data.
We address three key research questions: First, how can models be efficiently designed and trained to achieve tri-modal alignment, understanding and reasoning capabilities across multiple modalities?
Second, what approaches can be implemented to evaluate tri-modal model robustness, ensuring reliable performance and applicability in real-world scenarios?
Third, what strategies can be employed to curate and obtain high-quality, real-life scenario...
arXiv Detail & Related papers (2025-02-26T17:26:36Z)
- SONAR: A Synthetic AI-Audio Detection Framework and Benchmark [59.09338266364506]
SONAR is a synthetic AI-Audio Detection Framework and Benchmark.
It aims to provide a comprehensive evaluation for distinguishing cutting-edge AI-synthesized auditory content.
It is the first framework to uniformly benchmark AI-audio detection across both traditional and foundation model-based deepfake detection systems.
arXiv Detail & Related papers (2024-10-06T01:03:42Z)
- Developing Acoustic Models for Automatic Speech Recognition in Swedish [6.5458610824731664]
This paper is concerned with automatic continuous speech recognition using trainable systems.
The aim of this work is to build acoustic models for spoken Swedish.
arXiv Detail & Related papers (2024-04-25T12:03:14Z)
- Probing the Information Encoded in Neural-based Acoustic Models of Automatic Speech Recognition Systems [7.207019635697126]
This article aims to determine which information is encoded in an automatic speech recognition acoustic model (AM), and where it is located.
Experiments are performed on speaker verification, acoustic environment classification, gender classification, tempo-distortion detection systems and speech sentiment/emotion identification.
Analysis showed that neural-based AMs hold heterogeneous information that seems surprisingly uncorrelated with phoneme recognition.
arXiv Detail & Related papers (2024-02-29T18:43:53Z)
- Language identification as improvement for lip-based biometric visual systems [13.205817167773443]
We present a preliminary study in which we use linguistic information as a soft biometric trait to enhance the performance of a visual (auditory-free) identification system based on lip movement.
We report a significant improvement in the identification performance of the proposed visual system as a result of the integration of these data.
arXiv Detail & Related papers (2023-02-27T15:44:24Z)
- ESB: A Benchmark For Multi-Domain End-to-End Speech Recognition [100.30565531246165]
Speech recognition systems require dataset-specific tuning.
This tuning requirement can lead to systems failing to generalise to other datasets and domains.
We introduce the End-to-end Speech Benchmark (ESB) for evaluating the performance of a single automatic speech recognition system.
arXiv Detail & Related papers (2022-10-24T15:58:48Z)
- Mobile Behavioral Biometrics for Passive Authentication [65.94403066225384]
This work carries out a comparative analysis of unimodal and multimodal behavioral biometric traits.
Experiments are performed over HuMIdb, one of the largest and most comprehensive freely available mobile user interaction databases.
In our experiments, the most discriminative background sensor is the magnetometer, whereas among touch tasks the best results are achieved with keystroke.
arXiv Detail & Related papers (2022-03-14T17:05:59Z)
- Discovering Phonetic Inventories with Crosslingual Automatic Speech Recognition [71.49308685090324]
This paper investigates the influence of different factors (i.e., model architecture, phonotactic model, type of speech representation) on phone recognition in an unknown language.
We find that unique sounds, similar sounds, and tone languages remain a major challenge for phonetic inventory discovery.
arXiv Detail & Related papers (2022-01-26T22:12:55Z)
- Tusom2021: A Phonetically Transcribed Speech Dataset from an Endangered Language for Universal Phone Recognition Experiments [7.286387368812729]
This paper presents a publicly available, phonetically transcribed corpus of 2255 utterances in the endangered Tangkhulic language East Tusom.
Because the dataset is transcribed in terms of phones, rather than phonemes, it is a better match for universal phone recognition systems than many larger datasets.
arXiv Detail & Related papers (2021-04-02T00:26:10Z)
- Acoustics Based Intent Recognition Using Discovered Phonetic Units for Low Resource Languages [51.0542215642794]
We propose a novel acoustics based intent recognition system that uses discovered phonetic units for intent classification.
We present results for two language families, Indic languages and Romance languages, on two different intent recognition tasks.
arXiv Detail & Related papers (2020-11-07T00:35:31Z)
- Few Shot Text-Independent speaker verification using 3D-CNN [0.0]
We propose a novel method to verify the identity of a claimed speaker using very little training data.
Experiments conducted on the VoxCeleb1 dataset show that, even when trained with very little data, the proposed model achieves accuracy close to state-of-the-art text-independent speaker verification models.
arXiv Detail & Related papers (2020-08-25T15:03:29Z)
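The few-shot verification summary above does not spell out a scoring pipeline, so here is a generic sketch of how text-independent speaker verification trials are commonly scored: average a handful of enrollment embeddings and compare them against a test embedding with cosine similarity. The embedding dimension, data, and threshold below are all hypothetical, and the encoder (such as the paper's 3D-CNN) is assumed to exist elsewhere:

```python
import numpy as np

def cosine_score(enroll_embeddings: np.ndarray, test_embedding: np.ndarray) -> float:
    """Score a verification trial as the cosine similarity between the
    mean of the enrollment embeddings and the test embedding."""
    enroll = enroll_embeddings.mean(axis=0)
    enroll = enroll / np.linalg.norm(enroll)
    test = test_embedding / np.linalg.norm(test_embedding)
    return float(np.dot(enroll, test))

# Hypothetical 256-dim embeddings, e.g. produced by a speaker encoder
# such as the paper's 3D-CNN (not implemented here).
rng = np.random.default_rng(1)
enroll = rng.normal(size=(3, 256))  # three enrollment utterances
test = rng.normal(size=256)         # one test utterance
threshold = 0.5  # would be tuned on a development set in practice
print("accept" if cosine_score(enroll, test) >= threshold else "reject")
```

In a few-shot setting, the appeal of this scheme is that enrolling a new speaker requires only forward passes through a fixed encoder, with no per-speaker retraining.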