They are wearing a mask! Identification of Subjects Wearing a Surgical
Mask from their Speech by means of x-vectors and Fisher Vectors
- URL: http://arxiv.org/abs/2008.10014v1
- Date: Sun, 23 Aug 2020 11:27:11 GMT
- Title: They are wearing a mask! Identification of Subjects Wearing a Surgical
Mask from their Speech by means of x-vectors and Fisher Vectors
- Authors: José Vicente Egas-López
- Abstract summary: The INTERSPEECH 2020 Computational Paralinguistics Challenge offers three different problems.
This challenge involves the classification of speech recorded from subjects while wearing a surgical mask.
In this study, to address the above-mentioned problem, we employ two different types of feature extraction methods.
- Score: 0.0
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Challenges based on Computational Paralinguistics at the INTERSPEECH
Conference have always been well received by attendees owing to their
competitive academic and research demands. This year, the INTERSPEECH 2020
Computational Paralinguistics Challenge offers three different problems; here,
the Mask Sub-Challenge is of specific interest. This challenge involves the
classification of speech recorded from subjects while wearing a surgical mask.
In this study, to address the above-mentioned problem, we employ two different
types of feature extraction methods: x-vector embeddings, the current
state-of-the-art approach for Speaker Recognition, and the Fisher Vector (FV),
a method originally intended for Image Recognition that we utilize here to
discriminate utterances. These approaches employ distinct
frame-level representations: MFCC and PLP. Using Support Vector Machines (SVM)
as the classifier, we perform a technical comparison between the performances
of the FV encodings and the x-vector embeddings for this particular
classification task. We find that the Fisher vector encodings provide better
representations of the utterances than the x-vectors do for this specific
dataset. Moreover, we show that a fusion of our best configurations outperforms
all the baseline scores of the Mask Sub-Challenge.
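For illustration only, here is a minimal sketch (not the authors' code) of the pipeline the abstract describes: frame-level features are encoded into an utterance-level Fisher vector via a GMM and classified with a linear SVM. MFCC/PLP extraction and the x-vector branch are assumed to come from external toolkits; the MFCC matrices, GMM component count, SVM settings, and the fisher_vector helper below are hypothetical placeholders.

    # Hedged sketch: GMM-based Fisher-vector encoding of frame-level features
    # plus a linear SVM, loosely mirroring the pipeline in the abstract.
    # MFCC extraction and the x-vector system are assumed to be external.
    import numpy as np
    from sklearn.mixture import GaussianMixture
    from sklearn.preprocessing import StandardScaler
    from sklearn.svm import LinearSVC

    def fisher_vector(frames, gmm):
        """Improved Fisher vector (mean/variance gradients) of one utterance.

        frames: (T, D) frame-level features (e.g., MFCCs); gmm: a fitted
        sklearn GaussianMixture with diagonal covariances.
        """
        T, _ = frames.shape
        q = gmm.predict_proba(frames)                  # (T, K) posteriors
        w, mu, var = gmm.weights_, gmm.means_, gmm.covariances_

        s0 = q.sum(axis=0)                             # zeroth-order stats (K,)
        s1 = q.T @ frames                              # first-order stats (K, D)
        s2 = q.T @ (frames ** 2)                       # second-order stats (K, D)

        # Gradients w.r.t. GMM means and variances (Perronnin-style FV).
        d_mu = (s1 - mu * s0[:, None]) / (np.sqrt(var) * np.sqrt(w)[:, None] * T)
        d_var = (s2 - 2 * mu * s1 + (mu ** 2 - var) * s0[:, None]) / (
            np.sqrt(2 * w)[:, None] * var * T)

        fv = np.hstack([d_mu.ravel(), d_var.ravel()])
        fv = np.sign(fv) * np.sqrt(np.abs(fv))         # power normalization
        return fv / (np.linalg.norm(fv) + 1e-12)       # L2 normalization

    # Placeholder data: one MFCC matrix per utterance, binary mask/no-mask labels.
    rng = np.random.default_rng(0)
    train_mfccs = [rng.normal(size=(300, 20)) for _ in range(32)]
    train_labels = rng.integers(0, 2, size=32)

    # Fit a background GMM on pooled training frames, encode, and classify.
    gmm = GaussianMixture(n_components=8, covariance_type="diag",
                          random_state=0).fit(np.vstack(train_mfccs))
    X = np.stack([fisher_vector(m, gmm) for m in train_mfccs])
    clf = LinearSVC(C=1.0).fit(StandardScaler().fit_transform(X), train_labels)

The fusion mentioned in the abstract could, under this sketch, be approximated by averaging the decision scores (e.g., clf.decision_function outputs) of the FV-based and x-vector-based SVMs before thresholding; the authors' exact fusion scheme is not reproduced here.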
Related papers
- Spatial Action Unit Cues for Interpretable Deep Facial Expression Recognition [55.97779732051921]
State-of-the-art classifiers for facial expression recognition (FER) lack interpretability, an important feature for end-users.
A new learning strategy is proposed to explicitly incorporate AU cues into classifier training, allowing deep interpretable models to be trained.
Our new strategy is generic, and can be applied to any deep CNN- or transformer-based classifier without requiring any architectural change or significant additional training time.
arXiv Detail & Related papers (2024-10-01T10:42:55Z)
- Semantic Refocused Tuning for Open-Vocabulary Panoptic Segmentation [42.020470627552136]
Open-vocabulary panoptic segmentation is an emerging task aiming to accurately segment the image into semantically meaningful masks.
Mask classification is the main performance bottleneck for open-vocab panoptic segmentation.
We propose Semantic Refocused Tuning, a novel framework that greatly enhances open-vocab panoptic segmentation.
arXiv Detail & Related papers (2024-09-24T17:50:28Z)
- Masked Face Recognition with Generative-to-Discriminative Representations [29.035270415311427]
We propose a unified deep network to learn generative-to-discriminative representations for facilitating masked face recognition.
First, we leverage a generative encoder pretrained for face inpainting and fine-tune it to represent masked faces as category-aware descriptors.
We incorporate a multi-layer convolutional network as a discriminative reformer and train it to convert the category-aware descriptors into identity-aware vectors.
arXiv Detail & Related papers (2024-05-27T02:20:55Z)
- Ablation Study to Clarify the Mechanism of Object Segmentation in Multi-Object Representation Learning [3.921076451326107]
Multi-object representation learning aims to represent complex real-world visual input using the composition of multiple objects.
It is not clear how previous methods have achieved the appropriate segmentation of individual objects.
Most of the previous methods regularize the latent vectors using a Variational Autoencoder (VAE).
arXiv Detail & Related papers (2023-10-05T02:59:48Z)
- Global Knowledge Calibration for Fast Open-Vocabulary Segmentation [124.74256749281625]
We introduce a text diversification strategy that generates a set of synonyms for each training category.
We also employ a text-guided knowledge distillation method to preserve the generalizable knowledge of CLIP.
Our proposed model achieves robust generalization performance across various datasets.
arXiv Detail & Related papers (2023-03-16T09:51:41Z)
- What You See is What You Classify: Black Box Attributions [61.998683569022006]
We train a deep network, the Explainer, to predict attributions for a pre-trained black-box classifier, the Explanandum.
Unlike most existing approaches, ours is capable of directly generating very distinct class-specific masks.
We show that our attributions are superior to established methods both visually and quantitatively.
arXiv Detail & Related papers (2022-05-23T12:30:04Z)
- TraSeTR: Track-to-Segment Transformer with Contrastive Query for Instance-level Instrument Segmentation in Robotic Surgery [60.439434751619736]
We propose TraSeTR, a Track-to-Segment Transformer that exploits tracking cues to assist surgical instrument segmentation.
TraSeTR jointly reasons about the instrument type, location, and identity with instance-level predictions.
The effectiveness of our method is demonstrated with state-of-the-art instrument type segmentation results on three public datasets.
arXiv Detail & Related papers (2022-02-17T05:52:18Z)
- PFENet++: Boosting Few-shot Semantic Segmentation with the Noise-filtered Context-aware Prior Mask [62.37727055343632]
We revisit the prior mask guidance proposed in "Prior Guided Feature Enrichment Network for Few-Shot Segmentation".
We propose the Context-aware Prior Mask (CAPM) that leverages additional nearby semantic cues for better locating the objects in query images.
We take one step further by incorporating a lightweight Noise Suppression Module (NSM) to screen out the unnecessary responses.
arXiv Detail & Related papers (2021-09-28T15:07:43Z)
- Per-Pixel Classification is Not All You Need for Semantic Segmentation [184.2905747595058]
Mask classification is sufficiently general to solve both semantic- and instance-level segmentation tasks.
We propose MaskFormer, a simple mask classification model which predicts a set of binary masks.
Our method outperforms both current state-of-the-art semantic (55.6 mIoU on ADE20K) and panoptic segmentation (52.7 PQ on COCO) models.
arXiv Detail & Related papers (2021-07-13T17:59:50Z)
- Are you wearing a mask? Improving mask detection from speech using augmentation by cycle-consistent GANs [24.182791316595576]
We propose a novel data augmentation approach for mask detection from speech.
Our approach is based on (i) training Generative Adversarial Networks (GANs) with cycle-consistency loss to translate unpaired utterances.
We show that our data augmentation approach yields better results than other baseline and state-of-the-art augmentation methods.
arXiv Detail & Related papers (2020-06-17T20:46:50Z)
This list is automatically generated from the titles and abstracts of the papers in this site.