The ACM Multimedia 2022 Computational Paralinguistics Challenge:
Vocalisations, Stuttering, Activity, & Mosquitoes
- URL: http://arxiv.org/abs/2205.06799v1
- Date: Fri, 13 May 2022 17:51:45 GMT
- Authors: Björn W. Schuller, Anton Batliner, Shahin Amiriparian, Christian
Bergler, Maurice Gerczuk, Natalie Holz, Pauline Larrouy-Maestri, Sebastian P.
Bayerl, Korbinian Riedhammer, Adria Mallol-Ragolta, Maria Pateraki, Harry
Coppock, Ivan Kiskin, Marianne Sinka, Stephen Roberts
- Abstract summary: ACM Multimedia 2022 Computational Paralinguistics Challenge addresses four different problems.
In the Vocalisations and Stuttering Sub-Challenges, a classification on human non-verbal vocalisations and speech has to be made.
The Activity Sub-Challenge aims at beyond-audio human activity recognition from smartwatch sensor data.
In the Mosquitoes Sub-Challenge, mosquitoes need to be detected.
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: The ACM Multimedia 2022 Computational Paralinguistics Challenge addresses
four different problems for the first time in a research competition under
well-defined conditions: In the Vocalisations and Stuttering Sub-Challenges, a
classification on human non-verbal vocalisations and speech has to be made; the
Activity Sub-Challenge aims at beyond-audio human activity recognition from
smartwatch sensor data; and in the Mosquitoes Sub-Challenge, mosquitoes need to
be detected. We describe the Sub-Challenges, baseline feature extraction, and
classifiers based on the usual ComParE and BoAW features, the auDeep toolkit,
and deep feature extraction from pre-trained CNNs using the DeepSpectrum
toolkit; in addition, we add end-to-end sequential modelling, and a
log-mel-128-BNN.
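The log-mel-128 features mentioned above can be illustrated with a short sketch. This is a minimal, self-contained reconstruction of generic log-mel spectrogram extraction (128 mel bands), not the challenge's official baseline code; the sample rate, FFT size, and hop length are illustrative assumptions.

```python
import numpy as np

def hz_to_mel(f):
    return 2595.0 * np.log10(1.0 + f / 700.0)

def mel_to_hz(m):
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)

def mel_filterbank(sr, n_fft, n_mels=128):
    # Triangular filters spaced evenly on the mel scale.
    mels = np.linspace(hz_to_mel(0.0), hz_to_mel(sr / 2.0), n_mels + 2)
    bins = np.floor((n_fft + 1) * mel_to_hz(mels) / sr).astype(int)
    fb = np.zeros((n_mels, n_fft // 2 + 1))
    for i in range(1, n_mels + 1):
        l, c, r = bins[i - 1], bins[i], bins[i + 1]
        for k in range(l, c):          # rising edge of the triangle
            fb[i - 1, k] = (k - l) / (c - l)
        for k in range(c, r):          # falling edge of the triangle
            fb[i - 1, k] = (r - k) / (r - c)
    return fb

def log_mel_128(signal, sr=16000, n_fft=1024, hop=512, n_mels=128):
    # Frame the signal, apply a Hann window, take the power spectrum,
    # project onto the mel filterbank, and compress with a log.
    window = np.hanning(n_fft)
    n_frames = 1 + max(0, (len(signal) - n_fft) // hop)
    spec = np.empty((n_frames, n_fft // 2 + 1))
    for t in range(n_frames):
        frame = signal[t * hop : t * hop + n_fft] * window
        spec[t] = np.abs(np.fft.rfft(frame)) ** 2
    fb = mel_filterbank(sr, n_fft, n_mels)
    return np.log(spec @ fb.T + 1e-10)

# One second of a 440 Hz tone yields a (frames, 128) feature matrix.
sig = np.sin(2 * np.pi * 440 * np.arange(16000) / 16000)
feats = log_mel_128(sig)
print(feats.shape)  # (30, 128)
```

The resulting (frames, 128) matrix is the kind of time-frequency representation that sequential models or spectrogram-based CNNs typically consume.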
Related papers
- The MuSe 2024 Multimodal Sentiment Analysis Challenge: Social Perception and Humor Recognition [64.5207572897806]
The Multimodal Sentiment Analysis Challenge (MuSe) 2024 addresses two contemporary multimodal affect and sentiment analysis problems.
In the Social Perception Sub-Challenge (MuSe-Perception), participants will predict 16 different social attributes of individuals.
The Cross-Cultural Humor Detection Sub-Challenge (MuSe-Humor) dataset expands upon the Passau Spontaneous Football Coach Humor dataset.
arXiv Detail & Related papers (2024-06-11T22:26:20Z)
- AIR-Bench: Benchmarking Large Audio-Language Models via Generative Comprehension [95.8442896569132]
We introduce AIR-Bench, the first benchmark to evaluate the ability of Large Audio-Language Models (LALMs) to understand various types of audio signals and interact with humans in the textual format.
Results demonstrate a high level of consistency between GPT-4-based evaluation and human evaluation.
arXiv Detail & Related papers (2024-02-12T15:41:22Z)
- First Place Solution to the CVPR'2023 AQTC Challenge: A Function-Interaction Centric Approach with Spatiotemporal Visual-Language Alignment [15.99008977852437]
Affordance-Centric Question-driven Task Completion (AQTC) has been proposed to provide users with comprehensive and systematic instructions acquired from videos.
Existing methods have neglected the necessity of aligning visual and linguistic signals, as well as the crucial interactional information between humans and objects.
We propose to combine large-scale pre-trained vision- and video-language models, which serve to contribute stable and reliable multimodal data.
arXiv Detail & Related papers (2023-06-23T09:02:25Z)
- The MuSe 2023 Multimodal Sentiment Analysis Challenge: Mimicked Emotions, Cross-Cultural Humour, and Personalisation [69.13075715686622]
MuSe 2023 is a set of shared tasks addressing three different contemporary multimodal affect and sentiment analysis problems.
MuSe 2023 seeks to bring together a broad audience from different research communities.
arXiv Detail & Related papers (2023-05-05T08:53:57Z)
- The ACM Multimedia 2023 Computational Paralinguistics Challenge: Emotion Share & Requests [66.24715220997547]
The ACM Multimedia 2023 Paralinguistics Challenge addresses two different problems for the first time under well-defined conditions.
In the Emotion Share Sub-Challenge, a regression on speech has to be made; and in the Requests Sub-Challenge, requests and complaints need to be detected.
We describe the Sub-Challenges, baseline feature extraction, and classifiers based on the usual ComParE features, the auDeep toolkit, and deep feature extraction from pre-trained CNNs using the DeepSpectrum toolkit.
arXiv Detail & Related papers (2023-04-28T14:42:55Z)
- Burst2Vec: An Adversarial Multi-Task Approach for Predicting Emotion, Age, and Origin from Vocal Bursts [49.31604138034298]
Burst2Vec uses pre-trained speech representations to capture acoustic information from raw waveforms.
Our models achieve a relative 30% performance gain over baselines using pre-extracted features.
arXiv Detail & Related papers (2022-06-24T18:57:41Z)
- Efficient Speech Emotion Recognition Using Multi-Scale CNN and Attention [2.8017924048352576]
We propose a simple yet efficient neural network architecture to exploit both acoustic and lexical information from speech.
The proposed framework uses multi-scale convolutional layers (MSCNN) to obtain both audio and text hidden representations.
Extensive experiments show that the proposed model outperforms previous state-of-the-art methods on the IEMOCAP dataset.
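The multi-scale idea in the summary above can be sketched briefly: parallel convolutions with different kernel sizes capture patterns at several temporal scales, and their pooled outputs are concatenated. This is a generic illustrative sketch, not the paper's MSCNN architecture; the kernel sizes and averaging kernels are assumptions.

```python
import numpy as np

def conv1d_valid(x, k):
    # 'valid'-mode 1-D convolution (implemented as correlation).
    n = len(x) - len(k) + 1
    return np.array([np.dot(x[i:i + len(k)], k) for i in range(n)])

def multi_scale_features(x, kernel_sizes=(3, 5, 7)):
    # Run parallel convolutions at several kernel sizes and max-pool each
    # branch to a scalar; concatenating the branches gives a feature vector
    # that mixes information from multiple temporal scales.
    feats = []
    for ks in kernel_sizes:
        k = np.ones(ks) / ks  # illustrative averaging kernel per branch
        feats.append(conv1d_valid(x, k).max())
    return np.array(feats)

x = np.sin(np.linspace(0, 4 * np.pi, 64))
print(multi_scale_features(x).shape)  # (3,)
```

In a trained network the kernels would be learned rather than fixed, and each branch would typically keep a full feature map instead of a single pooled scalar.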
arXiv Detail & Related papers (2021-06-08T06:45:42Z)
- The INTERSPEECH 2021 Computational Paralinguistics Challenge: COVID-19 Cough, COVID-19 Speech, Escalation & Primates [34.39118619224786]
The INTERSPEECH 2021 Computational Paralinguistics Challenge addresses four different problems for the first time.
In the COVID-19 Cough and COVID-19 Speech Sub-Challenges, a binary classification on COVID-19 infection has to be made based on coughing sounds and speech.
In the Escalation Sub-Challenge, a three-way assessment of the level of escalation in a dialogue is featured.
arXiv Detail & Related papers (2021-02-24T21:39:59Z)
- The NTT DCASE2020 Challenge Task 6 system: Automated Audio Captioning with Keywords and Sentence Length Estimation [49.41766997393417]
This report describes the system participating in the Detection and Classification of Acoustic Scenes and Events (DCASE) 2020 Challenge, Task 6.
Our submission focuses on solving two indeterminacy problems in automated audio captioning: word selection indeterminacy and sentence length indeterminacy.
We simultaneously solve the main caption generation task and the two indeterminacy sub-problems by estimating keywords and sentence length through multi-task learning.
arXiv Detail & Related papers (2020-07-01T04:26:27Z)
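The multi-task setup described in the DCASE entry above amounts to optimising a weighted sum of a main loss and auxiliary losses. The following is a minimal sketch of that pattern under assumed weights; the weighting scheme and loss values are illustrative, not the paper's.

```python
def multitask_loss(caption_loss, keyword_loss, length_loss,
                   w_keyword=0.5, w_length=0.5):
    # Total training objective: main caption-generation loss plus
    # weighted auxiliary losses for keyword and sentence-length estimation.
    return caption_loss + w_keyword * keyword_loss + w_length * length_loss

print(multitask_loss(2.0, 1.0, 0.5))  # 2.75
```

All three heads would share an encoder in practice, so gradients from the auxiliary tasks shape the shared representation used for captioning.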
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of its content (including all information) and is not responsible for any consequences of its use.