The ACM Multimedia 2023 Computational Paralinguistics Challenge: Emotion Share & Requests
- URL: http://arxiv.org/abs/2304.14882v2
- Date: Mon, 1 May 2023 07:59:34 GMT
- Title: The ACM Multimedia 2023 Computational Paralinguistics Challenge: Emotion Share & Requests
- Authors: Björn W. Schuller, Anton Batliner, Shahin Amiriparian, Alexander Barnhill, Maurice Gerczuk, Andreas Triantafyllopoulos, Alice Baird, Panagiotis Tzirakis, Chris Gagne, Alan S. Cowen, Nikola Lackovic, Marie-José Caraty, Claude Montacié
- Abstract summary: The ACM Multimedia 2023 Computational Paralinguistics Challenge addresses two different problems for the first time under well-defined conditions.
In the Emotion Share Sub-Challenge, a regression on speech has to be made; in the Requests Sub-Challenge, requests and complaints need to be detected.
We describe the Sub-Challenges, baseline feature extraction, and classifiers based on the usual ComParE features, the auDeep toolkit, and deep feature extraction from pre-trained CNNs using the DeepSpectrum toolkit.
- Score: 66.24715220997547
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: The ACM Multimedia 2023 Computational Paralinguistics Challenge addresses two different problems for the first time in a research competition under well-defined conditions: In the Emotion Share Sub-Challenge, a regression on speech has to be made; and in the Requests Sub-Challenge, requests and complaints need to be detected. We describe the Sub-Challenges, baseline feature extraction, and classifiers based on the usual ComParE features, the auDeep toolkit, and deep feature extraction from pre-trained CNNs using the DeepSpectrum toolkit; in addition, wav2vec2 models are used.
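As a rough illustration of what such a baseline feature pipeline can look like, the following Python sketch extracts ComParE acoustic functionals with the openSMILE Python package and a mean-pooled wav2vec2 utterance embedding with Hugging Face transformers. The file name, checkpoint choice, and pooling strategy are illustrative assumptions, not the official challenge baseline code.

    # Hedged sketch, not the official baseline: extract ComParE functionals
    # (openSMILE) and a mean-pooled wav2vec2 utterance embedding.
    import opensmile
    import torch
    import torchaudio
    from transformers import Wav2Vec2FeatureExtractor, Wav2Vec2Model

    # ComParE 2016 acoustic functionals: one 6373-dimensional vector per file.
    smile = opensmile.Smile(
        feature_set=opensmile.FeatureSet.ComParE_2016,
        feature_level=opensmile.FeatureLevel.Functionals,
    )
    compare_feats = smile.process_file("example.wav")  # pandas DataFrame, 1 x 6373

    # wav2vec2 embedding; the checkpoint is an illustrative assumption.
    ckpt = "facebook/wav2vec2-base-960h"
    extractor = Wav2Vec2FeatureExtractor.from_pretrained(ckpt)
    model = Wav2Vec2Model.from_pretrained(ckpt).eval()

    waveform, sr = torchaudio.load("example.wav")
    mono = torchaudio.functional.resample(waveform, sr, 16_000).mean(dim=0)
    inputs = extractor(mono.numpy(), sampling_rate=16_000, return_tensors="pt")
    with torch.no_grad():
        hidden = model(**inputs).last_hidden_state  # shape (1, frames, 768)
    w2v_embedding = hidden.mean(dim=1).squeeze(0)   # one 768-dim utterance vector

Either feature vector could then feed a standard regressor (Emotion Share) or classifier (Requests), e.g. a linear SVM, in the spirit of the challenge baselines.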
Related papers
- Benchmarking Multimodal Retrieval Augmented Generation with Dynamic VQA Dataset and Self-adaptive Planning Agent [102.31558123570437]
Multimodal Retrieval Augmented Generation (mRAG) plays an important role in mitigating the "hallucination" issue inherent in multimodal large language models (MLLMs).
We propose the first self-adaptive planning agent for multimodal retrieval, OmniSearch.
arXiv Detail & Related papers (2024-11-05T09:27:21Z)
- The MuSe 2024 Multimodal Sentiment Analysis Challenge: Social Perception and Humor Recognition [64.5207572897806]
The Multimodal Sentiment Analysis Challenge (MuSe) 2024 addresses two contemporary multimodal affect and sentiment analysis problems.
In the Social Perception Sub-Challenge (MuSe-Perception), participants will predict 16 different social attributes of individuals.
The Cross-Cultural Humor Detection Sub-Challenge (MuSe-Humor) dataset expands upon the Passau Spontaneous Football Coach Humor dataset.
arXiv Detail & Related papers (2024-06-11T22:26:20Z)
- Hierarchical Audio-Visual Information Fusion with Multi-label Joint Decoding for MER 2023 [51.95161901441527]
In this paper, we propose a novel framework for recognizing both discrete and dimensional emotions.
Deep features extracted from foundation models are used as robust acoustic and visual representations of raw video.
Our final system achieves state-of-the-art performance and ranks third on the leaderboard of the MER-MULTI sub-challenge.
arXiv Detail & Related papers (2023-09-11T03:19:10Z)
- Cascaded Cross-Modal Transformer for Request and Complaint Detection [31.359578768463752]
We propose a novel cascaded cross-modal transformer (CCMT) that combines speech and text transcripts to detect customer requests and complaints in phone conversations.
Our approach leverages a multimodal paradigm by transcribing the speech using automatic speech recognition (ASR) models and translating the transcripts into different languages.
We apply our system to the Requests Sub-Challenge of the ACM Multimedia 2023 Computational Paralinguistics Challenge, reaching unweighted average recalls (UAR) of 65.41% and 85.87% for the complaint and request classes, respectively.
arXiv Detail & Related papers (2023-07-27T13:45:42Z)
- 2nd Place Winning Solution for the CVPR2023 Visual Anomaly and Novelty Detection Challenge: Multimodal Prompting for Data-centric Anomaly Detection [10.682758791557436]
This report introduces the winning solution of the team Segment Any Anomaly for the CVPR2023 Visual Anomaly and Novelty Detection (VAND) challenge.
We present a novel framework, i.e., Segment Any Anomaly + (SAA$+$), for zero-shot anomaly segmentation with multi-modal prompts.
We will release the code of our winning solution for the CVPR2023 VAND challenge.
arXiv Detail & Related papers (2023-06-15T11:49:44Z)
- The MuSe 2023 Multimodal Sentiment Analysis Challenge: Mimicked Emotions, Cross-Cultural Humour, and Personalisation [69.13075715686622]
MuSe 2023 is a set of shared tasks addressing three different contemporary multimodal affect and sentiment analysis problems.
MuSe 2023 seeks to bring together a broad audience from different research communities.
arXiv Detail & Related papers (2023-05-05T08:53:57Z)
- Burst2Vec: An Adversarial Multi-Task Approach for Predicting Emotion, Age, and Origin from Vocal Bursts [49.31604138034298]
Burst2Vec uses pre-trained speech representations to capture acoustic information from raw waveforms.
Our models achieve a relative 30% performance gain over baselines using pre-extracted features.
arXiv Detail & Related papers (2022-06-24T18:57:41Z)
- The ACM Multimedia 2022 Computational Paralinguistics Challenge: Vocalisations, Stuttering, Activity, & Mosquitoes [9.09787422797708]
The ACM Multimedia 2022 Computational Paralinguistics Challenge addresses four different problems.
In the Vocalisations and Stuttering Sub-Challenges, a classification on human non-verbal vocalisations and speech has to be made.
The Activity Sub-Challenge aims at beyond-audio human activity recognition from smartwatch sensor data.
In the Mosquitoes Sub-Challenge, mosquitoes need to be detected.
arXiv Detail & Related papers (2022-05-13T17:51:45Z)
- The INTERSPEECH 2021 Computational Paralinguistics Challenge: COVID-19 Cough, COVID-19 Speech, Escalation & Primates [34.39118619224786]
The INTERSPEECH 2021 Computational Paralinguistics Challenge addresses four different problems for the first time.
In the COVID-19 Cough and COVID-19 Speech Sub-Challenges, a binary classification on COVID-19 infection has to be made based on coughing sounds and speech.
In the Escalation Sub-Challenge, a three-way assessment of the level of escalation in a dialogue is featured.
arXiv Detail & Related papers (2021-02-24T21:39:59Z)