The Multimodal Sentiment Analysis in Car Reviews (MuSe-CaR) Dataset:
Collection, Insights and Improvements
- URL: http://arxiv.org/abs/2101.06053v1
- Date: Fri, 15 Jan 2021 10:40:37 GMT
- Title: The Multimodal Sentiment Analysis in Car Reviews (MuSe-CaR) Dataset:
Collection, Insights and Improvements
- Authors: Lukas Stappen, Alice Baird, Lea Schumann, Björn Schuller
- Abstract summary: We present MuSe-CaR, a first-of-its-kind multimodal dataset.
The data is publicly available, as it recently served as the testbed for the 1st Multimodal Sentiment Analysis Challenge.
- Score: 14.707930573950787
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Truly real-life data presents a strong but exciting challenge for
sentiment and emotion research. The high variety of possible 'in-the-wild'
properties makes large datasets such as these indispensable for building
robust machine learning models. A sufficient quantity of data, covering a
deep enough variety of challenges in each modality to force an exploratory
analysis of the interplay of all modalities, has not yet been made available
in this context. In this contribution, we present MuSe-CaR, a
first-of-its-kind multimodal dataset. The data is publicly available, as it
recently served as the testbed for the 1st Multimodal Sentiment Analysis
Challenge, which focused on the tasks of emotion, emotion-target engagement,
and trustworthiness recognition by comprehensively integrating the
audio-visual and language modalities. Furthermore, we give a thorough
overview of the dataset in terms of collection and annotation, including
annotation tiers not used in this year's MuSe 2020. In addition, for one of
the sub-challenges - predicting the level of trustworthiness - no participant
outperformed the baseline model, and so we propose a simple but highly
efficient Multi-Head-Attention network that, using multimodal fusion, exceeds
the baseline by around 0.2 CCC (an improvement of almost 50%).
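
To make the reported result concrete, the sketch below shows how the evaluation metric, the Concordance Correlation Coefficient (CCC), is computed, together with a minimal multi-head-attention fusion regressor of the kind the abstract describes. This is not the authors' released model; the choice of PyTorch, the feature dimensions, and all hyper-parameters are illustrative assumptions.

```python
# Minimal sketch (PyTorch, assumed stack): CCC metric plus a small
# multi-head-attention fusion model over pre-extracted audio, video, and
# text features. Dimensions below are placeholders, not the MuSe-CaR ones.
import torch
import torch.nn as nn


def ccc(pred: torch.Tensor, gold: torch.Tensor) -> torch.Tensor:
    """Concordance Correlation Coefficient between two 1-D sequences."""
    pred_mean, gold_mean = pred.mean(), gold.mean()
    pred_var, gold_var = pred.var(unbiased=False), gold.var(unbiased=False)
    cov = ((pred - pred_mean) * (gold - gold_mean)).mean()
    return 2 * cov / (pred_var + gold_var + (pred_mean - gold_mean) ** 2)


class MultiHeadAttentionFusion(nn.Module):
    """Fuses per-time-step modality features with multi-head attention and
    regresses a continuous label (e.g. trustworthiness) per time step.
    Assumes all modalities are aligned to the same number of time steps."""

    def __init__(self, dims=(40, 512, 768), d_model=128, n_heads=4):
        super().__init__()
        # Project each modality (audio, video, text) to a shared dimension.
        self.proj = nn.ModuleList([nn.Linear(d, d_model) for d in dims])
        self.attn = nn.MultiheadAttention(d_model, n_heads, batch_first=True)
        self.head = nn.Sequential(nn.Linear(d_model, 64), nn.ReLU(),
                                  nn.Linear(64, 1))

    def forward(self, audio, video, text):
        # Each input: (batch, time, feature_dim). Concatenate the projected
        # modality sequences so attention can mix information across both
        # modalities and time steps.
        tokens = torch.cat(
            [p(x) for p, x in zip(self.proj, (audio, video, text))], dim=1)
        fused, _ = self.attn(tokens, tokens, tokens)
        # Average the three modality segments back to one sequence.
        b, t = audio.shape[0], audio.shape[1]
        fused = fused.view(b, 3, t, -1).mean(dim=1)
        return self.head(fused).squeeze(-1)  # (batch, time)
```

A dummy forward pass with two clips of 50 aligned time steps would be `model(torch.randn(2, 50, 40), torch.randn(2, 50, 512), torch.randn(2, 50, 768))`; such a model could be trained, for instance, with `1 - ccc(pred, gold)` as the loss, which is a common choice for continuous emotion regression.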
Related papers
- Multi-OCT-SelfNet: Integrating Self-Supervised Learning with Multi-Source Data Fusion for Enhanced Multi-Class Retinal Disease Classification [2.5091334993691206]
  Development of a robust deep-learning model for retinal disease diagnosis requires a substantial dataset for training.
  The capacity to generalize effectively on smaller datasets remains a persistent challenge.
  We combine a wide range of data sources to improve performance and generalization to new data.
  arXiv Detail & Related papers (2024-09-17T17:22:35Z)
- Multimodal Fusion on Low-quality Data: A Comprehensive Survey [110.22752954128738]
  This paper surveys the common challenges and recent advances of multimodal fusion in the wild.
  We identify four main challenges faced by multimodal fusion on low-quality data.
  This new taxonomy will enable researchers to understand the state of the field and identify several potential directions.
  arXiv Detail & Related papers (2024-04-27T07:22:28Z)
- Sequential Compositional Generalization in Multimodal Models [23.52949473093583]
  We conduct a comprehensive assessment of several unimodal and multimodal models.
  Our findings reveal that bi-modal and tri-modal models exhibit a clear edge over their text-only counterparts.
  arXiv Detail & Related papers (2024-04-18T09:04:15Z)
- Exploiting Modality-Specific Features For Multi-Modal Manipulation Detection And Grounding [54.49214267905562]
  We construct a transformer-based framework for multi-modal manipulation detection and grounding tasks.
  Our framework simultaneously explores modality-specific features while preserving the capability for multi-modal alignment.
  We propose an implicit manipulation query (IMQ) that adaptively aggregates global contextual cues within each modality.
  arXiv Detail & Related papers (2023-09-22T06:55:41Z)
- Rank2Tell: A Multimodal Driving Dataset for Joint Importance Ranking and Reasoning [19.43430577960824]
  This paper introduces a novel dataset, Rank2Tell, a multi-modal ego-centric dataset for Ranking the importance level of objects and Telling the reason for their importance.
  Using various closed and open-ended visual question answering tasks, the dataset provides dense annotations of semantic, spatial, temporal, and relational attributes of important objects in complex traffic scenarios.
  arXiv Detail & Related papers (2023-09-12T20:51:07Z)
- Enhancing Human-like Multi-Modal Reasoning: A New Challenging Dataset and Comprehensive Framework [51.44863255495668]
  Multimodal reasoning is a critical component in the pursuit of artificial intelligence systems that exhibit human-like intelligence.
  We present the Multi-Modal Reasoning (COCO-MMR) dataset, a novel dataset that encompasses an extensive collection of open-ended questions.
  We propose innovative techniques, including multi-hop cross-modal attention and sentence-level contrastive learning, to enhance the image and text encoders.
  arXiv Detail & Related papers (2023-07-24T08:58:25Z)
- Read, Look or Listen? What's Needed for Solving a Multimodal Dataset [7.0430001782867]
  We propose a two-step method to analyze multimodal datasets, which leverages a small seed of human annotation to map each multimodal instance to the modalities required to process it.
  We apply our approach to TVQA, a video question-answering dataset, and discover that most questions can be answered using a single modality, without a substantial bias towards any specific modality.
  We analyze MERLOT Reserve, finding that it struggles with image-based questions compared to text and audio, but also with auditory speaker identification.
  arXiv Detail & Related papers (2023-07-06T08:02:45Z)
- MultiZoo & MultiBench: A Standardized Toolkit for Multimodal Deep Learning [110.54752872873472]
  MultiZoo is a public toolkit consisting of standardized implementations of more than 20 core multimodal algorithms.
  MultiBench is a benchmark spanning 15 datasets, 10 modalities, 20 prediction tasks, and 6 research areas.
  arXiv Detail & Related papers (2023-06-28T17:59:10Z)
- SGED: A Benchmark dataset for Performance Evaluation of Spiking Gesture Emotion Recognition [12.396844568607522]
  We label a new homogeneous multimodal gesture emotion recognition dataset based on an analysis of existing datasets.
  We propose a pseudo dual-flow network based on this dataset and verify its application potential in the affective computing community.
  arXiv Detail & Related papers (2023-04-28T09:32:09Z)
- MetaGraspNet: A Large-Scale Benchmark Dataset for Scene-Aware Ambidextrous Bin Picking via Physics-based Metaverse Synthesis [72.85526892440251]
  We introduce MetaGraspNet, a large-scale photo-realistic bin-picking dataset constructed via physics-based metaverse synthesis.
  The proposed dataset contains 217k RGBD images across 82 different article types, with full annotations for object detection, amodal perception, keypoint detection, manipulation order, and ambidextrous grasp labels for a parallel-jaw and a vacuum gripper.
  We also provide a real dataset consisting of over 2.3k fully annotated high-quality RGBD images, divided into 5 levels of difficulty and an unseen object set, to evaluate different object and layout properties.
  arXiv Detail & Related papers (2022-08-08T08:15:34Z)
- Perceptual Score: What Data Modalities Does Your Model Perceive? [73.75255606437808]
  We introduce the perceptual score, a metric that assesses the degree to which a model relies on different subsets of the input features.
  We find that recent, more accurate multi-modal models for visual question answering tend to perceive the visual data less than their predecessors.
  Using the perceptual score also helps to analyze model biases by decomposing the score into data subset contributions.
  arXiv Detail & Related papers (2021-10-27T12:19:56Z)