LUMA: A Benchmark Dataset for Learning from Uncertain and Multimodal Data
- URL: http://arxiv.org/abs/2406.09864v2
- Date: Tue, 01 Oct 2024 13:07:02 GMT
- Title: LUMA: A Benchmark Dataset for Learning from Uncertain and Multimodal Data
- Authors: Grigor Bezirganyan, Sana Sellami, Laure Berti-Équille, Sébastien Fournier,
- Abstract summary: Multimodal Deep Learning enhances decision-making by integrating diverse information sources, such as texts, images, audio, and videos.
To develop trustworthy multimodal approaches, it is essential to understand how uncertainty impacts these models.
We propose LUMA, a unique benchmark dataset, featuring audio, image, and textual data from 50 classes, for learning from uncertain and multimodal data.
- Score: 3.66486428341988
- License:
- Abstract: Multimodal Deep Learning enhances decision-making by integrating diverse information sources, such as texts, images, audio, and videos. To develop trustworthy multimodal approaches, it is essential to understand how uncertainty impacts these models. We propose LUMA, a unique benchmark dataset, featuring audio, image, and textual data from 50 classes, for learning from uncertain and multimodal data. It extends the well-known CIFAR 10/100 dataset with audio samples extracted from three audio corpora, and text data generated using the Gemma-7B Large Language Model (LLM). The LUMA dataset enables the controlled injection of varying types and degrees of uncertainty to achieve and tailor specific experiments and benchmarking initiatives. LUMA is also available as a Python package including the functions for generating multiple variants of the dataset with controlling the diversity of the data, the amount of noise for each modality, and adding out-of-distribution samples. A baseline pre-trained model is also provided alongside three uncertainty quantification methods: Monte-Carlo Dropout, Deep Ensemble, and Reliable Conflictive Multi-View Learning. This comprehensive dataset and its benchmarking tools are intended to promote and support the development, evaluation, and benchmarking of trustworthy and robust multimodal deep learning approaches. We anticipate that the LUMA dataset will help the ICLR community to design more trustworthy and robust machine learning approaches for safety critical applications.
Related papers
- Empowering Large Language Models in Wireless Communication: A Novel Dataset and Fine-Tuning Framework [81.29965270493238]
We develop a specialized dataset aimed at enhancing the evaluation and fine-tuning of large language models (LLMs) for wireless communication applications.
The dataset includes a diverse set of multi-hop questions, including true/false and multiple-choice types, spanning varying difficulty levels from easy to hard.
We introduce a Pointwise V-Information (PVI) based fine-tuning method, providing a detailed theoretical analysis and justification for its use in quantifying the information content of training data.
arXiv Detail & Related papers (2025-01-16T16:19:53Z) - SM3Det: A Unified Model for Multi-Modal Remote Sensing Object Detection [73.49799596304418]
This paper introduces a new task called Multi-Modal datasets and Multi-Task Object Detection (M2Det) for remote sensing.
It is designed to accurately detect horizontal or oriented objects from any sensor modality.
This task poses challenges due to 1) the trade-offs involved in managing multi-modal modelling and 2) the complexities of multi-task optimization.
arXiv Detail & Related papers (2024-12-30T02:47:51Z) - MAmmoTH-VL: Eliciting Multimodal Reasoning with Instruction Tuning at Scale [66.73529246309033]
multimodal large language models (MLLMs) have shown significant potential in a broad range of multimodal tasks.
Existing instruction-tuning datasets only provide phrase-level answers without any intermediate rationales.
We introduce a scalable and cost-effective method to construct a large-scale multimodal instruction-tuning dataset with rich intermediate rationales.
arXiv Detail & Related papers (2024-12-06T18:14:24Z) - MultiZoo & MultiBench: A Standardized Toolkit for Multimodal Deep
Learning [110.54752872873472]
MultiZoo is a public toolkit consisting of standardized implementations of > 20 core multimodal algorithms.
MultiBench is a benchmark spanning 15 datasets, 10 modalities, 20 prediction tasks, and 6 research areas.
arXiv Detail & Related papers (2023-06-28T17:59:10Z) - infoVerse: A Universal Framework for Dataset Characterization with
Multidimensional Meta-information [68.76707843019886]
infoVerse is a universal framework for dataset characterization.
infoVerse captures multidimensional characteristics of datasets by incorporating various model-driven meta-information.
In three real-world applications (data pruning, active learning, and data annotation), the samples chosen on infoVerse space consistently outperform strong baselines.
arXiv Detail & Related papers (2023-05-30T18:12:48Z) - Shared and Private Information Learning in Multimodal Sentiment Analysis with Deep Modal Alignment and Self-supervised Multi-Task Learning [8.868945335907867]
We propose a deep modal shared information learning module to capture the shared information between modalities.
We also use a label generation module based on a self-supervised learning strategy to capture the private information of the modalities.
Our approach outperforms current state-of-the-art methods on most of the metrics of the three public datasets.
arXiv Detail & Related papers (2023-05-15T09:24:48Z) - Unimodal Training-Multimodal Prediction: Cross-modal Federated Learning
with Hierarchical Aggregation [16.308470947384134]
HA-Fedformer is a novel transformer-based model that empowers unimodal training with only a unimodal dataset at the client.
We develop an uncertainty-aware aggregation method for the local encoders with layer-wise Markov Chain Monte Carlo sampling.
Our experiments on popular sentiment analysis benchmarks, CMU-MOSI and CMU-MOSEI, demonstrate that HA-Fedformer significantly outperforms state-of-the-art multimodal models.
arXiv Detail & Related papers (2023-03-27T07:07:33Z) - Leveraging Uni-Modal Self-Supervised Learning for Multimodal
Audio-Visual Speech Recognition [23.239078852797817]
We leverage uni-modal self-supervised learning to promote the multimodal audio-visual speech recognition (AVSR)
In particular, we first train audio and visual encoders on a large-scale uni-modal dataset, then we integrate components of both encoders into a larger multimodal framework.
Our model is experimentally validated on both word-level and sentence-level AVSR tasks.
arXiv Detail & Related papers (2022-02-24T15:12:17Z) - Learning from Multiple Noisy Augmented Data Sets for Better
Cross-Lingual Spoken Language Understanding [69.40915115518523]
Lack of training data presents a grand challenge to scaling out spoken language understanding (SLU) to low-resource languages.
Various data augmentation approaches have been proposed to synthesize training data in low-resource target languages.
In this paper we focus on mitigating noise in augmented data.
arXiv Detail & Related papers (2021-09-03T15:44:15Z) - Evaluation Framework For Large-scale Federated Learning [10.127616622630514]
Federated learning is proposed as a machine learning setting to enable distributed edge devices, such as mobile phones, to collaboratively learn a shared prediction model.
In this paper, we introduce a framework designed for large-scale federated learning which consists of approaches to generating dataset and modular evaluation framework.
arXiv Detail & Related papers (2020-03-03T15:12:13Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.