LUMA: A Benchmark Dataset for Learning from Uncertain and Multimodal Data
- URL: http://arxiv.org/abs/2406.09864v1
- Date: Fri, 14 Jun 2024 09:22:07 GMT
- Title: LUMA: A Benchmark Dataset for Learning from Uncertain and Multimodal Data
- Authors: Grigor Bezirganyan, Sana Sellami, Laure Berti-Équille, Sébastien Fournier
- Abstract summary: Multimodal Deep Learning enhances decision-making by integrating diverse information sources, such as texts, images, audio, and videos.
To develop trustworthy multimodal approaches, it is essential to understand how uncertainty impacts these models.
We introduce LUMA, a unique benchmark dataset, featuring audio, image, and textual data from 50 classes, for learning from uncertain and multimodal data.
- Score: 3.66486428341988
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Multimodal Deep Learning enhances decision-making by integrating diverse information sources, such as texts, images, audio, and videos. To develop trustworthy multimodal approaches, it is essential to understand how uncertainty impacts these models. We introduce LUMA, a unique benchmark dataset featuring audio, image, and textual data from 50 classes, for learning from uncertain and multimodal data. It extends the well-known CIFAR 10/100 dataset with audio samples extracted from three audio corpora and text data generated with the Gemma-7B Large Language Model (LLM). The LUMA dataset enables the controlled injection of varying types and degrees of uncertainty, so that specific experiments and benchmarking initiatives can be designed and tailored. LUMA is also available as a Python package that includes functions for generating multiple variants of the dataset, controlling the diversity of the data and the amount of noise for each modality, and adding out-of-distribution samples. A baseline pre-trained model is also provided, alongside three uncertainty quantification methods: Monte-Carlo Dropout, Deep Ensemble, and Reliable Conflictive Multi-View Learning. This comprehensive dataset and its tools are intended to promote and support the development and benchmarking of trustworthy and robust multimodal deep learning approaches.
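Of the three bundled uncertainty quantification methods, Monte-Carlo Dropout is the simplest to illustrate. The sketch below is a generic PyTorch implementation of the technique, not the LUMA package API; the classifier and the random input features are illustrative placeholders.

```python
import torch
import torch.nn as nn

class SmallClassifier(nn.Module):
    """Placeholder unimodal classifier; LUMA's actual baseline is multimodal."""
    def __init__(self, in_dim=128, n_classes=50, p_drop=0.3):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(in_dim, 256), nn.ReLU(), nn.Dropout(p_drop),
            nn.Linear(256, n_classes),
        )

    def forward(self, x):
        return self.net(x)

@torch.no_grad()
def mc_dropout_predict(model, x, n_samples=30):
    """Run several stochastic forward passes with dropout left on and
    average the softmax outputs; the entropy of the averaged distribution
    serves as a simple per-sample uncertainty score."""
    model.train()  # keeps dropout active; no gradients flow under no_grad
    probs = torch.stack(
        [torch.softmax(model(x), dim=-1) for _ in range(n_samples)]
    )
    mean = probs.mean(dim=0)  # predictive distribution per sample
    entropy = -(mean * mean.clamp_min(1e-12).log()).sum(dim=-1)
    model.eval()
    return mean, entropy

model = SmallClassifier()
x = torch.randn(4, 128)  # placeholder features, e.g. fused modality embeddings
mean_probs, uncertainty = mc_dropout_predict(model, x)
print(mean_probs.argmax(dim=-1), uncertainty)  # higher entropy = less certain
```

A Deep Ensemble replaces the repeated dropout passes with independently trained models whose softmax outputs are averaged the same way.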
Related papers
- U3M: Unbiased Multiscale Modal Fusion Model for Multimodal Semantic Segmentation [63.31007867379312]
We introduce U3M: An Unbiased Multiscale Modal Fusion Model for Multimodal Semantic Segmentation.
We employ feature fusion at multiple scales to ensure the effective extraction and integration of both global and local features.
Experimental results demonstrate that our approach achieves superior performance across multiple datasets.
arXiv Detail & Related papers (2024-05-24T08:58:48Z)
- StableLLaVA: Enhanced Visual Instruction Tuning with Synthesized Image-Dialogue Data [129.92449761766025]
We propose a novel data collection methodology that synchronously synthesizes images and dialogues for visual instruction tuning.
This approach harnesses the power of generative models, marrying the abilities of ChatGPT and text-to-image generative models.
Our research includes comprehensive experiments conducted on various datasets.
arXiv Detail & Related papers (2023-08-20T12:43:52Z)
- Read, Look or Listen? What's Needed for Solving a Multimodal Dataset [7.0430001782867]
We propose a two-step method to analyze multimodal datasets, which leverages a small seed of human annotation to map each multimodal instance to the modalities required to process it.
We apply our approach to TVQA, a video question-answering dataset, and discover that most questions can be answered using a single modality, without a substantial bias towards any specific modality.
We analyze the MERLOT Reserve model, finding that it struggles with image-based questions compared to text and audio, as well as with auditory speaker identification.
arXiv Detail & Related papers (2023-07-06T08:02:45Z)
- Macaw-LLM: Multi-Modal Language Modeling with Image, Audio, Video, and Text Integration [50.94902442781148]
We propose a novel multi-modal large language model (LLM) that seamlessly integrates visual, audio, and textual information.
Macaw-LLM consists of three main components: a modality module for encoding multi-modal data, a cognitive module for harnessing pretrained LLMs, and an alignment module for harmonizing diverse representations.
We construct a large-scale multi-modal instruction dataset of multi-turn dialogues, including 69K image instances and 50K video instances.
arXiv Detail & Related papers (2023-06-15T12:45:25Z)
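The three-component design described above lends itself to a compact structural sketch. Everything below (module choices, dimensions, and the stub transformer standing in for a pretrained LLM) is an illustrative assumption, not the released Macaw-LLM code:

```python
import torch
import torch.nn as nn

class MacawStyleModel(nn.Module):
    """Schematic of a three-component multimodal LLM: modality encoders,
    an alignment projection, and a (stubbed) pretrained language model."""
    def __init__(self, feat_dim=512, llm_dim=4096):
        super().__init__()
        # modality module: one encoder per input type (linear stubs here)
        self.image_enc = nn.Linear(1024, feat_dim)
        self.audio_enc = nn.Linear(768, feat_dim)
        # alignment module: project modality features into the LLM's
        # token-embedding space so they can be prepended to text tokens
        self.align = nn.Linear(feat_dim, llm_dim)
        # cognitive module: a pretrained LLM in the real model; a tiny
        # transformer encoder stands in for it in this sketch
        self.llm = nn.TransformerEncoder(
            nn.TransformerEncoderLayer(llm_dim, nhead=8, batch_first=True),
            num_layers=1,
        )

    def forward(self, image_feats, audio_feats, text_embeds):
        img = self.align(self.image_enc(image_feats))
        aud = self.align(self.audio_enc(audio_feats))
        # treat the aligned modality features as extra "tokens" before the text
        tokens = torch.cat([img, aud, text_embeds], dim=1)
        return self.llm(tokens)

m = MacawStyleModel()
out = m(torch.randn(1, 4, 1024), torch.randn(1, 4, 768), torch.randn(1, 8, 4096))
```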
- infoVerse: A Universal Framework for Dataset Characterization with Multidimensional Meta-information [68.76707843019886]
infoVerse is a universal framework for dataset characterization.
infoVerse captures multidimensional characteristics of datasets by incorporating various model-driven meta-information.
In three real-world applications (data pruning, active learning, and data annotation), the samples chosen on infoVerse space consistently outperform strong baselines.
arXiv Detail & Related papers (2023-05-30T18:12:48Z)
- Shared and Private Information Learning in Multimodal Sentiment Analysis with Deep Modal Alignment and Self-supervised Multi-Task Learning [8.868945335907867]
We propose a deep modal shared information learning module to capture the shared information between modalities.
We also use a label generation module based on a self-supervised learning strategy to capture the private information of the modalities.
Our approach outperforms current state-of-the-art methods on most of the metrics of the three public datasets.
arXiv Detail & Related papers (2023-05-15T09:24:48Z)
- Unimodal Training-Multimodal Prediction: Cross-modal Federated Learning with Hierarchical Aggregation [16.308470947384134]
HA-Fedformer is a novel transformer-based model that enables each client to train with only a unimodal dataset.
We develop an uncertainty-aware aggregation method for the local encoders with layer-wise Markov Chain Monte Carlo sampling.
Our experiments on popular sentiment analysis benchmarks, CMU-MOSI and CMU-MOSEI, demonstrate that HA-Fedformer significantly outperforms state-of-the-art multimodal models.
arXiv Detail & Related papers (2023-03-27T07:07:33Z)
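As a simplified stand-in for the aggregation idea above (plain inverse-uncertainty weighting of client weights, rather than the paper's layer-wise Markov Chain Monte Carlo sampling), one could write:

```python
import torch

def uncertainty_weighted_aggregate(client_states, client_uncertainties):
    """Average client encoder weights, down-weighting uncertain clients.
    client_states: list of state_dicts, one per client.
    client_uncertainties: one scalar per client (e.g., mean predictive
    entropy measured on local validation data)."""
    u = torch.tensor(client_uncertainties, dtype=torch.float32)
    w = 1.0 / (u + 1e-8)
    w = w / w.sum()  # normalized inverse-uncertainty weights
    agg = {}
    for key in client_states[0]:
        stacked = torch.stack([s[key].float() for s in client_states])
        # weighted mean over the client axis, parameter by parameter
        agg[key] = (w.view(-1, *[1] * (stacked.dim() - 1)) * stacked).sum(0)
    return agg
```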
- MuG: A Multimodal Classification Benchmark on Game Data with Tabular, Textual, and Visual Fields [26.450463943664822]
We propose MuG, a multimodal classification benchmark with eight datasets that allows researchers to evaluate and improve their models.
We conduct multi-aspect data analysis to provide insights into the benchmark, including label balance ratios, percentages of missing features, distributions of data within each modality, and the correlations between labels and input modalities.
arXiv Detail & Related papers (2023-02-06T18:09:06Z)
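The analyses listed above are straightforward to reproduce on any tabular benchmark. A minimal pandas sketch, assuming a hypothetical CSV split with a `label` column:

```python
import pandas as pd

df = pd.read_csv("train.csv")  # hypothetical MuG-style split

# label balance ratio: most frequent class count / least frequent
counts = df["label"].value_counts()
balance_ratio = counts.max() / counts.min()

# percentage of missing values per feature column
missing_pct = df.drop(columns=["label"]).isna().mean() * 100

print(f"label balance ratio: {balance_ratio:.2f}")
print(missing_pct.sort_values(ascending=False).head())
```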
- Multimodal Masked Autoencoders Learn Transferable Representations [127.35955819874063]
We propose a simple and scalable network architecture, the Multimodal Masked Autoencoder (M3AE).
M3AE learns a unified encoder for both vision and language data via masked token prediction.
We provide an empirical study of M3AE trained on a large-scale image-text dataset, and find that M3AE is able to learn generalizable representations that transfer well to downstream tasks.
arXiv Detail & Related papers (2022-05-27T19:09:42Z)
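Masked token prediction over a joint image-text sequence can be sketched as follows. This is a toy, BERT-style illustration with assumed dimensions and masking ratio, not the authors' implementation:

```python
import torch
import torch.nn as nn

# toy unified encoder: image patches and text tokens share one transformer
d = 256
encoder = nn.TransformerEncoder(
    nn.TransformerEncoderLayer(d, nhead=8, batch_first=True), num_layers=2
)
patch_proj = nn.Linear(16 * 16 * 3, d)  # flatten 16x16 RGB patches
text_embed = nn.Embedding(30000, d)
mask_token = nn.Parameter(torch.zeros(1, 1, d))
decoder = nn.Linear(d, 16 * 16 * 3)     # reconstruct masked patch pixels

patches = torch.randn(2, 196, 16 * 16 * 3)  # (batch, patches, pixels)
tokens = torch.randint(0, 30000, (2, 32))   # (batch, text length)

x = torch.cat([patch_proj(patches), text_embed(tokens)], dim=1)
mask = torch.rand(x.shape[:2]) < 0.75       # mask 75% of all positions
x = torch.where(mask.unsqueeze(-1), mask_token.expand_as(x), x)

h = encoder(x)
# regression loss on masked *image* positions only; masked text positions
# would instead feed a cross-entropy prediction head over the vocabulary
img_mask = mask[:, :196]
loss = ((decoder(h[:, :196]) - patches) ** 2)[img_mask].mean()
```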
- Leveraging Uni-Modal Self-Supervised Learning for Multimodal Audio-Visual Speech Recognition [23.239078852797817]
We leverage uni-modal self-supervised learning to promote multimodal audio-visual speech recognition (AVSR).
In particular, we first train audio and visual encoders on a large-scale uni-modal dataset, then we integrate components of both encoders into a larger multimodal framework.
Our model is experimentally validated on both word-level and sentence-level AVSR tasks.
arXiv Detail & Related papers (2022-02-24T15:12:17Z)
- Learning from Multiple Noisy Augmented Data Sets for Better Cross-Lingual Spoken Language Understanding [69.40915115518523]
Lack of training data presents a grand challenge to scaling out spoken language understanding (SLU) to low-resource languages.
Various data augmentation approaches have been proposed to synthesize training data in low-resource target languages.
In this paper we focus on mitigating noise in augmented data.
arXiv Detail & Related papers (2021-09-03T15:44:15Z)
This list is automatically generated from the titles and abstracts of the papers on this site.