MuG: A Multimodal Classification Benchmark on Game Data with Tabular,
Textual, and Visual Fields
- URL: http://arxiv.org/abs/2302.02978v2
- Date: Tue, 17 Oct 2023 16:03:38 GMT
- Title: MuG: A Multimodal Classification Benchmark on Game Data with Tabular,
Textual, and Visual Fields
- Authors: Jiaying Lu, Yongchen Qian, Shifan Zhao, Yuanzhe Xi, Carl Yang
- Abstract summary: We propose a multimodal classification benchmark MuG with eight datasets that allows researchers to evaluate and improve their models.
We conduct multi-aspect data analysis to provide insights into the benchmark, including label balance ratios, percentages of missing features, distributions of data within each modality, and the correlations between labels and input modalities.
- Score: 26.450463943664822
- License: http://creativecommons.org/licenses/by-nc-sa/4.0/
- Abstract: Previous research has demonstrated the advantages of integrating data from
multiple sources over traditional unimodal data, leading to the emergence of
numerous novel multimodal applications. We propose a multimodal classification
benchmark MuG with eight datasets that allows researchers to evaluate and
improve their models. These datasets are collected from four different genres of
games that cover tabular, textual, and visual modalities. We conduct
multi-aspect data analysis to provide insights into the benchmark, including
label balance ratios, percentages of missing features, distributions of data
within each modality, and the correlations between labels and input modalities.
We further present experimental results obtained by several state-of-the-art
unimodal classifiers and multimodal classifiers, which demonstrate the
challenging and multimodal-dependent properties of the benchmark. MuG is
released at https://github.com/lujiaying/MUG-Bench with the data, tutorials,
and implemented baselines.
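The multi-aspect analysis listed in the abstract (label balance ratios, missing-feature percentages, label-modality correlations) can be sketched in a few lines of pandas. The snippet below is a minimal, hypothetical example: the file name and column names ("train.csv", "label", "genre") are assumptions, not the benchmark's actual schema, which is documented in the MUG-Bench repository.

```python
# Minimal sketch of the abstract's multi-aspect data analysis.
# File and column names ("train.csv", "label", "genre") are hypothetical;
# see https://github.com/lujiaying/MUG-Bench for the real schemas.
import pandas as pd
from scipy.stats import chi2_contingency

df = pd.read_csv("train.csv")

# 1) Label balance ratio: smallest class size over largest class size.
class_counts = df["label"].value_counts()
balance_ratio = class_counts.min() / class_counts.max()

# 2) Percentage of missing features, per column and overall.
missing_per_column = df.isna().mean() * 100
missing_overall = df.isna().values.mean() * 100

# 3) Rough label-modality correlation: Cramer's V between the label and a
#    categorical tabular column (hypothetical column "genre").
def cramers_v(x: pd.Series, y: pd.Series) -> float:
    table = pd.crosstab(x, y)
    chi2 = chi2_contingency(table)[0]
    n = table.values.sum()
    r, k = table.shape
    return (chi2 / (n * (min(r, k) - 1))) ** 0.5

print(f"balance ratio: {balance_ratio:.3f}, overall missing: {missing_overall:.1f}%")
print(f"label vs. genre Cramer's V: {cramers_v(df['label'], df['genre']):.3f}")
```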
Related papers
- When Graph meets Multimodal: Benchmarking on Multimodal Attributed Graphs Learning [36.6581535146878]
Multimodal attributed graphs (MAGs) are prevalent in various real-world scenarios and generally contain two kinds of knowledge.
Recent advancements in Pre-trained Language/Vision models (PLMs/PVMs) and Graph neural networks (GNNs) have facilitated effective learning on MAGs.
We propose Multimodal Attribute Graph Benchmark (MAGB), a comprehensive and diverse collection of challenging benchmark datasets for MAGs.
arXiv Detail & Related papers (2024-10-11T13:24:57Z)
- BabelBench: An Omni Benchmark for Code-Driven Analysis of Multimodal and Multistructured Data [61.936320820180875]
Large language models (LLMs) have become increasingly pivotal across various domains.
BabelBench is an innovative benchmark framework that evaluates the proficiency of LLMs in managing multimodal multistructured data with code execution.
Our experimental findings on BabelBench indicate that even cutting-edge models like ChatGPT 4 exhibit substantial room for improvement.
arXiv Detail & Related papers (2024-10-01T15:11:24Z)
- MMSci: A Dataset for Graduate-Level Multi-Discipline Multimodal Scientific Understanding [59.41495657570397]
This dataset includes figures such as schematic diagrams, simulated images, macroscopic/microscopic photos, and experimental visualizations.
We developed benchmarks for scientific figure captioning and multiple-choice questions, evaluating six proprietary and over ten open-source models.
The dataset and benchmarks will be released to support further research.
arXiv Detail & Related papers (2024-07-06T00:40:53Z)
- NativE: Multi-modal Knowledge Graph Completion in the Wild [51.80447197290866]
We propose a comprehensive framework NativE to achieve MMKGC in the wild.
NativE proposes a relation-guided dual adaptive fusion module that enables adaptive fusion for any modalities.
We construct a new benchmark called WildKGC with five datasets to evaluate our method.
arXiv Detail & Related papers (2024-03-28T03:04:00Z)
- Multimodal Learning Without Labeled Multimodal Data: Guarantees and Applications [90.6849884683226]
We study the challenge of interaction quantification in a semi-supervised setting with only labeled unimodal data.
Using a precise information-theoretic definition of interactions, our key contribution is the derivation of lower and upper bounds.
We show how these theoretical results can be used to estimate multimodal model performance, guide data collection, and select appropriate multimodal models for various tasks.
arXiv Detail & Related papers (2023-06-07T15:44:53Z)
- Unimodal Training-Multimodal Prediction: Cross-modal Federated Learning with Hierarchical Aggregation [16.308470947384134]
HA-Fedformer is a novel transformer-based model that empowers unimodal training with only a unimodal dataset at the client.
We develop an uncertainty-aware aggregation method for the local encoders with layer-wise Markov Chain Monte Carlo sampling.
Our experiments on popular sentiment analysis benchmarks, CMU-MOSI and CMU-MOSEI, demonstrate that HA-Fedformer significantly outperforms state-of-the-art multimodal models.
arXiv Detail & Related papers (2023-03-27T07:07:33Z)
- Align and Attend: Multimodal Summarization with Dual Contrastive Losses [57.83012574678091]
The goal of multimodal summarization is to extract the most important information from different modalities to form output summaries.
Existing methods fail to leverage the temporal correspondence between different modalities and ignore the intrinsic correlation between different samples.
We introduce Align and Attend Multimodal Summarization (A2Summ), a unified multimodal transformer-based model which can effectively align and attend the multimodal input.
arXiv Detail & Related papers (2023-03-13T17:01:42Z)
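The A2Summ entry above is built around contrastive alignment of paired modalities. As a rough illustration only, the snippet below shows a generic InfoNCE-style cross-modal alignment loss in PyTorch; it is not the paper's dual contrastive losses, and the batch size and embedding dimension are arbitrary assumptions.

```python
# Generic InfoNCE-style alignment loss between paired modality embeddings.
# Illustrative only: NOT the dual contrastive losses proposed in A2Summ.
import torch
import torch.nn.functional as F

def cross_modal_infonce(video_emb: torch.Tensor,
                        text_emb: torch.Tensor,
                        temperature: float = 0.07) -> torch.Tensor:
    """video_emb, text_emb: (batch, dim) embeddings of paired samples."""
    v = F.normalize(video_emb, dim=-1)
    t = F.normalize(text_emb, dim=-1)
    logits = v @ t.T / temperature                 # (batch, batch) similarities
    targets = torch.arange(v.size(0), device=v.device)
    # Symmetric loss: video-to-text and text-to-video directions.
    return 0.5 * (F.cross_entropy(logits, targets) +
                  F.cross_entropy(logits.T, targets))

# Usage with random embeddings (shapes are arbitrary assumptions):
loss = cross_modal_infonce(torch.randn(8, 256), torch.randn(8, 256))
```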
- Benchmarking Multimodal Variational Autoencoders: CdSprites+ Dataset and Toolkit [6.187270874122921]
We propose a toolkit for systematic multimodal VAE training and comparison.
We present CdSprites+, a disentangled bimodal dataset designed to comprehensively evaluate the joint generation and cross-generation capabilities of multimodal VAEs.
arXiv Detail & Related papers (2022-09-07T10:26:28Z)
- Perceptual Score: What Data Modalities Does Your Model Perceive? [73.75255606437808]
We introduce the perceptual score, a metric that assesses the degree to which a model relies on the different subsets of the input features.
We find that recent, more accurate multi-modal models for visual question-answering tend to perceive the visual data less than their predecessors.
Using the perceptual score also helps to analyze model biases by decomposing the score into data subset contributions.
arXiv Detail & Related papers (2021-10-27T12:19:56Z)
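The perceptual score entry above measures how strongly a model relies on particular input modalities. A simplified way to approximate that idea is a permutation test: compare accuracy on intact inputs with accuracy when one modality is shuffled across the batch. The sketch below is only that simplified baseline, not the paper's exact normalized score; the predictor interface and feature arrays are hypothetical.

```python
# Simplified permutation-based estimate of reliance on one modality: accuracy
# drop when that modality is shuffled across examples (breaking its pairing
# with the labels). Illustrates the idea behind the perceptual score; it is
# not the paper's exact formulation.
import numpy as np

def modality_reliance(predict_fn, image_feats, text_feats, labels,
                      n_repeats=5, seed=0):
    """predict_fn(image_feats, text_feats) -> predicted class indices."""
    rng = np.random.default_rng(seed)
    base_acc = (predict_fn(image_feats, text_feats) == labels).mean()
    permuted_accs = []
    for _ in range(n_repeats):
        perm = rng.permutation(len(labels))
        # Shuffle only the image modality; the text modality stays aligned.
        acc = (predict_fn(image_feats[perm], text_feats) == labels).mean()
        permuted_accs.append(acc)
    return base_acc - float(np.mean(permuted_accs))  # larger drop => stronger reliance
```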
- The Multimodal Sentiment Analysis in Car Reviews (MuSe-CaR) Dataset: Collection, Insights and Improvements [14.707930573950787]
We present MuSe-CaR, a first-of-its-kind multimodal dataset.
The data is publicly available as it recently served as the testing bed for the 1st Multimodal Sentiment Analysis Challenge.
arXiv Detail & Related papers (2021-01-15T10:40:37Z)
- MELINDA: A Multimodal Dataset for Biomedical Experiment Method Classification [14.820951153262685]
We introduce a new dataset, MELINDA, for Multimodal biomEdicaL experImeNt methoD clAssification.
The dataset is collected in a fully automated distant supervision manner, where the labels are obtained from an existing curated database.
We benchmark various state-of-the-art NLP and computer vision models, including unimodal models which only take either caption texts or images as inputs.
arXiv Detail & Related papers (2020-12-16T19:11:36Z)
This list is automatically generated from the titles and abstracts of the papers on this site.