RU-AI: A Large Multimodal Dataset for Machine Generated Content Detection
- URL: http://arxiv.org/abs/2406.04906v1
- Date: Fri, 7 Jun 2024 12:58:14 GMT
- Authors: Liting Huang, Zhihao Zhang, Yiran Zhang, Xiyue Zhou, Shoujin Wang, et al.
- Abstract summary: We introduce RU-AI, a new large-scale multimodal dataset for detecting machine-generated content in text, image, and voice.
Our dataset is constructed from three large publicly available datasets: Flickr8K, COCO, and Places205.
Our proposed unified model, which incorporates a multimodal embedding module with a multilayer perceptron network, can effectively determine the origin of the data.
- Score: 11.265512559447986
- License: http://creativecommons.org/licenses/by-nc-sa/4.0/
- Abstract: The recent advancements in generative AI models, which can create realistic and human-like content, are significantly transforming how people communicate, create, and work. While the appropriate use of generative AI models can benefit society, their misuse poses significant threats to data reliability and authentication. However, due to a lack of aligned multimodal datasets, effective and robust methods for detecting machine-generated content are still in the early stages of development. In this paper, we introduce RU-AI, a new large-scale multimodal dataset designed for the robust and efficient detection of machine-generated content in text, image, and voice. Our dataset is constructed from three large publicly available datasets: Flickr8K, COCO, and Places205, by combining the original datasets and their corresponding machine-generated pairs. Additionally, experimental results show that our proposed unified model, which incorporates a multimodal embedding module with a multilayer perceptron network, can effectively determine the origin of the data (i.e., original data samples or machine-generated ones) from RU-AI. However, future work is still required to address the remaining challenges posed by RU-AI. The source code and dataset are available at https://github.com/ZhihaoZhang97/RU-AI.
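The abstract names only the ingredients of the unified model; the sketch below shows one way an embedding-plus-MLP origin classifier could look, assuming pre-computed CLIP-style embeddings. Layer sizes, dropout, and all names are illustrative, not the authors' configuration.

```python
import torch
import torch.nn as nn

class MultimodalOriginClassifier(nn.Module):
    """Toy detector: pre-computed multimodal embeddings feed an MLP
    that predicts whether a sample is original or machine-generated."""

    def __init__(self, embed_dim: int = 512, hidden_dim: int = 256):
        super().__init__()
        self.mlp = nn.Sequential(
            nn.Linear(embed_dim, hidden_dim),
            nn.ReLU(),
            nn.Dropout(0.1),
            nn.Linear(hidden_dim, 2),  # 0 = original, 1 = machine-generated
        )

    def forward(self, embedding: torch.Tensor) -> torch.Tensor:
        return self.mlp(embedding)

# Usage: embed text, image, or voice with any shared encoder first,
# then classify the resulting vector.
model = MultimodalOriginClassifier()
embeddings = torch.randn(4, 512)   # stand-in for a batch of real embeddings
logits = model(embeddings)         # shape: (4, 2)
```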
Related papers
- SK-VQA: Synthetic Knowledge Generation at Scale for Training Context-Augmented Multimodal LLMs [6.879945062426145]
We generate SK-VQA: a large synthetic multimodal dataset containing over 2 million question-answer pairs.
We demonstrate that our synthetic dataset can not only serve as a challenging benchmark, but is also highly effective for adapting existing generative multimodal models for context-augmented generation.
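The blurb does not describe the generation pipeline; as a rough illustration, the sketch below produces synthetic QA pairs from raw contexts using any text-generation backend passed in as `generate` (the prompt wording and all names are invented here).

```python
from typing import Callable

PROMPT_TEMPLATE = (
    "Context: {context}\n"
    "Write one question answerable only from the context, then its answer.\n"
    "Format: Q: <question>\nA: <answer>"
)

def synthesize_qa_pairs(contexts: list[str],
                        generate: Callable[[str], str]) -> list[dict]:
    """Turn raw contexts into (context, question, answer) records."""
    pairs = []
    for context in contexts:
        reply = generate(PROMPT_TEMPLATE.format(context=context))
        if "Q:" in reply and "A:" in reply:  # keep only well-formed replies
            question = reply.split("Q:", 1)[1].split("A:", 1)[0].strip()
            answer = reply.split("A:", 1)[1].strip()
            pairs.append({"context": context,
                          "question": question,
                          "answer": answer})
    return pairs
```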
arXiv Detail & Related papers (2024-06-28T01:14:43Z)
- MASSTAR: A Multi-Modal and Large-Scale Scene Dataset with a Versatile Toolchain for Surface Prediction and Completion [25.44529512862336]
MASSTAR is a multi-modal lArge-scale scene dataset with a verSatile Toolchain for surfAce pRediction and completion.
We develop a versatile and efficient toolchain for processing the raw 3D data from the environments.
We generate an example dataset composed of over a thousand scene-level models with partial real-world data.
arXiv Detail & Related papers (2024-03-18T11:35:18Z)
- MC-DBN: A Deep Belief Network-Based Model for Modality Completion [3.7020486533725605]
We propose MC-DBN, a Modality Completion Deep Belief Network-based model.
The approach uses implicit features of the complete data to compensate for gaps in additional, incomplete data.
This keeps the enhanced multi-modal data closely aligned with the dynamic nature of the real world, improving the model's effectiveness.
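Setting the deep-belief-network machinery aside, the core idea, predicting a missing modality from an observed one so downstream fusion sees complete inputs, can be sketched with a plain encoder; everything below is illustrative rather than the paper's architecture.

```python
import torch
import torch.nn as nn

class ModalityImputer(nn.Module):
    """Illustrative modality completion: map features of an observed
    modality to an estimate of the missing modality's features."""

    def __init__(self, src_dim: int = 128, tgt_dim: int = 128):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(src_dim, 256),
            nn.ReLU(),
            nn.Linear(256, tgt_dim),
        )

    def forward(self, src_features: torch.Tensor) -> torch.Tensor:
        return self.net(src_features)

imputer = ModalityImputer()
audio_feats = torch.randn(8, 128)        # observed modality
video_feats_hat = imputer(audio_feats)   # imputed missing modality
fused = torch.cat([audio_feats, video_feats_hat], dim=-1)  # complete input
```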
arXiv Detail & Related papers (2024-02-15T08:21:50Z)
- IMUGPT 2.0: Language-Based Cross Modality Transfer for Sensor-Based Human Activity Recognition [0.19791587637442667]
Cross-modality transfer approaches convert existing datasets from a source modality, such as video, to a target modality, such as inertial measurement unit (IMU) data.
We introduce two new extensions for IMUGPT that enhance its use for practical HAR application scenarios.
We demonstrate that our diversity metrics can reduce the effort needed for the generation of virtual IMU data by at least 50%.
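The abstract does not define the diversity metrics; one common proxy is pairwise embedding distance, sketched below. The encoder and threshold are assumptions, not IMUGPT's actual metric.

```python
import numpy as np

def diversity_filter(candidates, embed, min_dist=0.3):
    """Keep a candidate activity description only if its embedding is at
    least `min_dist` (cosine distance) from every description kept so far,
    so near-duplicate prompts never become costly virtual IMU data."""
    kept, kept_vecs = [], []
    for text in candidates:
        v = embed(text)                 # any text -> np.ndarray encoder
        v = v / np.linalg.norm(v)
        if all(1.0 - float(v @ u) >= min_dist for u in kept_vecs):
            kept.append(text)
            kept_vecs.append(v)
    return kept
```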
arXiv Detail & Related papers (2024-02-01T22:37:33Z)
- DGInStyle: Domain-Generalizable Semantic Segmentation with Image Diffusion Models and Stylized Semantic Control [68.14798033899955]
Large, pretrained latent diffusion models (LDMs) have demonstrated an extraordinary ability to generate creative content.
However, are they usable as large-scale data generators, e.g., to improve tasks in the perception stack, like semantic segmentation?
We investigate this question in the context of autonomous driving, and answer it with a resounding "yes".
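DGInStyle additionally steers generation with stylized semantic control; the generic "LDM as data generator" loop underneath it can be sketched with the diffusers library (the checkpoint name and prompts are placeholders).

```python
import torch
from diffusers import StableDiffusionPipeline

# Load a pretrained latent diffusion model (checkpoint is a placeholder).
pipe = StableDiffusionPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5", torch_dtype=torch.float16
).to("cuda")

# Each prompt targets a driving condition the downstream segmentation
# model should become robust to; labels must be paired separately.
prompts = ["a city street at night in heavy rain",
           "a rural road in dense fog"]
for i, prompt in enumerate(prompts):
    image = pipe(prompt).images[0]
    image.save(f"synthetic_{i:05d}.png")
```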
arXiv Detail & Related papers (2023-12-05T18:34:12Z)
- Contrastive Transformer Learning with Proximity Data Generation for Text-Based Person Search [60.626459715780605]
Given a descriptive text query, text-based person search aims to retrieve the best-matched target person from an image gallery.
Such a cross-modal retrieval task is quite challenging due to the significant modality gap, fine-grained differences, and a shortage of annotated data.
In this paper, we propose a simple yet effective dual Transformer model for text-based person search.
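The blurb omits training details; dual-encoder retrieval models of this kind are typically trained with a symmetric contrastive (InfoNCE-style) objective, sketched below as a generic loss rather than the paper's exact one.

```python
import torch
import torch.nn.functional as F

def contrastive_loss(img_emb, txt_emb, temperature=0.07):
    """Symmetric InfoNCE: matched image/text pairs (same row index) are
    pulled together; all other pairings in the batch are pushed apart."""
    img = F.normalize(img_emb, dim=-1)
    txt = F.normalize(txt_emb, dim=-1)
    logits = img @ txt.t() / temperature          # (B, B) similarities
    targets = torch.arange(img.size(0), device=img.device)
    return (F.cross_entropy(logits, targets) +
            F.cross_entropy(logits.t(), targets)) / 2
```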
arXiv Detail & Related papers (2023-11-15T16:26:49Z)
- Zero-shot Composed Text-Image Retrieval [72.43790281036584]
We consider the problem of composed image retrieval (CIR), which aims to train a model that fuses multi-modal information, e.g., text and images, to accurately retrieve images that match the query, extending the user's ability to express search intent.
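A common zero-shot baseline fuses the reference-image embedding with the modifier-text embedding and ranks the gallery by cosine similarity; the simple weighted average below is one such heuristic, not necessarily the paper's method.

```python
import numpy as np

def composed_query(img_vec, txt_vec, alpha=0.5):
    """Fuse reference-image and modifier-text embeddings into one query."""
    q = alpha * img_vec + (1 - alpha) * txt_vec
    return q / np.linalg.norm(q)

def retrieve(query, gallery, k=5):
    """Rank an (N, D) gallery of L2-normalized image embeddings."""
    scores = gallery @ query           # cosine similarity per gallery image
    return np.argsort(-scores)[:k]     # indices of the top-k matches
```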
arXiv Detail & Related papers (2023-06-12T17:56:01Z)
- Exploiting the Potential of Datasets: A Data-Centric Approach for Model Robustness [48.70325679650579]
We propose a novel algorithm for dataset enhancement that works well for many existing deep neural networks.
In the data-centric robust learning competition hosted by Alibaba Group and Tsinghua University, our algorithm came third out of more than 3000 competitors.
arXiv Detail & Related papers (2022-03-10T12:16:32Z)
- Unsupervised Domain Adaptive Learning via Synthetic Data for Person Re-identification [101.1886788396803]
Person re-identification (re-ID) has gained increasing attention due to its widespread applications in video surveillance.
Unfortunately, the mainstream deep learning methods still need a large quantity of labeled data to train models.
In this paper, we develop a data collector to automatically generate synthetic re-ID samples in a computer game, and construct a data labeler to simultaneously annotate them.
arXiv Detail & Related papers (2021-09-12T15:51:41Z)
- Contextual-Bandit Anomaly Detection for IoT Data in Distributed Hierarchical Edge Computing [65.78881372074983]
IoT devices can hardly afford complex deep neural network (DNN) models, and offloading anomaly detection tasks to the cloud incurs long delays.
We propose and build a demo for an adaptive anomaly detection approach for distributed hierarchical edge computing (HEC) systems.
We show that our proposed approach significantly reduces detection delay without sacrificing accuracy, as compared to offloading detection tasks to the cloud.
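The blurb implies a split between a cheap on-device detector and a heavier cloud model; one plausible reading is a confidence-gated fallback, sketched below with both models and the threshold as assumptions.

```python
def detect(sample, edge_model, cloud_model, confidence_threshold=0.9):
    """Run the cheap on-device model first; escalate to the larger cloud
    model only when the edge prediction is not confident enough, keeping
    average detection delay low without giving up accuracy."""
    label, confidence = edge_model(sample)   # fast, local inference
    if confidence >= confidence_threshold:
        return label                         # no network round-trip needed
    return cloud_model(sample)               # rare, expensive fallback
```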
arXiv Detail & Related papers (2020-04-15T06:13:33Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of its content (including all information) and is not responsible for any consequences of its use.