RU-AI: A Large Multimodal Dataset for Machine Generated Content Detection
- URL: http://arxiv.org/abs/2406.04906v1
- Date: Fri, 7 Jun 2024 12:58:14 GMT
- Title: RU-AI: A Large Multimodal Dataset for Machine Generated Content Detection
- Authors: Liting Huang, Zhihao Zhang, Yiran Zhang, Xiyue Zhou, Shoujin Wang,
- Abstract summary: We introduce RU-AI, a new large-scale multimodal dataset for detecting machine-generated content in text, image, and voice.
Our dataset is constructed from three large publicly available datasets: Flickr8K, COCO, and Places205.
Our proposed unified model, which incorporates a multimodal embedding module with a multilayer perceptron network, can effectively determine the origin of the data.
- Score: 11.265512559447986
- License: http://creativecommons.org/licenses/by-nc-sa/4.0/
- Abstract: The recent advancements in generative AI models, which can create realistic and human-like content, are significantly transforming how people communicate, create, and work. While the appropriate use of generative AI models can benefit society, their misuse poses significant threats to data reliability and authentication. However, due to a lack of aligned multimodal datasets, effective and robust methods for detecting machine-generated content are still in the early stages of development. In this paper, we introduce RU-AI, a new large-scale multimodal dataset designed for the robust and efficient detection of machine-generated content in text, image, and voice. Our dataset is constructed from three large publicly available datasets: Flickr8K, COCO, and Places205, by pairing the original data samples with their machine-generated counterparts. Additionally, experimental results show that our proposed unified model, which incorporates a multimodal embedding module with a multilayer perceptron network, can effectively determine the origin of the data (i.e., original data samples or machine-generated ones) from RU-AI. However, future work is still required to address the remaining challenges posed by RU-AI. The source code and dataset are available at https://github.com/ZhihaoZhang97/RU-AI.
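As a rough illustration of the unified detector described in the abstract, the sketch below fuses precomputed text, image, and voice embeddings and passes them through a small multilayer perceptron that predicts whether a sample is original or machine-generated. The embedding dimensions, the concatenation-based fusion, and the two-layer MLP head are illustrative assumptions rather than the authors' exact architecture; see the linked repository for the actual implementation.

```python
import torch
import torch.nn as nn

class OriginClassifier(nn.Module):
    """Minimal sketch of a unified detector: fuse per-modality embeddings
    (text / image / voice) and classify original vs. machine-generated.
    Layer sizes and concatenation-based fusion are assumptions, not the
    authors' published design."""

    def __init__(self, text_dim=512, image_dim=512, audio_dim=512, hidden_dim=256):
        super().__init__()
        fused_dim = text_dim + image_dim + audio_dim
        self.mlp = nn.Sequential(
            nn.Linear(fused_dim, hidden_dim),
            nn.ReLU(),
            nn.Linear(hidden_dim, 2),  # 0 = original sample, 1 = machine-generated
        )

    def forward(self, text_emb, image_emb, audio_emb):
        # Concatenate the per-modality embeddings and score the fused vector.
        fused = torch.cat([text_emb, image_emb, audio_emb], dim=-1)
        return self.mlp(fused)

# Toy usage with random tensors standing in for real multimodal embeddings.
model = OriginClassifier()
logits = model(torch.randn(4, 512), torch.randn(4, 512), torch.randn(4, 512))
print(logits.shape)  # torch.Size([4, 2])
```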
Related papers
- Language Supervised Human Action Recognition with Salient Fusion: Construction Worker Action Recognition as a Use Case [8.26451988845854]
We introduce a novel approach to Human Action Recognition (HAR) based on skeleton and visual cues.
We employ learnable prompts for the language model conditioned on the skeleton modality to optimize feature representation.
We introduce a new dataset tailored for real-world robotic applications in construction sites, featuring visual, skeleton, and depth data modalities.
arXiv Detail & Related papers (2024-10-02T19:10:23Z)
- Continual Learning for Multimodal Data Fusion of a Soft Gripper [1.0589208420411014]
A model trained on one data modality often fails when tested with a different modality.
We introduce a continual learning algorithm capable of incrementally learning different data modalities.
We evaluate the algorithm's effectiveness on a challenging custom multimodal dataset.
arXiv Detail & Related papers (2024-09-20T09:53:27Z)
- Improving Interpretability and Robustness for the Detection of AI-Generated Images [6.116075037154215]
We analyze existing state-of-the-art AIGI detection methods based on frozen CLIP embeddings.
We show how to interpret them, shedding light on how images produced by various AI generators differ from real ones.
arXiv Detail & Related papers (2024-06-21T10:33:09Z)
- MASSTAR: A Multi-Modal and Large-Scale Scene Dataset with a Versatile Toolchain for Surface Prediction and Completion [25.44529512862336]
MASSTAR is a multi-modal lArge-scale scene dataset with a verSatile Toolchain for surfAce pRediction and completion.
We develop a versatile and efficient toolchain for processing the raw 3D data from the environments.
We generate an example dataset composed of over a thousand scene-level models with partial real-world data.
arXiv Detail & Related papers (2024-03-18T11:35:18Z)
- Raising the Bar of AI-generated Image Detection with CLIP [50.345365081177555]
The aim of this work is to explore the potential of pre-trained vision-language models (VLMs) for universal detection of AI-generated images.
We develop a lightweight detection strategy based on CLIP features and study its performance in a wide variety of challenging scenarios.
arXiv Detail & Related papers (2023-11-30T21:11:20Z)
- Contrastive Transformer Learning with Proximity Data Generation for Text-Based Person Search [60.626459715780605]
Given a descriptive text query, text-based person search aims to retrieve the best-matched target person from an image gallery.
Such a cross-modal retrieval task is quite challenging due to the significant modality gap, fine-grained differences, and insufficient annotated data.
In this paper, we propose a simple yet effective dual Transformer model for text-based person search.
arXiv Detail & Related papers (2023-11-15T16:26:49Z)
- A Comprehensive Survey of AI-Generated Content (AIGC): A History of Generative AI from GAN to ChatGPT [63.58711128819828]
ChatGPT and other Generative AI (GAI) techniques belong to the category of Artificial Intelligence Generated Content (AIGC).
The goal of AIGC is to make the content creation process more efficient and accessible, allowing for the production of high-quality content at a faster pace.
arXiv Detail & Related papers (2023-03-07T20:36:13Z)
- Exploiting the Potential of Datasets: A Data-Centric Approach for Model Robustness [48.70325679650579]
We propose a novel algorithm for dataset enhancement that works well for many existing deep neural networks.
In the data-centric robust learning competition hosted by Alibaba Group and Tsinghua University, our algorithm came third out of more than 3000 competitors.
arXiv Detail & Related papers (2022-03-10T12:16:32Z)
- Unsupervised Domain Adaptive Learning via Synthetic Data for Person Re-identification [101.1886788396803]
Person re-identification (re-ID) has attracted increasing attention due to its widespread applications in video surveillance.
Unfortunately, mainstream deep learning methods still need a large quantity of labeled data to train models.
In this paper, we develop a data collector to automatically generate synthetic re-ID samples in a computer game, and construct a data labeler to simultaneously annotate them.
arXiv Detail & Related papers (2021-09-12T15:51:41Z)
- REGRAD: A Large-Scale Relational Grasp Dataset for Safe and Object-Specific Robotic Grasping in Clutter [52.117388513480435]
We present a new dataset named REGRAD to support the modeling of relationships among objects and grasps.
Our dataset is collected in both forms of 2D images and 3D point clouds.
Users are free to import their own object models to generate as much data as they want.
arXiv Detail & Related papers (2021-04-29T05:31:21Z)