Training on Synthetic Data Beats Real Data in Multimodal Relation
Extraction
- URL: http://arxiv.org/abs/2312.03025v1
- Date: Tue, 5 Dec 2023 08:11:34 GMT
- Title: Training on Synthetic Data Beats Real Data in Multimodal Relation
Extraction
- Authors: Zilin Du, Haoxin Li, Xu Guo, Boyang Li
- Abstract summary: In this paper, we consider a novel problem setting, where only unimodal data, either text or image, are available during training.
We aim to train a multimodal classifier from synthetic data that performs well on real multimodal test data.
Our best model trained on completely synthetic images outperforms prior state-of-the-art models trained on real multimodal data by a margin of 3.76% in F1.
- Score: 8.038421100401132
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: The task of multimodal relation extraction has attracted significant research
attention, but progress is constrained by the scarcity of available training
data. One natural thought is to extend existing datasets with cross-modal
generative models. In this paper, we consider a novel problem setting, where
only unimodal data, either text or image, are available during training. We aim
to train a multimodal classifier from synthetic data that performs well on real
multimodal test data. However, training with synthetic data suffers from two
obstacles: lack of data diversity and label information loss. To alleviate the
issues, we propose Mutual Information-aware Multimodal Iterated Relational dAta
GEneration (MI2RAGE), which applies Chained Cross-modal Generation (CCG) to
promote diversity in the generated data and exploits a teacher network to
select valuable training samples with high mutual information with the
ground-truth labels. Comparing our method to direct training on synthetic data,
we observed a significant improvement of 24.06% F1 with synthetic text and
26.42% F1 with synthetic images. Notably, our best model trained on completely
synthetic images outperforms prior state-of-the-art models trained on real
multimodal data by a margin of 3.76% in F1. Our codebase will be made available
upon acceptance.
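The teacher-based selection step described in the abstract can be sketched as follows. The abstract does not spell out the exact scoring rule, so this minimal sketch assumes a pointwise-mutual-information proxy, log p(y|x) - log p(y), computed from a teacher network's predicted label distribution; the function name `select_by_pmi`, its arguments, and the top-k retention rule are all illustrative assumptions rather than the paper's actual implementation.

```python
import math

def select_by_pmi(teacher_probs, labels, label_prior, keep_ratio=0.5):
    """Rank synthetic samples by a pointwise-mutual-information proxy,
    PMI(x, y) = log p(y|x) - log p(y), where p(y|x) is a teacher
    network's predicted probability of the ground-truth label y and
    p(y) is the label prior. Keep the top `keep_ratio` fraction
    (hypothetical selection rule). Returns kept sample indices."""
    scores = [
        math.log(probs[y]) - math.log(label_prior[y])
        for probs, y in zip(teacher_probs, labels)
    ]
    k = max(1, int(len(scores) * keep_ratio))
    # Indices of the k highest-scoring samples, in original order.
    ranked = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    return sorted(ranked[:k])

# Toy example: 4 synthetic samples, 2 relation labels, uniform prior.
teacher_probs = [[0.9, 0.1], [0.2, 0.8], [0.55, 0.45], [0.5, 0.5]]
labels = [0, 1, 0, 1]
label_prior = [0.5, 0.5]
kept = select_by_pmi(teacher_probs, labels, label_prior, keep_ratio=0.5)
```

In this toy run the two samples on which the teacher is most confident about the ground-truth label (indices 0 and 1) are retained, while the near-chance samples are discarded; intuitively, samples whose labels the teacher cannot recover carry little information about the ground truth and would mostly add label noise.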
Related papers
- Massively Annotated Datasets for Assessment of Synthetic and Real Data in Face Recognition [0.2775636978045794]
We study the drift between the performance of models trained on real and synthetic datasets.
We conduct studies on the differences between real and synthetic datasets on the attribute set.
Interestingly, we verified that while real samples suffice to explain the synthetic distribution, the reverse is far from true.
arXiv Detail & Related papers (2024-04-23T17:10:49Z) - UnitedHuman: Harnessing Multi-Source Data for High-Resolution Human
Generation [59.77275587857252]
A holistic human dataset inevitably has insufficient and low-resolution information on local parts.
We propose to use multi-source datasets with various resolution images to jointly learn a high-resolution human generative model.
arXiv Detail & Related papers (2023-09-25T17:58:46Z) - Image Captions are Natural Prompts for Text-to-Image Models [70.30915140413383]
We analyze the relationship between the training effect of synthetic data and the synthetic data distribution induced by prompts.
We propose a simple yet effective method that prompts text-to-image generative models to synthesize more informative and diverse training data.
Our method significantly improves the performance of models trained on synthetic training data.
arXiv Detail & Related papers (2023-07-17T14:38:11Z) - Training Multimedia Event Extraction With Generated Images and Captions [6.291564630983316]
We propose Cross-modality Augmented Multimedia Event Learning (CAMEL)
We start with two labeled unimodal datasets in text and image respectively, and generate the missing modality using off-the-shelf image generators like Stable Diffusion and image captioners like BLIP.
In order to learn robust features that are effective across domains, we devise an iterative and gradual training strategy.
arXiv Detail & Related papers (2023-06-15T09:01:33Z) - Unimodal Training-Multimodal Prediction: Cross-modal Federated Learning
with Hierarchical Aggregation [16.308470947384134]
HA-Fedformer is a novel transformer-based model that empowers unimodal training with only a unimodal dataset at the client.
We develop an uncertainty-aware aggregation method for the local encoders with layer-wise Markov Chain Monte Carlo sampling.
Our experiments on popular sentiment analysis benchmarks, CMU-MOSI and CMU-MOSEI, demonstrate that HA-Fedformer significantly outperforms state-of-the-art multimodal models.
arXiv Detail & Related papers (2023-03-27T07:07:33Z) - Multi-scale Transformer Network with Edge-aware Pre-training for
Cross-Modality MR Image Synthesis [52.41439725865149]
Cross-modality magnetic resonance (MR) image synthesis can be used to generate missing modalities from given ones.
Existing (supervised learning) methods often require a large number of paired multi-modal data to train an effective synthesis model.
We propose a Multi-scale Transformer Network (MT-Net) with edge-aware pre-training for cross-modality MR image synthesis.
arXiv Detail & Related papers (2022-12-02T11:40:40Z) - FairGen: Fair Synthetic Data Generation [0.3149883354098941]
We propose a pipeline to generate fairer synthetic data independent of the GAN architecture.
We claim that most GANs amplify bias present in the training data when generating synthetic data, but that by removing these bias-inducing samples, GANs can focus more on informative real samples.
arXiv Detail & Related papers (2022-10-24T08:13:47Z) - FedDM: Iterative Distribution Matching for Communication-Efficient
Federated Learning [87.08902493524556]
Federated learning (FL) has recently attracted increasing attention from academia and industry.
We propose FedDM to build the global training objective from multiple local surrogate functions.
In detail, we construct synthetic sets of data on each client to locally match the loss landscape from original data.
arXiv Detail & Related papers (2022-07-20T04:55:18Z) - Unsupervised Domain Adaptive Learning via Synthetic Data for Person
Re-identification [101.1886788396803]
Person re-identification (re-ID) has gained more and more attention due to its widespread applications in video surveillance.
Unfortunately, the mainstream deep learning methods still need a large quantity of labeled data to train models.
In this paper, we develop a data collector to automatically generate synthetic re-ID samples in a computer game, and construct a data labeler to simultaneously annotate them.
arXiv Detail & Related papers (2021-09-12T15:51:41Z) - Bi-Bimodal Modality Fusion for Correlation-Controlled Multimodal
Sentiment Analysis [96.46952672172021]
Bi-Bimodal Fusion Network (BBFN) is a novel end-to-end network that performs fusion on pairwise modality representations.
The model takes two bimodal pairs as input due to the known information imbalance among modalities.
arXiv Detail & Related papers (2021-07-28T23:33:42Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences of its use.