Related papers: Testing Deep Learning Recommender Systems Models on Synthetic GAN-Generated Datasets

URL: http://arxiv.org/abs/2410.17651v2
Date: Thu, 24 Oct 2024 20:57:04 GMT
Title: Testing Deep Learning Recommender Systems Models on Synthetic GAN-Generated Datasets
Authors: Jesús Bobadilla, Abraham Gutiérrez
Abstract summary: The published method Generative Adversarial Networks for Recommender Systems (GANRS) allows generating data sets for collaborative filtering recommendation systems. We have tested the GANRS method by creating multiple synthetic datasets from three different real datasets taken as a source. We have also selected six state-of-the-art collaborative filtering deep learning models to test both their comparative performance and the GANRS method.
Score: 0.27624021966289597
License: http://creativecommons.org/licenses/by-nc-nd/4.0/
Abstract: The published method Generative Adversarial Networks for Recommender Systems (GANRS) allows generating data sets for collaborative filtering recommendation systems. The GANRS source code is available along with a representative set of generated datasets. We have tested the GANRS method by creating multiple synthetic datasets from three different real datasets taken as a source. Experiments include variations in the number of users in the synthetic datasets, as well as a different number of samples. We have also selected six state-of-the-art collaborative filtering deep learning models to test both their comparative performance and the GANRS method. The results show a consistent behavior of the generated datasets compared to the source ones; particularly, in the obtained values and trends of the precision and recall quality measures. The tested deep learning models have also performed as expected on all synthetic datasets, making it possible to compare the results with those obtained from the real source data. Future work is proposed, including different cold start scenarios, unbalanced data, and demographic fairness.

Related papers

Procedural Environment Generation for Tool-Use Agents [55.417058694785325]
We introduce RandomWorld, a pipeline for the procedural generation of interactive tools and compositional tool-use data. We show that models tuned via SFT and RL on synthetic RandomWorld data improve on a range of tool-use benchmarks.
arXiv Detail & Related papers (2025-05-21T14:10:06Z)
What Makes Good Synthetic Training Data for Zero-Shot Stereo Matching? [57.49867420132091]
We report the effects on zero-shot stereo matching performance using standard benchmarks. We validate our findings by collecting the best settings and creating a large-scale dataset. We open-source our system to enable further research on procedural stereo datasets.
arXiv Detail & Related papers (2025-04-23T17:59:33Z)
BARE: Leveraging Base Language Models for Few-Shot Synthetic Data Generation [71.46236155101032]
Current data generation methods rely on seed sets containing tens of thousands of examples to prompt instruction-tuned models. We show that when working with only a few seed examples, instruction-tuned models produce insufficient diversity for downstream tasks. We propose Base-Refine, a novel two-stage method that combines the diversity of base models with the quality assurance of instruction-tuned models.
arXiv Detail & Related papers (2025-02-03T00:12:40Z)
Generating Realistic Tabular Data with Large Language Models [49.03536886067729]
Large language models (LLMs) have been used for diverse tasks, but do not capture the correct correlation between the features and the target variable. We propose an LLM-based method with three important improvements to correctly capture the ground-truth feature-class correlation in the real data. Our experiments show that our method significantly outperforms 10 SOTA baselines on 20 datasets in downstream tasks.
arXiv Detail & Related papers (2024-10-29T04:14:32Z)
FuseGen: PLM Fusion for Data-generation based Zero-shot Learning [18.51772808242954]
FuseGen is a novel data generation-based zero-shot learning framework. It introduces a new criterion for subset selection from synthetic datasets. The chosen subset provides in-context feedback to each PLM, enhancing dataset quality.
arXiv Detail & Related papers (2024-06-18T11:55:05Z)
Retrieval-Augmented Data Augmentation for Low-Resource Domain Tasks [66.87070857705994]
In low-resource settings, the amount of seed data samples to use for data augmentation is very small. We propose a novel method that augments training data by incorporating a wealth of examples from other datasets. This approach can ensure that the generated data is not only relevant but also more diverse than what could be achieved using the limited seed data alone.
arXiv Detail & Related papers (2024-02-21T02:45:46Z)
Synthetic data, real errors: how (not) to publish and use synthetic data [86.65594304109567]
We show how the generative process affects the downstream ML task. We introduce Deep Generative Ensemble (DGE) to approximate the posterior distribution over the generative process model parameters.
arXiv Detail & Related papers (2023-05-16T07:30:29Z)
Revisiting the Evaluation of Image Synthesis with GANs [55.72247435112475]
This study presents an empirical investigation into the evaluation of synthesis performance, with generative adversarial networks (GANs) as a representative of generative models. In particular, we make in-depth analyses of various factors, including how to represent a data point in the representation space, how to calculate a fair distance using selected samples, and how many instances to use from each set.
arXiv Detail & Related papers (2023-04-04T17:54:32Z)
Creating Synthetic Datasets for Collaborative Filtering Recommender Systems using Generative Adversarial Networks [1.290382979353427]
Research and education in machine learning need diverse, representative, and open datasets to handle the necessary training, validation, and testing tasks. To feed this research variety, it is necessary and convenient to reinforce the existing datasets with synthetic ones. This paper proposes a Generative Adversarial Network (GAN)-based method to generate collaborative filtering datasets.
arXiv Detail & Related papers (2023-03-02T14:23:27Z)
Distributed Traffic Synthesis and Classification in Edge Networks: A Federated Self-supervised Learning Approach [83.2160310392168]
This paper proposes FS-GAN to support automatic traffic analysis and synthesis over a large number of heterogeneous datasets. FS-GAN is composed of multiple distributed Generative Adversarial Networks (GANs). FS-GAN can classify data of unknown types of service and create synthetic samples that capture the traffic distribution of the unknown types.
arXiv Detail & Related papers (2023-02-01T03:23:11Z)
FairGen: Fair Synthetic Data Generation [0.3149883354098941]
We propose a pipeline to generate fairer synthetic data independent of the GAN architecture. We claim that, while generating synthetic data, most GANs amplify bias present in the training data, but by removing these bias-inducing samples, GANs essentially focus more on real informative samples.
arXiv Detail & Related papers (2022-10-24T08:13:47Z)
Few-Shot Named Entity Recognition: A Comprehensive Study [92.40991050806544]
We investigate three schemes to improve the model generalization ability for few-shot settings. We perform empirical comparisons on 10 public NER datasets with various proportions of labeled data. We create new state-of-the-art results on both few-shot and training-free settings.
arXiv Detail & Related papers (2020-12-29T23:43:16Z)
Lessons Learned from the Training of GANs on Artificial Datasets [0.0]
Generative Adversarial Networks (GANs) have made great progress in synthesizing realistic images in recent years. GANs are prone to underfitting or overfitting, making the analysis of them difficult and constrained. We train them on artificial datasets where there are infinitely many samples and the real data distributions are simple. We find that training mixtures of GANs leads to more performance gain compared to increasing the network depth or width.
arXiv Detail & Related papers (2020-07-13T14:51:02Z)

This list is automatically generated from the titles and abstracts of the papers on this site.