Creating Synthetic Datasets for Collaborative Filtering Recommender
Systems using Generative Adversarial Networks
- URL: http://arxiv.org/abs/2303.01297v1
- Date: Thu, 2 Mar 2023 14:23:27 GMT
- Title: Creating Synthetic Datasets for Collaborative Filtering Recommender
Systems using Generative Adversarial Networks
- Authors: Jesús Bobadilla, Abraham Gutiérrez, Raciel Yera, and Luis Martínez
- Abstract summary: Research and education in machine learning need diverse, representative, and open datasets to handle the necessary training, validation, and testing tasks.
To feed this research variety, it is necessary and convenient to reinforce the existing datasets with synthetic ones.
This paper proposes a Generative Adversarial Network (GAN)-based method to generate collaborative filtering datasets.
- Score: 1.290382979353427
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Research and education in machine learning need diverse, representative, and
open datasets that contain sufficient samples to handle the necessary training,
validation, and testing tasks. Currently, the Recommender Systems area includes
a large number of subfields in which accuracy and beyond-accuracy quality
measures are continuously improved. To feed this research variety, it is
necessary and convenient to reinforce the existing datasets with synthetic
ones. This paper proposes a Generative Adversarial Network (GAN)-based method
to generate collaborative filtering datasets in a parameterized way, by
selecting their preferred number of users, items, samples, and stochastic
variability. This parameterization cannot be made using regular GANs. Our GAN
model is fed with dense, short, and continuous embedding representations of
items and users, instead of sparse, large, and discrete vectors, to achieve
accurate and quick learning compared to the traditional approach based on
large and sparse input vectors. The proposed architecture includes a DeepMF
model to extract the dense user and item embeddings, as well as a clustering
process to convert from the dense GAN generated samples to the discrete and
sparse ones, necessary to create each required synthetic dataset. Results
on three different source datasets show adequate distributions and the expected
quality values and trends in the generated datasets compared to the source
ones. Synthetic datasets and source code are available to researchers.
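The last stage described in the abstract, converting dense generated samples into discrete and sparse (user, item, rating) records through clustering, can be illustrated with a minimal sketch. Everything concrete here is an assumption made for illustration and is not taken from the paper: the sample layout (user embedding, item embedding, rating), the choice of k-means as the clustering process, the averaging of duplicate (user, item) pairs, and the names kmeans_labels and dense_to_sparse. Random numbers stand in for the output of a trained GAN generator.

```python
import numpy as np


def kmeans_labels(points, k, n_iter=20, seed=0):
    """Plain Lloyd's k-means; returns one cluster label per row of `points`."""
    rng = np.random.default_rng(seed)
    # Start from k distinct rows picked at random (fancy indexing copies them).
    centroids = points[rng.choice(len(points), size=k, replace=False)]
    for _ in range(n_iter):
        # Squared distance from every point to every centroid.
        d2 = ((points[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2)
        labels = d2.argmin(axis=1)
        for c in range(k):
            members = points[labels == c]
            if len(members):  # an empty cluster keeps its previous centroid
                centroids[c] = members.mean(axis=0)
    return labels


def dense_to_sparse(samples, emb_dim, n_users, n_items):
    """Map dense samples [user_emb | item_emb | rating] to discrete records.

    Assumption: each user cluster becomes one synthetic user and each item
    cluster one synthetic item, so n_users and n_items are cluster counts.
    """
    user_ids = kmeans_labels(samples[:, :emb_dim], n_users, seed=0)
    item_ids = kmeans_labels(samples[:, emb_dim:2 * emb_dim], n_items, seed=1)
    ratings = samples[:, -1]

    # Several samples may fall on the same (user, item) pair: average them.
    keys = user_ids * n_items + item_ids
    uniq, inverse = np.unique(keys, return_inverse=True)
    mean_rating = np.bincount(inverse, weights=ratings) / np.bincount(inverse)
    return np.column_stack([uniq // n_items, uniq % n_items, mean_rating])


# Placeholder for GAN output: 500 random dense samples with 4-dim embeddings.
rng = np.random.default_rng(42)
emb_dim = 4
fake_samples = np.hstack([
    rng.normal(size=(500, 2 * emb_dim)),   # user and item embedding parts
    rng.uniform(1.0, 5.0, size=(500, 1)),  # continuous rating
])
dataset = dense_to_sparse(fake_samples, emb_dim, n_users=20, n_items=30)
print(dataset.shape)  # rows are (user_id, item_id, rating)
```

In this sketch the requested numbers of users and items are simply the cluster counts, which is one plausible reading of how the number of users and items could be parameterized; the abstract itself does not spell out the mechanism.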
Related papers
- Generating Realistic Tabular Data with Large Language Models [49.03536886067729]
Large language models (LLM) have been used for diverse tasks, but do not capture the correct correlation between the features and the target variable.
We propose a LLM-based method with three important improvements to correctly capture the ground-truth feature-class correlation in the real data.
Our experiments show that our method significantly outperforms 10 SOTA baselines on 20 datasets in downstream tasks.
arXiv Detail & Related papers (2024-10-29T04:14:32Z)
- Adapt-$\infty$: Scalable Lifelong Multimodal Instruction Tuning via Dynamic Data Selection [89.42023974249122]
Adapt-$\infty$ is a new multi-way and adaptive data selection approach for Lifelong Instruction Tuning.
We construct pseudo-skill clusters by grouping gradient-based sample vectors.
We select the best-performing data selector for each skill cluster from a pool of selector experts.
arXiv Detail & Related papers (2024-10-14T15:48:09Z)
- Quality-Diversity Generative Sampling for Learning with Synthetic Data [18.642540152362237]
Generative models can serve as surrogates for some real data sources by creating synthetic training datasets.
We propose quality-diversity generative sampling (QDGS), a framework for sampling data uniformly across a user-defined measure space.
arXiv Detail & Related papers (2023-12-22T01:43:27Z)
- A Configurable Library for Generating and Manipulating Maze Datasets [0.9268994664916388]
Mazes serve as an excellent testbed due to varied generation algorithms.
We present $\texttt{maze-dataset}$, a comprehensive library for generating, processing, and visualizing datasets consisting of maze-solving tasks.
arXiv Detail & Related papers (2023-09-19T10:20:11Z)
- TSGM: A Flexible Framework for Generative Modeling of Synthetic Time Series [61.436361263605114]
Time series data are often scarce or highly sensitive, which precludes the sharing of data between researchers and industrial organizations.
We introduce Time Series Generative Modeling (TSGM), an open-source framework for the generative modeling of synthetic time series.
arXiv Detail & Related papers (2023-05-19T10:11:21Z)
- Revisiting the Evaluation of Image Synthesis with GANs [55.72247435112475]
This study presents an empirical investigation into the evaluation of synthesis performance, with generative adversarial networks (GANs) as a representative of generative models.
In particular, we make in-depth analyses of various factors, including how to represent a data point in the representation space, how to calculate a fair distance using selected samples, and how many instances to use from each set.
arXiv Detail & Related papers (2023-04-04T17:54:32Z)
- Distributed Traffic Synthesis and Classification in Edge Networks: A Federated Self-supervised Learning Approach [83.2160310392168]
This paper proposes FS-GAN to support automatic traffic analysis and synthesis over a large number of heterogeneous datasets.
FS-GAN is composed of multiple distributed Generative Adversarial Networks (GANs).
FS-GAN can classify data of unknown types of service and create synthetic samples that capture the traffic distribution of the unknown types.
arXiv Detail & Related papers (2023-02-01T03:23:11Z)
- Deep Variational Models for Collaborative Filtering-based Recommender Systems [63.995130144110156]
Deep learning provides accurate collaborative filtering models to improve recommender system results.
Our proposed models apply the variational concept to inject stochasticity in the latent space of the deep architecture.
Results show the superiority of the proposed approach in scenarios where the variational enrichment exceeds the injected noise effect.
arXiv Detail & Related papers (2021-07-27T08:59:39Z)
- Differential-Critic GAN: Generating What You Want by a Cue of Preferences [34.25181656518662]
We propose Differential-Critic Generative Adversarial Network (DiCGAN) to learn the distribution of user-desired data.
DiCGAN generates desired data that meets the user's expectations and can assist in designing biological products with desired properties.
arXiv Detail & Related papers (2021-07-14T13:44:07Z)
- SYNC: A Copula based Framework for Generating Synthetic Data from Aggregated Sources [8.350531869939351]
We study a synthetic data generation task called downscaling.
We propose a multi-stage framework called SYNC (Synthetic Data Generation via Gaussian Copula).
We make four key contributions in this work.
arXiv Detail & Related papers (2020-09-20T16:36:25Z)
- Lessons Learned from the Training of GANs on Artificial Datasets [0.0]
Generative Adversarial Networks (GANs) have made great progress in synthesizing realistic images in recent years.
GANs are prone to underfitting or overfitting, making the analysis of them difficult and constrained.
We train them on artificial datasets where there are infinitely many samples and the real data distributions are simple.
We find that training mixtures of GANs leads to more performance gain compared to increasing the network depth or width.
arXiv Detail & Related papers (2020-07-13T14:51:02Z)