Related papers: Data Generation via Latent Factor Simulation for Fairness-aware Re-ranking

Data Generation via Latent Factor Simulation for Fairness-aware Re-ranking

URL: http://arxiv.org/abs/2409.14078v1
Date: Sat, 21 Sep 2024 09:13:50 GMT
Title: Data Generation via Latent Factor Simulation for Fairness-aware Re-ranking
Authors: Elena Stefancova, Cassidy All, Joshua Paup, Martin Homola, Nicholas Mattei, Robin Burke,
Abstract summary: Synthetic data is a useful resource for algorithmic research. We propose a novel type of data for fairness-aware recommendation: synthetic recommender system outputs.
Score: 11.133319460036082
License: http://creativecommons.org/licenses/by/4.0/
Abstract: Synthetic data is a useful resource for algorithmic research. It allows for the evaluation of systems under a range of conditions that might be difficult to achieve in real world settings. In recommender systems, the use of synthetic data is somewhat limited; some work has concentrated on building user-item interaction data at large scale. We believe that fairness-aware recommendation research can benefit from simulated data as it allows the study of protected groups and their interactions without depending on sensitive data that needs privacy protection. In this paper, we propose a novel type of data for fairness-aware recommendation: synthetic recommender system outputs that can be used to study re-ranking algorithms.

Related papers

What Makes Good Synthetic Training Data for Zero-Shot Stereo Matching? [57.49867420132091]
We report the effects on zero-shot stereo matching performance using standard benchmarks.<n>We validate our findings by collecting the best settings and creating a large-scale dataset.<n>We open-source our system to enable further research on procedural stereo datasets.
arXiv Detail & Related papers (2025-04-23T17:59:33Z)
Can Synthetic Data be Fair and Private? A Comparative Study of Synthetic Data Generation and Fairness Algorithms [2.144088660722956]
We find that the DEbiasing CAusal Fairness (DECAF) algorithm achieves the best balance between privacy and fairness. Applying pre-processing fairness algorithms to synthetic data improves fairness even more than when applied to real data.
arXiv Detail & Related papers (2025-01-03T12:35:58Z)
Generating Diverse Synthetic Datasets for Evaluation of Real-life Recommender Systems [0.0]
Synthetic datasets are important for evaluating and testing machine learning models. We develop a novel framework for generating synthetic datasets that are diverse and statistically coherent. The framework is available as a free open Python package to facilitate research with minimal friction.
arXiv Detail & Related papers (2024-11-27T09:53:14Z)
Mitigating the Privacy Issues in Retrieval-Augmented Generation (RAG) via Pure Synthetic Data [51.41288763521186]
Retrieval-augmented generation (RAG) enhances the outputs of language models by integrating relevant information retrieved from external knowledge sources. RAG systems may face severe privacy risks when retrieving private data. We propose using synthetic data as a privacy-preserving alternative for the retrieval data.
arXiv Detail & Related papers (2024-06-20T22:53:09Z)
Permissioned Blockchain-based Framework for Ranking Synthetic Data Generators [0.5541644538483947]
We introduce a novel approach in which the proposed ranking algorithm is implemented as a smart contract within a permissioned blockchain framework called Sawtooth. Our framework demonstrates its effectiveness in providing nuanced rankings that consider both desirable and undesirable properties.
arXiv Detail & Related papers (2024-05-12T07:46:00Z)
Reimagining Synthetic Tabular Data Generation through Data-Centric AI: A Comprehensive Benchmark [56.8042116967334]
Synthetic data serves as an alternative in training machine learning models. ensuring that synthetic data mirrors the complex nuances of real-world data is a challenging task. This paper explores the potential of integrating data-centric AI techniques to guide the synthetic data generation process.
arXiv Detail & Related papers (2023-10-25T20:32:02Z)
TSGM: A Flexible Framework for Generative Modeling of Synthetic Time Series [61.436361263605114]
Time series data are often scarce or highly sensitive, which precludes the sharing of data between researchers and industrial organizations. We introduce Time Series Generative Modeling (TSGM), an open-source framework for the generative modeling of synthetic time series.
arXiv Detail & Related papers (2023-05-19T10:11:21Z)
Beyond Privacy: Navigating the Opportunities and Challenges of Synthetic Data [91.52783572568214]
Synthetic data may become a dominant force in the machine learning world, promising a future where datasets can be tailored to individual needs. We discuss which fundamental challenges the community needs to overcome for wider relevance and application of synthetic data.
arXiv Detail & Related papers (2023-04-07T16:38:40Z)
Synthetic Data: Methods, Use Cases, and Risks [11.413309528464632]
A possible alternative gaining momentum in both the research community and industry is to share synthetic data instead. We provide a gentle introduction to synthetic data and discuss its use cases, the privacy challenges that are still unaddressed, and its inherent limitations as an effective privacy-enhancing technology.
arXiv Detail & Related papers (2023-03-01T16:35:33Z)
Synthcity: facilitating innovative use cases of synthetic data in different data modalities [86.52703093858631]
Synthcity is an open-source software package for innovative use cases of synthetic data in ML fairness, privacy and augmentation. Synthcity provides the practitioners with a single access point to cutting edge research and tools in synthetic data.
arXiv Detail & Related papers (2023-01-18T14:49:54Z)
Investigating Bias with a Synthetic Data Generator: Empirical Evidence and Philosophical Interpretation [66.64736150040093]
Machine learning applications are becoming increasingly pervasive in our society. Risk is that they will systematically spread the bias embedded in data. We propose to analyze biases by introducing a framework for generating synthetic data with specific types of bias and their combinations.
arXiv Detail & Related papers (2022-09-13T11:18:50Z)
Enabling Synthetic Data adoption in regulated domains [1.9512796489908306]
The switch from a Model-Centric to a Data-Centric mindset is putting emphasis on data and its quality rather than algorithms. In particular, the sensitive nature of the information in highly regulated scenarios needs to be accounted for. A clever way to bypass such a conundrum relies on Synthetic Data: data obtained from a generative process, learning the real data properties.
arXiv Detail & Related papers (2022-04-13T10:53:54Z)
Synthetic Data and Simulators for Recommendation Systems: Current State and Future Directions [0.0]
Synthetic data and simulators have the potential to markedly improve the performance and robustness of recommendation systems. We identify and discuss a key trade-off between data fidelity and privacy in the past work on synthetic data and simulators for recommendation systems.
arXiv Detail & Related papers (2021-12-21T07:29:09Z)
Synthetic Benchmarks for Scientific Research in Explainable Machine Learning [14.172740234933215]
We release XAI-Bench: a suite of synthetic datasets and a library for benchmarking feature attribution algorithms. Unlike real-world datasets, synthetic datasets allow the efficient computation of conditional expected values. We demonstrate the power of our library by benchmarking popular explainability techniques across several evaluation metrics and identifying failure modes for popular explainers.
arXiv Detail & Related papers (2021-06-23T17:10:21Z)

This list is automatically generated from the titles and abstracts of the papers in this site.