Conditional Synthetic Data Generation for Personal Thermal Comfort Models
- URL: http://arxiv.org/abs/2203.05242v1
- Date: Thu, 10 Mar 2022 08:57:25 GMT
- Title: Conditional Synthetic Data Generation for Personal Thermal Comfort Models
- Authors: Hari Prasanna Das and Costas J. Spanos
- Abstract summary: Personal thermal comfort models aim to predict an individual's thermal comfort response, instead of the average response of a large group.
Recently, machine learning algorithms have shown enormous potential as candidates for personal thermal comfort models.
However, under the normal operating conditions of a building, personal thermal comfort data obtained via experiments are often heavily class-imbalanced.
We propose to use a state-of-the-art conditional synthetic data generator to generate synthetic data corresponding to the low-frequency classes.
- Score: 7.505485586268498
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Personal thermal comfort models aim to predict an individual's thermal
comfort response, instead of the average response of a large group. Recently,
machine learning algorithms have shown enormous potential as candidates for
personal thermal comfort models. However, under the normal operating conditions
of a building, personal thermal comfort data obtained via experiments are often
heavily class-imbalanced: there are disproportionately many data samples for
the "Prefer No Change" class, as compared with the "Prefer Warmer" and "Prefer
Cooler" classes. Machine learning algorithms trained on such class-imbalanced
data perform sub-optimally when deployed in the real world. To develop robust
machine learning-based applications using such class-imbalanced data, as well
as for privacy-preserving data sharing, we propose to use a state-of-the-art
conditional synthetic data generator to generate synthetic data corresponding
to the low-frequency classes. Via experiments, we show that the generated
synthetic data has a distribution that mimics the real data distribution. The
proposed method can be extended for use by other smart building
datasets/use-cases.
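The oversampling strategy the abstract describes — fit a conditional generative model to the labeled comfort data, then sample extra data for the low-frequency classes until the dataset is balanced — can be sketched in miniature. The sketch below is an illustration only: a per-class Gaussian stands in for the paper's deep conditional generator, and all function names are hypothetical.

```python
import random
import statistics

def fit_conditional_generator(X, y):
    # Toy stand-in for a conditional deep generative model:
    # fit an independent Gaussian per feature, per class.
    params = {}
    for label in set(y):
        rows = [x for x, lab in zip(X, y) if lab == label]
        dims = list(zip(*rows))
        params[label] = [
            (statistics.mean(d), statistics.stdev(d) if len(d) > 1 else 0.0)
            for d in dims
        ]
    return params

def sample_class(params, label, n, rng=random):
    # Generate n synthetic feature vectors conditioned on `label`.
    return [[rng.gauss(mu, sigma) for mu, sigma in params[label]]
            for _ in range(n)]

def balance_with_synthetic(X, y):
    # Augment every low-frequency class up to the majority-class count.
    params = fit_conditional_generator(X, y)
    counts = {lab: y.count(lab) for lab in set(y)}
    target = max(counts.values())
    X_aug, y_aug = list(X), list(y)
    for lab, c in counts.items():
        X_aug += sample_class(params, lab, target - c)
        y_aug += [lab] * (target - c)
    return X_aug, y_aug
```

For example, a toy dataset dominated by "Prefer No Change" samples would have its "Prefer Warmer" and "Prefer Cooler" classes topped up with synthetic samples drawn from the corresponding class-conditional distributions, before a downstream classifier is trained on the combined real-plus-synthetic data.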
Related papers
- Enhancing Indoor Temperature Forecasting through Synthetic Data in Low-Data Environments [42.8983261737774]
We investigate the efficacy of data augmentation techniques leveraging SoTA AI-based methods for synthetic data generation.
Inspired by practical and experimental motivations, we explore fusion strategies of real and synthetic data to improve forecasting models.
arXiv Detail & Related papers (2024-06-07T12:36:31Z)
- Self-Correcting Self-Consuming Loops for Generative Model Training [16.59453827606427]
Machine learning models are increasingly trained on a mix of human- and machine-generated data.
Despite success stories of using synthetic data for representation learning, using synthetic data for generative model training creates "self-consuming loops".
Our paper aims to stabilize self-consuming generative model training by introducing an idealized correction function.
arXiv Detail & Related papers (2024-02-11T02:34:42Z)
- Trading Off Scalability, Privacy, and Performance in Data Synthesis [11.698554876505446]
We introduce (a) the Howso engine, and (b) our proposed random projection based synthetic data generation framework.
We show that the synthetic data generated by the Howso engine has good privacy and accuracy, which results in the best overall score.
Our proposed random projection based framework can generate synthetic data with the highest accuracy score and scales the fastest.
arXiv Detail & Related papers (2023-12-09T02:04:25Z)
- Reimagining Synthetic Tabular Data Generation through Data-Centric AI: A Comprehensive Benchmark [56.8042116967334]
Synthetic data serves as an alternative for training machine learning models.
Ensuring that synthetic data mirrors the complex nuances of real-world data is a challenging task.
This paper explores the potential of integrating data-centric AI techniques to guide the synthetic data generation process.
arXiv Detail & Related papers (2023-10-25T20:32:02Z)
- Synthetic data, real errors: how (not) to publish and use synthetic data [86.65594304109567]
We show how the generative process affects the downstream ML task.
We introduce Deep Generative Ensemble (DGE) to approximate the posterior distribution over the generative process model parameters.
arXiv Detail & Related papers (2023-05-16T07:30:29Z)
- Federated Privacy-preserving Collaborative Filtering for On-Device Next App Prediction [52.16923290335873]
We propose a novel SeqMF model to solve the problem of predicting the next app launch during mobile device usage.
We modify the structure of the classical matrix factorization model and update the training procedure to sequential learning.
Another ingredient of the proposed approach is a new privacy mechanism that protects the data sent from the users to the remote server.
arXiv Detail & Related papers (2023-02-05T10:29:57Z)
- Learning from aggregated data with a maximum entropy model [73.63512438583375]
We show how a new model, similar to a logistic regression, may be learned from aggregated data only by approximating the unobserved feature distribution with a maximum entropy hypothesis.
We present empirical evidence on several public datasets that the model learned this way can achieve performances comparable to those of a logistic model trained with the full unaggregated data.
arXiv Detail & Related papers (2022-10-05T09:17:27Z)
- Investigating Bias with a Synthetic Data Generator: Empirical Evidence and Philosophical Interpretation [66.64736150040093]
Machine learning applications are becoming increasingly pervasive in our society.
The risk is that they will systematically spread the bias embedded in the data.
We propose to analyze biases by introducing a framework for generating synthetic data with specific types of bias and their combinations.
arXiv Detail & Related papers (2022-09-13T11:18:50Z)
- A Kernelised Stein Statistic for Assessing Implicit Generative Models [10.616967871198689]
We propose a principled procedure to assess the quality of a synthetic data generator.
The sample size from the synthetic data generator can be as large as desired, while the size of the observed data, which the generator aims to emulate, is fixed.
arXiv Detail & Related papers (2022-05-31T23:40:21Z)
- An Analysis of the Deployment of Models Trained on Private Tabular Synthetic Data: Unexpected Surprises [4.129847064263057]
Differentially private (DP) synthetic datasets are a powerful approach for training machine learning models.
We study the effects of differentially private synthetic data generation on classification.
arXiv Detail & Related papers (2021-06-15T21:00:57Z)
- Partially Conditioned Generative Adversarial Networks [75.08725392017698]
Generative Adversarial Networks (GANs) let one synthesise artificial datasets by implicitly modelling the underlying probability distribution of a real-world training dataset.
With the introduction of Conditional GANs and their variants, these methods were extended to generating samples conditioned on ancillary information available for each sample within the dataset.
In this work, we argue that standard Conditional GANs are not suitable for such a task and propose a new Adversarial Network architecture and training strategy.
arXiv Detail & Related papers (2020-07-06T15:59:28Z)
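The conditioning mechanism that Conditional GANs (the last entry above) build on is simple at its core: the generator receives its noise vector concatenated with an encoding of the requested class, so that sampling can be steered toward a chosen label. A minimal, hypothetical sketch of that input construction (not any paper's actual architecture):

```python
import random

def one_hot(label, num_classes):
    # Encode a class index as a one-hot vector.
    v = [0.0] * num_classes
    v[label] = 1.0
    return v

def conditional_generator_input(label, num_classes, noise_dim, rng=random):
    # A Conditional GAN generator maps (noise, class) -> sample;
    # the standard trick is to concatenate the two at the input layer.
    noise = [rng.gauss(0.0, 1.0) for _ in range(noise_dim)]
    return noise + one_hot(label, num_classes)
```

The discriminator is conditioned the same way, receiving the (real or generated) sample concatenated with the same class encoding, so both networks learn class-conditional distributions rather than the marginal.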
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed content (including all information) and is not responsible for any consequences.