Synthetic Data in Human Analysis: A Survey
- URL: http://arxiv.org/abs/2208.09191v1
- Date: Fri, 19 Aug 2022 07:32:34 GMT
- Title: Synthetic Data in Human Analysis: A Survey
- Authors: Indu Joshi, Marcel Grimmer, Christian Rathgeb, Christoph Busch, Francois Bremond, Antitza Dantcheva
- Abstract summary: This survey is intended for researchers and practitioners in the field of human analysis.
We conduct a survey that summarises current state-of-the-art methods and the main benefits of using synthetic data.
We also provide an overview of publicly available synthetic datasets and generation models.
- Score: 16.562921709882865
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Deep neural networks have become prevalent in human analysis, boosting the
performance of applications such as biometric recognition, action recognition
and person re-identification. However, the performance of such networks
scales with the available training data. In human analysis, the demand for
large-scale datasets poses a severe challenge, as data collection is tedious,
time-consuming and costly, and must comply with data protection laws. Current
research investigates the generation of synthetic data as an efficient
and privacy-preserving alternative to collecting real data in the field. This
survey introduces the basic definitions and methodologies essential for
generating and employing synthetic data for human analysis. We conduct a survey
that summarises current state-of-the-art methods and the main benefits of using
synthetic data. We also provide an overview of publicly available synthetic
datasets and generation models. Finally, we discuss limitations, as well as
open research problems in this field. This survey is intended for researchers
and practitioners in the field of human analysis.
Related papers
- Exploring the Impact of Synthetic Data for Aerial-view Human Detection [17.41001388151408]
Aerial-view human detection requires large-scale data to capture diverse human appearances.
Synthetic data can be a good resource for expanding datasets, but the domain gap with real-world data is the biggest obstacle to its use in training.
Date: 2024-05-24T04:19:48Z
- Best Practices and Lessons Learned on Synthetic Data [83.63271573197026]
The success of AI models relies on the availability of large, diverse, and high-quality datasets.
Synthetic data has emerged as a promising solution by generating artificial data that mimics real-world patterns.
Date: 2024-04-11T06:34:17Z
- Data Augmentation in Human-Centric Vision [54.97327269866757]
This survey presents a comprehensive analysis of data augmentation techniques in human-centric vision tasks.
It delves into a wide range of research areas including person ReID, human parsing, human pose estimation, and pedestrian detection.
Our work categorizes data augmentation methods into two main types: data generation and data perturbation.
Date: 2024-03-13T16:05:18Z
- The Real Deal Behind the Artificial Appeal: Inferential Utility of Tabular Synthetic Data [40.165159490379146]
We show that the rate of false-positive findings (type 1 error) will be unacceptably high, even when the estimates are unbiased.
Despite the use of a previously proposed correction factor, this problem persists for deep generative models.
Date: 2023-12-13T02:04:41Z
- Reimagining Synthetic Tabular Data Generation through Data-Centric AI: A Comprehensive Benchmark [56.8042116967334]
Synthetic data serves as an alternative to real data for training machine learning models.
Ensuring that synthetic data mirrors the complex nuances of real-world data is a challenging task.
This paper explores the potential of integrating data-centric AI techniques to guide the synthetic data generation process.
Date: 2023-10-25T20:32:02Z
- Exploring the Potential of AI-Generated Synthetic Datasets: A Case Study on Telematics Data with ChatGPT [0.0]
This research delves into the construction and utilization of synthetic datasets, specifically within the telematics sphere, leveraging OpenAI's powerful language model, ChatGPT.
To illustrate this data creation process, a hands-on case study is conducted, focusing on the generation of a synthetic telematics dataset.
Date: 2023-06-23T15:15:13Z
- Synthetic data generation for a longitudinal cohort study -- Evaluation, method extension and reproduction of published data analysis results [0.32593385688760446]
In the health sector, access to individual-level data is often challenging due to privacy concerns.
A promising alternative is the generation of fully synthetic data.
In this study, we use a state-of-the-art synthetic data generation method.
Date: 2023-05-12T13:13:55Z
- Beyond Privacy: Navigating the Opportunities and Challenges of Synthetic Data [91.52783572568214]
Synthetic data may become a dominant force in the machine learning world, promising a future where datasets can be tailored to individual needs.
We discuss which fundamental challenges the community needs to overcome for wider relevance and application of synthetic data.
Date: 2023-04-07T16:38:40Z
- Research Trends and Applications of Data Augmentation Algorithms [77.34726150561087]
We identify the main areas of application of data augmentation algorithms, the types of algorithms used, significant research trends, their progression over time and research gaps in data augmentation literature.
We expect readers to understand the potential of data augmentation, as well as identify future research directions and open questions within data augmentation research.
Date: 2022-07-18T11:38:32Z
- Delving into High-Quality Synthetic Face Occlusion Segmentation Datasets [83.749895930242]
We propose two techniques for producing high-quality naturalistic synthetic occluded faces.
We empirically show the effectiveness and robustness of both methods, even for unseen occlusions.
We present two high-resolution real-world occluded face datasets with fine-grained annotations, RealOcc and RealOcc-Wild.
Date: 2022-05-12T17:03:57Z
- Measuring Utility and Privacy of Synthetic Genomic Data [3.635321290763711]
We provide the first evaluation of the utility and the privacy protection of five state-of-the-art models for generating synthetic genomic data.
Overall, there is no single approach for generating synthetic genomic data that performs well across the board.
Date: 2021-02-05T17:41:01Z
This list is automatically generated from the titles and abstracts of the papers on this site.