Related papers: A Survey on Generative Modeling with Limited Data, Few Shots, and Zero Shot

A Survey on Generative Modeling with Limited Data, Few Shots, and Zero Shot

URL: http://arxiv.org/abs/2307.14397v1
Date: Wed, 26 Jul 2023 12:05:08 GMT
Title: A Survey on Generative Modeling with Limited Data, Few Shots, and Zero Shot
Authors: Milad Abdollahzadeh, Touba Malekzadeh, Christopher T. H. Teo, Keshigeyan Chandrasegaran, Guimeng Liu, Ngai-Man Cheung
Abstract summary: In machine learning, generative modeling aims to generate new data statistically similar to the training data distribution. This is an important topic when data acquisition is challenging, e.g. healthcare applications. We study interactions between different GM-DC tasks and approaches.
Score: 33.564516823250806
License: http://creativecommons.org/licenses/by/4.0/
Abstract: In machine learning, generative modeling aims to learn to generate new data statistically similar to the training data distribution. In this paper, we survey learning generative models under limited data, few shots and zero shot, referred to as Generative Modeling under Data Constraint (GM-DC). This is an important topic when data acquisition is challenging, e.g. healthcare applications. We discuss background, challenges, and propose two taxonomies: one on GM-DC tasks and another on GM-DC approaches. Importantly, we study interactions between different GM-DC tasks and approaches. Furthermore, we highlight research gaps, research trends, and potential avenues for future exploration. Project website: https://gmdc-survey.github.io.

Related papers

Anomaly Detection and Generation with Diffusion Models: A Survey [51.61574868316922]
Anomaly detection (AD) plays a pivotal role across diverse domains, including cybersecurity, finance, healthcare, and industrial manufacturing.<n>Recent advancements in deep learning, specifically diffusion models (DMs), have sparked significant interest.<n>This survey aims to guide researchers and practitioners in leveraging DMs for innovative AD solutions across diverse applications.
arXiv Detail & Related papers (2025-06-11T03:29:18Z)
Intention-Conditioned Flow Occupancy Models [69.79049994662591]
Large-scale pre-training has fundamentally changed how machine learning research is done today.<n>Applying this same framework to reinforcement learning is appealing because it offers compelling avenues for addressing core challenges in RL.<n>Recent advances in generative AI have provided new tools for modeling highly complex distributions.
arXiv Detail & Related papers (2025-06-10T15:27:46Z)
DKDM: Data-Free Knowledge Distillation for Diffusion Models with Any Architecture [69.58440626023541]
Diffusion models (DMs) have demonstrated exceptional generative capabilities across various domains. DMs now consume increasingly large amounts of data. We propose a novel scenario: using existing DMs as data sources to train new DMs with any architecture.
arXiv Detail & Related papers (2024-09-05T14:12:22Z)
DiverGen: Improving Instance Segmentation by Learning Wider Data Distribution with More Diverse Generative Data [48.31817189858086]
We argue that generative data can expand the data distribution that the model can learn, thus mitigating overfitting. We find that DiverGen significantly outperforms the strong model X-Paste, achieving +1.1 box AP and +1.1 mask AP across all categories, and +1.9 box AP and +2.5 mask AP for rare categories.
arXiv Detail & Related papers (2024-05-16T15:30:18Z)
SubjectDrive: Scaling Generative Data in Autonomous Driving via Subject Control [59.20038082523832]
We present SubjectDrive, the first model proven to scale generative data production in a way that could continuously improve autonomous driving applications. We develop a novel model equipped with a subject control mechanism, which allows the generative model to leverage diverse external data sources for producing varied and useful data.
arXiv Detail & Related papers (2024-03-28T14:07:13Z)
Comprehensive Exploration of Synthetic Data Generation: A Survey [4.485401662312072]
This work surveys 417 Synthetic Data Generation models over the last decade. The findings reveal increased model performance and complexity, with neural network-based approaches prevailing. Computer vision dominates, with GANs as primary generative models, while diffusion models, transformers, and RNNs compete.
arXiv Detail & Related papers (2024-01-04T20:23:51Z)
From Identifiable Causal Representations to Controllable Counterfactual Generation: A Survey on Causal Generative Modeling [17.074858228123706]
We focus on fundamental theory, methodology, drawbacks, datasets, and metrics. We cover applications of causal generative models in fairness, privacy, out-of-distribution generalization, precision medicine, and biological sciences.
arXiv Detail & Related papers (2023-10-17T05:45:32Z)
TSGM: A Flexible Framework for Generative Modeling of Synthetic Time Series [61.436361263605114]
Time series data are often scarce or highly sensitive, which precludes the sharing of data between researchers and industrial organizations. We introduce Time Series Generative Modeling (TSGM), an open-source framework for the generative modeling of synthetic time series.
arXiv Detail & Related papers (2023-05-19T10:11:21Z)
A Survey of Knowledge Graph Reasoning on Graph Types: Static, Dynamic, and Multimodal [57.8455911689554]
Knowledge graph reasoning (KGR) aims to deduce new facts from existing facts based on mined logic rules underlying knowledge graphs (KGs) It has been proven to significantly benefit the usage of KGs in many AI applications, such as question answering, recommendation systems, and etc.
arXiv Detail & Related papers (2022-12-12T08:40:04Z)
RPT: Toward Transferable Model on Heterogeneous Researcher Data via Pre-Training [19.987304448524043]
We propose a multi-task self-supervised learning-based researcher data pre-training model named RPT. We divide the researchers' data into semantic document sets and community graph. We propose three self-supervised learning objectives to train the whole model.
arXiv Detail & Related papers (2021-10-08T03:42:09Z)
On the Efficacy of Adversarial Data Collection for Question Answering: Results from a Large-Scale Randomized Study [65.17429512679695]
In adversarial data collection (ADC), a human workforce interacts with a model in real time, attempting to produce examples that elicit incorrect predictions. Despite ADC's intuitive appeal, it remains unclear when training on adversarial datasets produces more robust models.
arXiv Detail & Related papers (2021-06-02T00:48:33Z)
Select-ProtoNet: Learning to Select for Few-Shot Disease Subtype Prediction [55.94378672172967]
We focus on few-shot disease subtype prediction problem, identifying subgroups of similar patients. We introduce meta learning techniques to develop a new model, which can extract the common experience or knowledge from interrelated clinical tasks. Our new model is built upon a carefully designed meta-learner, called Prototypical Network, that is a simple yet effective meta learning machine for few-shot image classification.
arXiv Detail & Related papers (2020-09-02T02:50:30Z)
Modeling Shared Responses in Neuroimaging Studies through MultiView ICA [94.31804763196116]
Group studies involving large cohorts of subjects are important to draw general conclusions about brain functional organization. We propose a novel MultiView Independent Component Analysis model for group studies, where data from each subject are modeled as a linear combination of shared independent sources plus noise. We demonstrate the usefulness of our approach first on fMRI data, where our model demonstrates improved sensitivity in identifying common sources among subjects.
arXiv Detail & Related papers (2020-06-11T17:29:53Z)
Have you forgotten? A method to assess if machine learning models have forgotten data [20.9131206112401]
In the era of deep learning, aggregation of data from several sources is a common approach to ensuring data diversity. In this paper, we want to address the challenging question of whether data have been forgotten by a model. We establish statistical methods that compare the target's outputs with outputs of models trained with different datasets.
arXiv Detail & Related papers (2020-04-21T16:13:45Z)

This list is automatically generated from the titles and abstracts of the papers in this site.