Differential-Critic GAN: Generating What You Want by a Cue of Preferences
- URL: http://arxiv.org/abs/2107.06700v3
- Date: Mon, 17 Apr 2023 02:23:11 GMT
- Title: Differential-Critic GAN: Generating What You Want by a Cue of Preferences
- Authors: Yinghua Yao, Yuangang Pan, Ivor W. Tsang, Xin Yao
- Abstract summary: We propose Differential-Critic Generative Adversarial Network (DiCGAN) to learn the distribution of user-desired data.
DiCGAN generates desired data that meets the user's expectations and can assist in designing biological products with desired properties.
- Score: 34.25181656518662
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: This paper proposes Differential-Critic Generative Adversarial Network
(DiCGAN) to learn the distribution of user-desired data when only part of the
dataset, rather than the entire dataset, possesses the desired property. DiCGAN generates
desired data that meets the user's expectations and can assist in designing
biological products with desired properties. Existing approaches select the
desired samples first and train regular GANs on the selected samples to derive
the user-desired data distribution. However, the selection of the desired data
relies on global knowledge and supervision over the entire dataset. DiCGAN
introduces a differential critic that learns from pairwise preferences, which
are local knowledge and can be defined on a part of training data. The critic
is built by defining an additional ranking loss over the Wasserstein GAN's
critic, so that the difference in critic values between each pair of samples
encodes the user preference and steers generation toward the desired data
rather than the whole data distribution. To ensure data quality more efficiently, we
further reformulate DiCGAN as a constrained optimization problem, based on
which we theoretically prove the convergence of our DiCGAN. Extensive
experiments on a diverse set of datasets with various applications demonstrate
that our DiCGAN achieves state-of-the-art performance in learning the
user-desired data distributions, especially in the cases of insufficient
desired data and limited supervision.
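A minimal PyTorch sketch of the differential-critic idea: a standard WGAN critic loss plus a pairwise ranking term that asks the critic to score the preferred sample of each pair higher. The hinge form, the margin, and `ranking_weight` are illustrative assumptions, not the paper's exact formulation.

```python
import torch
import torch.nn as nn

# Toy critic over 64-d feature vectors; the real model would match the data.
critic = nn.Sequential(nn.Linear(64, 128), nn.ReLU(), nn.Linear(128, 1))

def dicgan_critic_loss(real, fake, preferred, non_preferred,
                       ranking_weight=1.0, margin=1.0):
    # Standard WGAN critic objective: score real samples high, fake low.
    wgan_loss = critic(fake).mean() - critic(real).mean()
    # Pairwise ranking term: for each preference pair, push the critic value
    # of the preferred sample above the non-preferred one by a margin.
    diff = critic(preferred) - critic(non_preferred)
    ranking_loss = torch.clamp(margin - diff, min=0).mean()
    return wgan_loss + ranking_weight * ranking_loss

# Random tensors stand in for minibatches of real/fake samples and pairs.
real, fake = torch.randn(8, 64), torch.randn(8, 64)
pref, non_pref = torch.randn(8, 64), torch.randn(8, 64)
dicgan_critic_loss(real, fake, pref, non_pref).backward()
```

The ranking loss only constrains differences of critic values, which is why preferences defined on a subset of the data suffice to steer generation.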
Related papers
- A CLIP-Powered Framework for Robust and Generalizable Data Selection [51.46695086779598]
Real-world datasets often contain redundant and noisy data, which degrades training efficiency and model performance.
Data selection has shown promise in identifying the most representative samples from the entire dataset.
We propose a novel CLIP-powered data selection framework that leverages multimodal information for more robust and generalizable sample selection.
arXiv Detail & Related papers (2024-10-15T03:00:58Z)
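As a loose illustration of CLIP-based selection (not the paper's exact scoring), one can rank samples by the cosine alignment of precomputed CLIP image and text embeddings and keep the best-aligned fraction; all names below are hypothetical.

```python
import numpy as np

def select_by_clip_alignment(image_emb, text_emb, keep_frac=0.5):
    # Normalize so the row-wise dot product is cosine similarity.
    img = image_emb / np.linalg.norm(image_emb, axis=1, keepdims=True)
    txt = text_emb / np.linalg.norm(text_emb, axis=1, keepdims=True)
    scores = (img * txt).sum(axis=1)          # per-sample alignment score
    k = max(1, int(len(scores) * keep_frac))  # number of samples to keep
    return np.argsort(-scores)[:k]            # indices of best-aligned samples

# Random arrays stand in for precomputed CLIP embeddings (e.g., 512-d).
keep = select_by_clip_alignment(np.random.randn(100, 512), np.random.randn(100, 512))
```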
- Adapt-$\infty$: Scalable Lifelong Multimodal Instruction Tuning via Dynamic Data Selection [89.42023974249122]
Adapt-$\infty$ is a new multi-way and adaptive data selection approach for Lifelong Instruction Tuning.
We construct pseudo-skill clusters by grouping gradient-based sample vectors.
We select the best-performing data selector for each skill cluster from a pool of selector experts.
arXiv Detail & Related papers (2024-10-14T15:48:09Z)
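A rough sketch of the pseudo-skill clustering step described above, grouping per-sample gradient-based vectors with k-means; the selector-expert pool and lifelong scheduling are omitted, and the feature construction here is a stand-in.

```python
import numpy as np
from sklearn.cluster import KMeans

# Random vectors stand in for per-sample gradient-based features.
grad_feats = np.random.randn(1000, 64)
# Group samples into pseudo-skill clusters.
skills = KMeans(n_clusters=8, n_init=10).fit_predict(grad_feats)
# Downstream, a separate data selector would be chosen per cluster
# from a pool of selector experts (omitted here).
```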
- LESS: Selecting Influential Data for Targeted Instruction Tuning [64.78894228923619]
We propose LESS, an efficient algorithm to estimate data influences and perform Low-rank gradiEnt Similarity Search for instruction data selection.
We show that training on a LESS-selected 5% of the data can often outperform training on the full dataset across diverse downstream tasks.
Our method goes beyond surface form cues to identify data that exemplifies the reasoning skills needed for the intended downstream application.
arXiv Detail & Related papers (2024-02-06T19:18:04Z)
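A toy sketch of low-rank gradient-similarity selection: project per-sample gradients with a random low-rank map, score candidates by cosine similarity to a target-task gradient, and keep the top 5%. LESS's actual influence estimator is optimizer-aware and more careful; everything here is illustrative.

```python
import numpy as np

rng = np.random.default_rng(0)
proj = rng.normal(size=(10_000, 128)) / np.sqrt(128)  # random low-rank projection

cand_grads = rng.normal(size=(500, 10_000)) @ proj    # projected candidate gradients
target_grad = rng.normal(size=10_000) @ proj          # projected target-task gradient

# Cosine similarity between each candidate and the target direction.
scores = (cand_grads @ target_grad) / (
    np.linalg.norm(cand_grads, axis=1) * np.linalg.norm(target_grad) + 1e-8)
top5pct = np.argsort(-scores)[: int(0.05 * len(scores))]  # LESS-style 5% subset
```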
- Utilizing dataset affinity prediction in object detection to assess training data [4.508868068781057]
We show the benefits of the so-called dataset affinity score by automatically selecting samples from a heterogeneous pool of vehicle datasets.
The results show that object detectors can be trained on a significantly sparser set of training samples without losing detection accuracy.
arXiv Detail & Related papers (2023-11-16T10:45:32Z)
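A minimal sketch of selection by a predicted affinity score, assuming a hypothetical per-sample `affinity` array produced by an affinity-prediction head; the threshold is arbitrary.

```python
import numpy as np

# Hypothetical per-sample affinity scores from an affinity-prediction head.
affinity = np.random.rand(10_000)
train_idx = np.flatnonzero(affinity > 0.8)  # keep only high-affinity samples
print(f"training on {train_idx.size} of {affinity.size} pooled samples")
```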
- Creating Synthetic Datasets for Collaborative Filtering Recommender Systems using Generative Adversarial Networks [1.290382979353427]
Research and education in machine learning need diverse, representative, and open datasets to handle the necessary training, validation, and testing tasks.
To supply this variety, it is necessary and convenient to supplement existing datasets with synthetic ones.
This paper proposes a Generative Adversarial Network (GAN)-based method to generate collaborative filtering datasets.
arXiv Detail & Related papers (2023-03-02T14:23:27Z)
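A minimal sketch of the generation side, assuming a toy generator that maps noise to per-user rating profiles; the architecture and the training loop (which would pit it against a discriminator) are placeholders, not the paper's design.

```python
import torch
import torch.nn as nn

n_items, latent = 200, 32
# Toy generator: noise -> one user's ratings over n_items, scaled to [0, 1].
gen = nn.Sequential(nn.Linear(latent, 128), nn.ReLU(),
                    nn.Linear(128, n_items), nn.Sigmoid())
fake_profiles = gen(torch.randn(64, latent))  # 64 synthetic user profiles
```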
- Distributed Traffic Synthesis and Classification in Edge Networks: A Federated Self-supervised Learning Approach [83.2160310392168]
This paper proposes FS-GAN to support automatic traffic analysis and synthesis over a large number of heterogeneous datasets.
FS-GAN is composed of multiple distributed Generative Adversarial Networks (GANs).
FS-GAN can classify data of unknown types of service and create synthetic samples that capture the traffic distribution of the unknown types.
arXiv Detail & Related papers (2023-02-01T03:23:11Z)
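As a rough sketch of the distributed flavor, the snippet below applies plain FedAvg parameter averaging to structurally identical local generators; FS-GAN's actual aggregation and self-supervised objectives are not reproduced here, so treat this as an assumed baseline.

```python
import copy
import torch
import torch.nn as nn

def fedavg(models):
    # Average the parameters of structurally identical local models.
    global_model = copy.deepcopy(models[0])
    with torch.no_grad():
        for name, param in global_model.named_parameters():
            stacked = torch.stack([dict(m.named_parameters())[name] for m in models])
            param.copy_(stacked.mean(dim=0))
    return global_model

# Linear layers stand in for the local GAN generators on each edge node.
local_gens = [nn.Linear(16, 16) for _ in range(5)]
global_gen = fedavg(local_gens)
```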
- Explaining Cross-Domain Recognition with Interpretable Deep Classifier [100.63114424262234]
Interpretable Deep Classifier (IDC) learns the nearest source samples of a target sample as evidence upon which the classifier makes the decision.
Our IDC leads to a more explainable model with almost no accuracy degradation and effectively calibrates classification for optimum reject options.
arXiv Detail & Related papers (2022-11-15T15:58:56Z)
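A minimal sketch of the evidence-retrieval flavor: return the nearest source samples of a target feature as human-inspectable support for a prediction. IDC learns this relation rather than using raw Euclidean distance, so this is only an assumption-laden analogy.

```python
import numpy as np

def nearest_source_evidence(target_feat, source_feats, k=3):
    # Indices of the k nearest source samples (Euclidean), returned as
    # human-inspectable evidence for the prediction on the target sample.
    dists = np.linalg.norm(source_feats - target_feat, axis=1)
    return np.argsort(dists)[:k]

evidence = nearest_source_evidence(np.random.randn(64), np.random.randn(500, 64))
```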
- FairGen: Fair Synthetic Data Generation [0.3149883354098941]
We propose a pipeline to generate fairer synthetic data independent of the GAN architecture.
We claim that, while generating synthetic data, most GANs amplify the bias present in the training data; by removing these bias-inducing samples, GANs focus more on real, informative samples.
arXiv Detail & Related papers (2022-10-24T08:13:47Z)
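A minimal sketch of the removal step, with a stand-in bias score; how FairGen actually identifies bias-inducing samples is not reproduced here.

```python
import numpy as np

features = np.random.randn(5_000, 10)
# Stand-in bias score; FairGen would identify bias-inducing samples properly.
bias_score = np.abs(features[:, 0])
keep = bias_score < np.quantile(bias_score, 0.9)  # drop the top-10% offenders
fair_training_set = features[keep]                # subset fed to the GAN
```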
- Generating Data to Mitigate Spurious Correlations in Natural Language Inference Datasets [27.562256973255728]
Natural language processing models often exploit spurious correlations between task-independent features and labels in datasets to perform well only within the distributions they are trained on.
We propose to tackle this problem by generating a debiased version of a dataset, which can then be used to train a debiased, off-the-shelf model.
Our approach consists of 1) a method for training data generators to generate high-quality, label-consistent data samples; and 2) a filtering mechanism for removing data points that contribute to spurious correlations.
arXiv Detail & Related papers (2022-03-24T09:08:05Z)
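A toy sketch of the generate-then-filter recipe: synthetic samples are produced, then the kept set is balanced so a known spurious feature no longer co-occurs with the label. The generator and the filtering criterion below are placeholder heuristics, not the paper's trained components.

```python
import numpy as np

rng = np.random.default_rng(1)
samples = rng.normal(size=(1_000, 8))     # stand-in generated samples
labels = (samples[:, 0] > 0).astype(int)  # label-consistent by construction

# Placeholder filter: column 7 plays a known spurious feature; keep equal
# numbers of agreeing/disagreeing samples to decorrelate it from the label.
spurious = (samples[:, 7] > 0).astype(int)
agree = np.flatnonzero(spurious == labels)
disagree = np.flatnonzero(spurious != labels)
n = min(agree.size, disagree.size)
debiased = samples[np.concatenate([agree[:n], disagree[:n]])]
```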
- Negative Data Augmentation [127.28042046152954]
We show that negative data augmentation samples provide information on the support of the data distribution.
We introduce a new GAN training objective where we use NDA as an additional source of synthetic data for the discriminator.
Empirically, models trained with our method achieve improved conditional/unconditional image generation along with improved anomaly detection capabilities.
arXiv Detail & Related papers (2021-02-09T20:28:35Z)
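A minimal sketch of the NDA training objective described above: negatively augmented real images (here, a crude half-swap standing in for transformations like jigsaw shuffling) are fed to the discriminator as additional fake samples. The transform and the loss form are illustrative assumptions.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

# Toy discriminator over 3x32x32 images.
disc = nn.Sequential(nn.Flatten(), nn.Linear(3 * 32 * 32, 1))

def negative_augment(x):
    # Crude negative augmentation: swap the top and bottom halves of each
    # image, yielding locally realistic but globally implausible samples.
    top, bottom = x.chunk(2, dim=2)
    return torch.cat([bottom, top], dim=2)

def disc_loss(real, fake):
    nda = negative_augment(real)  # extra "fake" data for the discriminator
    ones = torch.ones(real.size(0), 1)
    zeros = torch.zeros(real.size(0), 1)
    return (F.binary_cross_entropy_with_logits(disc(real), ones)
            + F.binary_cross_entropy_with_logits(disc(fake), zeros)
            + F.binary_cross_entropy_with_logits(disc(nda), zeros))

loss = disc_loss(torch.randn(4, 3, 32, 32), torch.randn(4, 3, 32, 32))
```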
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences of its use.