JEM++: Improved Techniques for Training JEM
- URL: http://arxiv.org/abs/2109.09032v2
- Date: Tue, 21 Sep 2021 02:17:43 GMT
- Title: JEM++: Improved Techniques for Training JEM
- Authors: Xiulong Yang, Shihao Ji
- Abstract summary: The Joint Energy-based Model (JEM) is a recently proposed hybrid model that retains the strong discriminative power of modern CNN classifiers.
We propose a variety of new training procedures and architectural features to improve JEM's accuracy, training stability, and speed altogether.
- Score: 1.5533842336139065
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: The Joint Energy-based Model (JEM) is a recently proposed hybrid model that
retains the strong discriminative power of modern CNN classifiers while generating
samples that rival the quality of GAN-based approaches. In this paper, we propose
a variety of new training procedures and architectural features to improve JEM's
accuracy, training stability, and speed altogether. 1) We propose a proximal
SGLD that generates samples in the proximity of samples from the previous step,
which improves training stability. 2) We further treat the approximate maximum
likelihood learning of EBMs as a multi-step differential game, and extend the
YOPO framework to cut out redundant calculations during backpropagation, which
accelerates training substantially. 3) Rather than initializing SGLD chains
from random noise, we introduce a new informative initialization that samples
from a distribution estimated from the training data. 4) This informative
initialization allows us to enable batch normalization in JEM, which further
unleashes the power of modern CNN architectures for hybrid modeling. Code:
https://github.com/sndnyang/JEMPP
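
To make ideas 1) and 3) of the abstract concrete, here is a minimal PyTorch sketch of proximal SGLD sampling with an informative initializer. This is an illustration under stated assumptions, not the authors' implementation (see the linked repository for that): the proximity constraint is assumed to be an L-infinity clamp of radius tau on each Langevin update, the initializer is assumed to be a single diagonal Gaussian fit to the training images, and all function names and hyper-parameters are hypothetical.

```python
# Illustrative sketch only; hyper-parameters and the exact form of the
# proximity constraint are assumptions, not the JEM++ reference code.
import torch

def informative_init(mean, std, batch_size):
    """Start SGLD chains from a Gaussian estimated on training data
    (assumed: one diagonal Gaussian; JEM++ may use a richer,
    class-conditional estimate) instead of uniform random noise."""
    noise = torch.randn(batch_size, *mean.shape)
    return (mean + std * noise).clamp(-1.0, 1.0)

def proximal_sgld(energy_fn, x, n_steps=20, step_size=1.0,
                  noise_scale=0.01, tau=0.1):
    """Langevin sampling where each update is clamped to an L-infinity
    ball of radius `tau`, so each new sample stays in the proximity of
    the previous one (assumed mechanism for the stability claim)."""
    for _ in range(n_steps):
        x = x.detach().requires_grad_(True)
        energy = energy_fn(x).sum()
        (grad,) = torch.autograd.grad(energy, x)
        update = -step_size * grad + noise_scale * torch.randn_like(x)
        x = x + update.clamp(-tau, tau)  # proximal step
    return x.detach()
```

For a JEM-style model built on a classifier f, the energy can be taken as E(x) = -logsumexp_y f(x)[y], i.e. energy_fn = lambda x: -torch.logsumexp(f(x), dim=1); mean and std would be estimated once from the training set.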
Related papers
- MARS: Unleashing the Power of Variance Reduction for Training Large Models [56.47014540413659]
Adaptive gradient algorithms like Adam, AdamW, and their variants have been central to the development of this type of training.
We propose a framework that reconciles preconditioned gradient optimization methods with variance reduction via a scaled momentum technique.
arXiv Detail & Related papers (2024-11-15T18:57:39Z)
- Pruning then Reweighting: Towards Data-Efficient Training of Diffusion Models [33.09663675904689]
We investigate efficient diffusion training from the perspective of dataset pruning.
Inspired by the principles of data-efficient training for generative models such as generative adversarial networks (GANs), we first extend the data selection scheme used in GANs to diffusion model (DM) training.
To further improve the generation performance, we employ a class-wise reweighting approach.
arXiv Detail & Related papers (2024-09-27T20:21:19Z)
- A Bayesian Flow Network Framework for Chemistry Tasks [0.0]
We introduce ChemBFN, a language model that handles chemistry tasks based on Bayesian flow networks.
A new accuracy schedule is proposed to improve the sampling quality.
We show evidence that our method generates molecules with satisfactory diversity even when a small number of sampling steps is used.
arXiv Detail & Related papers (2024-07-28T04:46:32Z)
- Model Inversion Attacks Through Target-Specific Conditional Diffusion Models [54.69008212790426]
Model inversion attacks (MIAs) aim to reconstruct private images from a target classifier's training set, thereby raising privacy concerns in AI applications.
Previous GAN-based MIAs tend to suffer from inferior generative fidelity due to GANs' inherent flaws and biased optimization within the latent space.
We propose Diffusion-based Model Inversion (Diff-MI) attacks to alleviate these issues.
arXiv Detail & Related papers (2024-07-16T06:38:49Z)
- Take the Bull by the Horns: Hard Sample-Reweighted Continual Training Improves LLM Generalization [165.98557106089777]
A key challenge is to enhance the capabilities of large language models (LLMs) amid a looming shortage of high-quality training data.
Our study starts from an empirical strategy for light continual training of LLMs on their original pre-training data.
We then formalize this strategy into a principled framework of Instance-Reweighted Distributionally Robust Optimization.
arXiv Detail & Related papers (2024-02-22T04:10:57Z)
- Learning Energy-Based Models by Cooperative Diffusion Recovery Likelihood [64.95663299945171]
Training energy-based models (EBMs) on high-dimensional data can be both challenging and time-consuming.
There exists a noticeable gap in sample quality between EBMs and other generative frameworks like GANs and diffusion models.
We propose cooperative diffusion recovery likelihood (CDRL), an effective approach to tractably learn and sample from a series of EBMs.
arXiv Detail & Related papers (2023-09-10T22:05:24Z)
- CoopInit: Initializing Generative Adversarial Networks via Cooperative Learning [50.90384817689249]
CoopInit is a cooperative learning-based strategy that can quickly learn a good starting point for GANs.
We demonstrate the effectiveness of the proposed approach on image generation and one-sided unpaired image-to-image translation tasks.
arXiv Detail & Related papers (2023-03-21T07:49:32Z)
- Boosting Low-Data Instance Segmentation by Unsupervised Pre-training with Saliency Prompt [103.58323875748427]
This work offers a novel unsupervised pre-training solution for low-data regimes.
Inspired by the recent success of prompting techniques, we introduce a new pre-training method that boosts query-based end-to-end instance segmentation (QEIS) models.
Experimental results show that our method significantly boosts several QEIS models on three datasets.
arXiv Detail & Related papers (2023-02-02T15:49:03Z)
- Tuning Language Models as Training Data Generators for Augmentation-Enhanced Few-Shot Learning [30.65315081964461]
We study few-shot learning with pretrained language models (PLMs) from a different perspective.
We first tune an autoregressive PLM on the few-shot samples and then use it as a generator to synthesize a large amount of novel training samples.
Our approach, FewGen, achieves better overall results than existing few-shot learning methods across seven classification tasks of the GLUE benchmark.
arXiv Detail & Related papers (2022-11-06T06:46:47Z)
- Improving Non-autoregressive Generation with Mixup Training [51.61038444990301]
We present a non-autoregressive generation model based on pre-trained transformer models.
We propose a simple and effective iterative training method called MIx Source and pseudo Target (MIST).
Our experiments on three generation benchmarks, including question generation, summarization, and paraphrase generation, show that the proposed framework achieves new state-of-the-art results.
arXiv Detail & Related papers (2021-10-21T13:04:21Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of its content (including all information) and is not responsible for any consequences of its use.