Non-Generative Energy Based Models
- URL: http://arxiv.org/abs/2304.01297v1
- Date: Mon, 3 Apr 2023 18:47:37 GMT
- Title: Non-Generative Energy Based Models
- Authors: Jacob Piland, Christopher Sweet, Priscila Saboia, Charles Vardeman II, and Adam Czajka
- Abstract summary: Energy-based models (EBM) have become increasingly popular within computer vision.
We propose a non-generative training approach, Non-Generative EBM (NG-EBM).
We show that our NG-EBM training strategy retains many of the benefits of EBM in calibration, out-of-distribution detection, and adversarial resistance.
- Score: 3.1447898427012473
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Energy-based models (EBM) have become increasingly popular within computer
vision. EBMs bring a probabilistic approach to training deep neural networks
(DNN) and have been shown to enhance performance in areas such as calibration,
out-of-distribution detection, and adversarial resistance. However, these
advantages come at the cost of estimating input-data probabilities, usually
with a Langevin-based method such as Stochastic Gradient Langevin Dynamics
(SGLD), which adds computational cost, requires careful parameterization and
caching for efficiency, and can run into stability and scaling issues.
EBMs use dynamical methods to draw samples from the probability density
function (PDF) defined by the current state of the network and compare them to
the training data via maximum log-likelihood to learn the correct PDF.
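The SGLD sampler referenced above can be sketched in a few lines. This is an illustrative toy implementation on a known Gaussian energy (where the score is exact), not the paper's training code; `sgld_sample`, the step size, and the step count are all assumptions chosen for the demo:

```python
import numpy as np

def sgld_sample(grad_energy, x0, n_steps=100, step_size=0.01, rng=None):
    """Draw an approximate sample from p(x) ∝ exp(-E(x)) via Stochastic
    Gradient Langevin Dynamics:
        x ← x - (η/2)·∇E(x) + √η·ε,   ε ~ N(0, I).
    Without a Metropolis correction the chain is biased for finite η."""
    rng = rng or np.random.default_rng(0)
    x = np.array(x0, dtype=float)
    for _ in range(n_steps):
        noise = rng.standard_normal(x.shape)
        x = x - 0.5 * step_size * grad_energy(x) + np.sqrt(step_size) * noise
    return x

# Toy energy: standard Gaussian, E(x) = ||x||^2 / 2, so ∇E(x) = x.
grad_E = lambda x: x
samples = np.stack([
    sgld_sample(grad_E, np.zeros(2), n_steps=500, step_size=0.1,
                rng=np.random.default_rng(i))
    for i in range(200)
])
print(samples.mean(axis=0))  # close to [0, 0]
```

In EBM training proper, `grad_energy` is a backward pass through the network with respect to the input, run per training step, which is exactly the overhead the NG-EBM approach aims to avoid.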
We propose a non-generative training approach, Non-Generative EBM (NG-EBM),
that utilizes the *Approximate Mass*, identified by Grathwohl et al., as a
loss term to direct the training. We show that our NG-EBM training strategy
retains many of the benefits of EBM in calibration, out-of-distribution
detection, and adversarial resistance, but without the computational complexity
and overhead of the traditional approaches. In particular, the NG-EBM approach
improves the Expected Calibration Error by a factor of 2.5 for CIFAR10 and 7.5
times for CIFAR100, when compared to traditionally trained models.
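The Expected Calibration Error (ECE) used in the comparison above can be computed with the standard equal-width binning scheme. A minimal sketch on toy data; the function name and bin count are illustrative, not taken from the paper:

```python
import numpy as np

def expected_calibration_error(confidences, correct, n_bins=10):
    """ECE: bin predictions by confidence, then sum |accuracy - confidence|
    per bin, weighted by the fraction of samples in that bin.
    Bins are half-open (lo, hi], so a confidence of exactly 0 is dropped."""
    confidences = np.asarray(confidences, dtype=float)
    correct = np.asarray(correct, dtype=float)
    edges = np.linspace(0.0, 1.0, n_bins + 1)
    ece = 0.0
    for lo, hi in zip(edges[:-1], edges[1:]):
        mask = (confidences > lo) & (confidences <= hi)
        if mask.any():
            gap = abs(correct[mask].mean() - confidences[mask].mean())
            ece += mask.mean() * gap
    return ece

# An overconfident model: 95% confidence but only 50% accuracy.
print(expected_calibration_error([0.95] * 4, [1, 1, 0, 0]))  # 0.45
```

A perfectly calibrated model (confidence matching empirical accuracy in every bin) scores 0; the reported 2.5x and 7.5x improvements are reductions of this quantity.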
Related papers
- Bellman Diffusion: Generative Modeling as Learning a Linear Operator in the Distribution Space [72.52365911990935]
We introduce Bellman Diffusion, a novel DGM framework that maintains linearity in MDPs through gradient and scalar field modeling.
Our results show that Bellman Diffusion achieves accurate field estimations and is a capable image generator, converging 1.5x faster than the traditional histogram-based baseline in distributional RL tasks.
arXiv Detail & Related papers (2024-10-02T17:53:23Z)
- Gradient-free variational learning with conditional mixture networks [39.827869318925494]
Conditional mixture networks (CMNs) are suitable for fast, gradient-free inference and can solve complex classification tasks.
We validate this approach by training two-layer CMNs on standard benchmarks from the UCI repository.
Our method, CAVI-CMN, achieves competitive and often superior predictive accuracy compared to maximum likelihood estimation (MLE) with backpropagation.
arXiv Detail & Related papers (2024-08-29T10:43:55Z)
- Learning Energy-Based Models by Cooperative Diffusion Recovery Likelihood [64.95663299945171]
Training energy-based models (EBMs) on high-dimensional data can be both challenging and time-consuming.
There exists a noticeable gap in sample quality between EBMs and other generative frameworks like GANs and diffusion models.
We propose cooperative diffusion recovery likelihood (CDRL), an effective approach to tractably learn and sample from a series of EBMs.
arXiv Detail & Related papers (2023-09-10T22:05:24Z)
- Human Trajectory Forecasting with Explainable Behavioral Uncertainty [63.62824628085961]
Human trajectory forecasting helps to understand and predict human behaviors, enabling applications from social robots to self-driving cars.
Model-free methods offer superior prediction accuracy but lack explainability, while model-based methods provide explainability but cannot predict well.
We show that BNSP-SFM achieves up to a 50% improvement in prediction accuracy, compared with 11 state-of-the-art methods.
arXiv Detail & Related papers (2023-07-04T16:45:21Z)
- Balanced Training of Energy-Based Models with Adaptive Flow Sampling [13.951904929884618]
Energy-based models (EBMs) are versatile density estimation models that directly parameterize an unnormalized log density.
We propose a new maximum likelihood training algorithm for EBMs that uses a different type of generative model: normalizing flows (NF).
Our method fits an NF to an EBM during training so that an NF-assisted sampling scheme provides an accurate gradient for the EBMs at all times.
arXiv Detail & Related papers (2023-06-01T13:58:06Z)
- Implicit Stochastic Gradient Descent for Training Physics-informed Neural Networks [51.92362217307946]
Physics-informed neural networks (PINNs) have effectively been demonstrated in solving forward and inverse differential equation problems.
PINNs are trapped in training failures when the target functions to be approximated exhibit high-frequency or multi-scale features.
In this paper, we propose to employ implicit gradient descent (ISGD) method to train PINNs for improving the stability of training process.
arXiv Detail & Related papers (2023-03-03T08:17:47Z)
- Gradient-Guided Importance Sampling for Learning Binary Energy-Based Models [46.87187776084161]
We propose ratio matching with gradient-guided importance sampling (RMwGGIS) to learn energy-based models (EBMs) on high-dimensional data.
We perform experiments on density modeling over synthetic discrete data, graph generation, and training Ising models to evaluate our proposed method.
Our method can significantly alleviate the limitations of ratio matching, perform more effectively in practice, and scale to high-dimensional problems.
arXiv Detail & Related papers (2022-10-11T20:52:48Z)
- A VAE-Based Bayesian Bidirectional LSTM for Renewable Energy Forecasting [0.4588028371034407]
The intermittent nature of renewable energy poses new challenges to network operational planning with underlying uncertainties.
This paper proposes a novel Bayesian probabilistic technique for forecasting renewable power generation by addressing data and model uncertainties.
It is inferred from the numerical results that VAE-Bayesian BiLSTM outperforms other probabilistic deep learning methods in terms of forecasting accuracy and computational efficiency for different sizes of the dataset.
arXiv Detail & Related papers (2021-03-24T03:47:20Z)
- Energy Forecasting in Smart Grid Systems: A Review of the State-of-the-art Techniques [2.3436632098950456]
This paper presents a review of state-of-the-art forecasting methods for smart grid (SG) systems.
Traditional point forecasting methods including statistical, machine learning (ML), and deep learning (DL) are extensively investigated.
A comparative case study using the Victorian electricity consumption and American Electric Power (AEP) datasets is conducted.
arXiv Detail & Related papers (2020-11-25T09:17:07Z)
- No MCMC for me: Amortized sampling for fast and stable training of energy-based models [62.1234885852552]
Energy-Based Models (EBMs) present a flexible and appealing way to represent uncertainty.
We present a simple method for training EBMs at scale using an entropy-regularized generator to amortize the MCMC sampling.
We apply our estimator to the recently proposed Joint Energy Model (JEM), matching the original performance with faster and more stable training.
arXiv Detail & Related papers (2020-10-08T19:17:20Z)
- Training Deep Energy-Based Models with f-Divergence Minimization [113.97274898282343]
Deep energy-based models (EBMs) are very flexible in distribution parametrization but computationally challenging.
We propose a general variational framework termed f-EBM to train EBMs using any desired f-divergence.
Experimental results demonstrate the superiority of f-EBM over contrastive divergence, as well as the benefits of training EBMs using f-divergences other than KL.
arXiv Detail & Related papers (2020-03-06T23:11:13Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information presented and is not responsible for any consequences of its use.