Learning Energy Networks with Generalized Fenchel-Young Losses
- URL: http://arxiv.org/abs/2205.09589v1
- Date: Thu, 19 May 2022 14:32:04 GMT
- Title: Learning Energy Networks with Generalized Fenchel-Young Losses
- Authors: Mathieu Blondel, Felipe Llinares-López, Robert Dadashi, Léonard Hussenot, Matthieu Geist
- Abstract summary: Energy-based models, a.k.a. energy networks, perform inference by optimizing an energy function.
We propose generalized Fenchel-Young losses, a natural loss construction for learning energy networks.
- Score: 34.46284877812228
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Energy-based models, a.k.a. energy networks, perform inference by optimizing
an energy function, typically parametrized by a neural network. This allows one
to capture potentially complex relationships between inputs and outputs. To
learn the parameters of the energy function, the solution to that optimization
problem is typically fed into a loss function. The key challenge for training
energy networks lies in computing loss gradients, as this typically requires
argmin/argmax differentiation. In this paper, building upon a generalized
notion of conjugate function, which replaces the usual bilinear pairing with a
general energy function, we propose generalized Fenchel-Young losses, a natural
loss construction for learning energy networks. Our losses enjoy many desirable
properties and their gradients can be computed efficiently without
argmin/argmax differentiation. We also prove the calibration of their excess
risk in the case of linear-concave energies. We demonstrate our losses on
multilabel classification and imitation learning tasks.
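To make the construction described in the abstract concrete, here is a brief sketch in our own notation (Ω a regularizer over the output space, Φ(θ, μ) an energy pairing a prediction parameter θ with an output μ); this follows the abstract's description and the exact statement in the paper may differ. Replacing the bilinear pairing ⟨θ, μ⟩ in the usual Fenchel conjugate with a general energy Φ gives a generalized conjugate, and the generalized Fenchel-Young loss measures the gap in the resulting inequality:
  \Omega^*_\Phi(\theta) := \max_{\mu \in \mathrm{dom}\,\Omega} \; \Phi(\theta, \mu) - \Omega(\mu)
  L_{\Phi,\Omega}(\theta; \mu) := \Omega^*_\Phi(\theta) + \Omega(\mu) - \Phi(\theta, \mu) \ge 0
By an envelope (Danskin-type) argument, if \hat{\mu}(\theta) denotes a maximizer of the inner problem, then \nabla_\theta \Omega^*_\Phi(\theta) = \nabla_\theta \Phi(\theta, \hat{\mu}(\theta)), so
  \nabla_\theta L_{\Phi,\Omega}(\theta; \mu) = \nabla_\theta \Phi(\theta, \hat{\mu}(\theta)) - \nabla_\theta \Phi(\theta, \mu)
which only requires gradients of the energy evaluated at the computed maximizer, not differentiation through the argmax itself. With the bilinear energy \Phi(\theta, \mu) = \langle \theta, \mu \rangle this reduces to the standard conjugate and the usual Fenchel-Young loss.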
Related papers
- Learning Iterative Reasoning through Energy Diffusion [90.24765095498392]
We introduce iterative reasoning through energy diffusion (IRED), a novel framework for learning to reason for a variety of tasks.
IRED learns energy functions to represent the constraints between input conditions and desired outputs.
We show IRED outperforms existing methods in continuous-space reasoning, discrete-space reasoning, and planning tasks.
arXiv Detail & Related papers (2024-06-17T03:36:47Z) - On the Dynamics Under the Unhinged Loss and Beyond [104.49565602940699]
We introduce the unhinged loss, a concise loss function that offers more mathematical opportunities to analyze closed-form dynamics.
The unhinged loss allows for considering more practical techniques, such as time-varying learning rates and feature normalization.
arXiv Detail & Related papers (2023-12-13T02:11:07Z) - Energy-Dissipative Evolutionary Deep Operator Neural Networks [12.764072441220172]
Energy-Dissipative Evolutionary Deep Operator Neural Network is an operator learning neural network.
It is designed to seek numerical solutions for a class of partial differential equations.
arXiv Detail & Related papers (2023-06-09T22:11:16Z) - On Feature Diversity in Energy-based Models [98.78384185493624]
An energy-based model (EBM) is typically formed of one or more inner models that learn a combination of different features to generate an energy mapping for each input configuration.
We extend the probably approximately correct (PAC) theory of EBMs and analyze the effect of redundancy reduction on the performance of EBMs.
arXiv Detail & Related papers (2023-06-02T12:30:42Z) - Uncovering Energy-Efficient Practices in Deep Learning Training:
Preliminary Steps Towards Green AI [8.025202812165412]
We consider energy consumption a metric of equal importance to accuracy and aim to reduce irrelevant tasks and energy usage.
We examine the training stage of the deep learning pipeline from a sustainability perspective.
We highlight innovative and promising energy-efficient practices for training deep learning models.
arXiv Detail & Related papers (2023-03-24T12:48:21Z) - Globally Optimal Training of Neural Networks with Threshold Activation
Functions [63.03759813952481]
We study weight decay regularized training problems of deep neural networks with threshold activations.
We derive a simplified convex optimization formulation when the dataset can be shattered at a certain layer of the network.
arXiv Detail & Related papers (2023-03-06T18:59:13Z) - Energy Transformer [64.22957136952725]
Our work combines aspects of three promising paradigms in machine learning, namely, attention mechanism, energy-based models, and associative memory.
We propose a novel architecture, called the Energy Transformer (or ET for short), that uses a sequence of attention layers that are purposely designed to minimize a specifically engineered energy function.
arXiv Detail & Related papers (2023-02-14T18:51:22Z) - An End-to-End learnable Flow Regularized Model for Brain Tumor
Segmentation [1.253312107729806]
We propose to incorporate end-to-end trainable neural network features into the energy functions.
Our deep neural network features are extracted from the down-sampling and up-sampling layers with skip-connections of a U-net.
The segmentations are solved in primal-dual form by ADMM solvers.
arXiv Detail & Related papers (2021-09-01T21:34:30Z)