Learning Energy-Based Approximate Inference Networks for Structured
Applications in NLP
- URL: http://arxiv.org/abs/2108.12522v1
- Date: Fri, 27 Aug 2021 22:48:20 GMT
- Title: Learning Energy-Based Approximate Inference Networks for Structured
Applications in NLP
- Authors: Lifu Tu
- Abstract summary: The dissertation begins with a general introduction to energy-based models.
We propose a method in which we train a neural network to do argmax inference under a structured energy function.
We then develop ways of jointly learning energy functions and inference networks using an adversarial learning framework.
- Score: 8.426855646402238
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Structured prediction in natural language processing (NLP) has a long
history. Complex structured models come with difficulties in learning and
inference, which lead researchers to focus on models with simple structured
components (e.g., local classifiers). Deep representation learning has become
increasingly popular in recent years, but the structured components of these
methods are usually relatively simple. We concentrate on complex structured
models in this dissertation. We provide a learning framework for complex
structured models as well as an inference method with a better
speed/accuracy/search-error trade-off. The dissertation begins with a general
introduction to energy-based models. In NLP and other applications, an energy
function is comparable to the concept of a scoring function. In this
dissertation, we discuss the concept of the energy function and structured
models with different energy functions. Then, we propose a method in which we
train a neural network to do argmax inference under a structured energy
function, referring to the trained networks as "inference networks" or
"energy-based inference networks". We then develop ways of jointly learning
energy functions and inference networks using an adversarial learning
framework. Despite the inference and learning difficulties of energy-based
models, we present approaches in this thesis that make energy-based models
easier to apply in structured NLP applications.
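To make the two training phases concrete, below is a minimal PyTorch-style sketch of the general recipe, not the dissertation's actual implementation: the toy energy function, inference-network architecture, cost function, dimensions, and hyperparameters are all illustrative assumptions.

```python
import torch
import torch.nn as nn

class Energy(nn.Module):
    """Toy structured energy E(x, y); lower energy means a better output."""
    def __init__(self, x_dim, y_dim):
        super().__init__()
        self.local = nn.Linear(x_dim, y_dim)      # per-label scores from the input
        self.pair = nn.Bilinear(y_dim, y_dim, 1)  # simple label-interaction term

    def forward(self, x, y):
        # y is a relaxed (continuous) structured output in [0, 1]^y_dim
        return -(self.local(x) * y).sum(-1) + self.pair(y, y).squeeze(-1)

class InfNet(nn.Module):
    """Inference network A(x), trained to approximate argmin_y E(x, y)."""
    def __init__(self, x_dim, y_dim):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(x_dim, 64), nn.ReLU(),
                                 nn.Linear(64, y_dim))

    def forward(self, x):
        return torch.sigmoid(self.net(x))  # relaxed structured output

x_dim, y_dim = 16, 8
energy, infnet = Energy(x_dim, y_dim), InfNet(x_dim, y_dim)
opt_e = torch.optim.Adam(energy.parameters(), lr=1e-3)
opt_a = torch.optim.Adam(infnet.parameters(), lr=1e-3)

for step in range(200):
    x = torch.randn(32, x_dim)                      # stand-in for real inputs
    y_gold = (torch.randn(32, y_dim) > 0).float()   # stand-in gold outputs

    # (1) Inference-network step: push A(x) toward low-energy outputs,
    #     replacing explicit argmax search with a learned network.
    loss_a = energy(x, infnet(x)).mean()
    opt_a.zero_grad()
    loss_a.backward()
    opt_a.step()

    # (2) Energy step (margin-rescaled hinge): the gold output should have
    #     lower energy than the inference network's output by a margin equal
    #     to that output's cost. Alternating (1) and (2) is the adversarial
    #     game, simplified here to a single inference network.
    y_hat = infnet(x).detach()
    cost = (y_hat - y_gold).abs().sum(-1)           # hypothetical cost function
    loss_e = torch.relu(cost - energy(x, y_hat) + energy(x, y_gold)).mean()
    opt_e.zero_grad()
    loss_e.backward()
    opt_e.step()
```

Because the inference network replaces iterative argmax search (e.g., gradient descent or beam search over outputs) with a single feed-forward pass, test-time inference becomes much cheaper; this is the speed/accuracy/search-error trade-off the abstract refers to.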
Related papers
- Deep Learning Through A Telescoping Lens: A Simple Model Provides Empirical Insights On Grokking, Gradient Boosting & Beyond [61.18736646013446]
In pursuit of a deeper understanding of deep learning's surprising behaviors, we investigate the utility of a simple yet accurate model of a trained neural network.
Across three case studies, we illustrate how it can be applied to derive new empirical insights on a diverse range of prominent phenomena.
arXiv Detail & Related papers (2024-10-31T22:54:34Z)
- From Computation to Consumption: Exploring the Compute-Energy Link for Training and Testing Neural Networks for SED Systems [9.658615045493734]
We study several neural network architectures that are key components of sound event detection systems.
We measure the energy consumption for training and testing small to large architectures.
We establish complex relationships between the energy consumption, the number of floating-point operations, the number of parameters, and the GPU/memory utilization.
arXiv Detail & Related papers (2024-09-08T12:51:34Z)
- Learning Iterative Reasoning through Energy Diffusion [90.24765095498392]
We introduce iterative reasoning through energy diffusion (IRED), a novel framework for learning to reason across a variety of tasks.
IRED learns energy functions to represent the constraints between input conditions and desired outputs.
We show IRED outperforms existing methods in continuous-space reasoning, discrete-space reasoning, and planning tasks.
arXiv Detail & Related papers (2024-06-17T03:36:47Z)
- Physics of Language Models: Part 1, Learning Hierarchical Language Structures [51.68385617116854]
Transformer-based language models are effective but complex, and understanding their inner workings is a significant challenge.
We introduce a family of synthetic context-free grammars (CFGs) with hierarchical rules, capable of generating lengthy sentences; a toy example of such a grammar is sketched after this entry.
We demonstrate that generative models like GPT can accurately learn this CFG language and generate sentences based on it.
arXiv Detail & Related papers (2023-05-23T04:28:16Z)
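As a toy illustration of the kind of synthetic grammar the entry above describes (the grammars in the paper itself are larger and deeper; the rules, symbols, and vocabulary here are invented for this sketch):

```python
import random

# A toy synthetic context-free grammar: nonterminals map to lists of
# candidate right-hand sides. These particular rules are assumptions
# for illustration, not the paper's grammars.
GRAMMAR = {
    "S":   [["NP", "VP"]],
    "NP":  [["Det", "N"], ["Det", "N", "PP"]],
    "VP":  [["V", "NP"], ["V", "NP", "PP"]],
    "PP":  [["P", "NP"]],
    "Det": [["the"], ["a"]],
    "N":   [["model"], ["sentence"], ["rule"]],
    "V":   [["generates"], ["learns"]],
    "P":   [["with"], ["under"]],
}

def sample(symbol="S"):
    # Recursively expand a nonterminal by picking a random production;
    # anything not in GRAMMAR is a terminal and is emitted as-is.
    if symbol not in GRAMMAR:
        return [symbol]
    rhs = random.choice(GRAMMAR[symbol])
    return [tok for sym in rhs for tok in sample(sym)]

print(" ".join(sample()))  # e.g. "the model generates a sentence with a rule"
```

Repeated sampling yields nested phrases of varying length, giving the hierarchical structure a learner must recover.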
- Energy Transformer [64.22957136952725]
Our work combines aspects of three promising paradigms in machine learning, namely, attention mechanisms, energy-based models, and associative memory.
We propose a novel architecture, called the Energy Transformer (or ET for short), that uses a sequence of attention layers purposely designed to minimize a specifically engineered energy function; a generic sketch of this layer-as-energy-descent idea follows below.
arXiv Detail & Related papers (2023-02-14T18:51:22Z)
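The following is a minimal, generic sketch of that idea, where each "layer" is one gradient-descent step on a shared attention-style energy. The particular energy, dimensions, and step size are assumptions for illustration, not the energy engineered in the paper.

```python
import torch

def attention_energy(tokens, W_q, W_k):
    # An attention-style energy over all token representations:
    # E = -sum_i logsumexp_j (q_i . k_j). Its gradient with respect to
    # the tokens yields attention-like updates.
    q, k = tokens @ W_q, tokens @ W_k
    return -torch.logsumexp(q @ k.T, dim=-1).sum()

n_tokens, d = 10, 32
tokens = torch.randn(n_tokens, d, requires_grad=True)
W_q = torch.randn(d, d) / d ** 0.5
W_k = torch.randn(d, d) / d ** 0.5

# Each "layer" is one gradient-descent step on the same shared energy,
# so the forward pass descends a single energy landscape.
for _ in range(5):
    (grad,) = torch.autograd.grad(attention_energy(tokens, W_q, W_k), tokens)
    tokens = (tokens - 0.1 * grad).detach().requires_grad_(True)
```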
- Learning with latent group sparsity via heat flow dynamics on networks [5.076419064097734]
Group or cluster structure on explanatory variables in machine learning problems is a very general phenomenon.
We contribute an approach to learning under such group structure, that does not require prior information on the group identities.
We demonstrate a procedure to construct such a network based on the available data.
arXiv Detail & Related papers (2022-01-20T17:45:57Z)
- Constructing Neural Network-Based Models for Simulating Dynamical Systems [59.0861954179401]
Data-driven modeling is an alternative paradigm that seeks to learn an approximation of the dynamics of a system using observations of the true system.
This paper provides a survey of the different ways to construct models of dynamical systems using neural networks.
In addition to the basic overview, we review the related literature and outline the most significant challenges from numerical simulations that this modeling paradigm must overcome.
arXiv Detail & Related papers (2021-11-02T10:51:42Z)
- On Energy-Based Models with Overparametrized Shallow Neural Networks [44.74000986284978]
Energy-based models (EBMs) are a powerful framework for generative modeling.
In this work we focus on shallow neural networks.
We show that models trained in the so-called "active" regime provide a statistical advantage over their associated "lazy" or kernel regime.
arXiv Detail & Related papers (2021-04-15T15:34:58Z)
- Learning Discrete Energy-based Models via Auxiliary-variable Local Exploration [130.89746032163106]
We propose ALOE, a new algorithm for learning conditional and unconditional EBMs for discrete structured data.
We show that the energy function and sampler can be trained efficiently via a new variational form of power iteration.
We present an energy-model-guided fuzzer for software testing that achieves performance comparable to well-engineered fuzzing engines like libFuzzer.
arXiv Detail & Related papers (2020-11-10T19:31:29Z)