Learning Energy-Based Approximate Inference Networks for Structured
Applications in NLP
- URL: http://arxiv.org/abs/2108.12522v1
- Date: Fri, 27 Aug 2021 22:48:20 GMT
- Title: Learning Energy-Based Approximate Inference Networks for Structured
Applications in NLP
- Authors: Lifu Tu
- Abstract summary: The dissertation begins with a general introduction to energy-based models.
We propose a method in which we train a neural network to do argmax inference under a structured energy function.
We then develop ways of jointly learning energy functions and inference networks using an adversarial learning framework.
- Score: 8.426855646402238
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Structured prediction in natural language processing (NLP) has a long
history. Complex structured models come with difficulties in learning and
inference, which lead researchers to focus on models with simple structured
components (e.g., local classifiers). Deep representation learning has become
increasingly popular in recent years, but the structured components of these
methods are usually relatively simple. We concentrate on complex structured
models in this dissertation. We provide a learning framework for complex
structured models as well as an inference method with a better
speed/accuracy/search-error trade-off. The dissertation begins with a general
introduction to energy-based models. In NLP and other applications, an energy
function is comparable to the concept of a scoring function. In this
dissertation, we discuss the concept of the energy function and structured
models with different energy functions. Then, we propose a method in which we
train a neural network to do argmax inference under a structured energy
function, referring to the trained networks as "inference networks" or
"energy-based inference networks". We then develop ways of jointly learning
energy functions and inference networks using an adversarial learning
framework. Despite the inference and learning difficulties of energy-based
models, we present approaches in this thesis that make energy-based models
easier to apply in structured NLP applications.
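To make the two training phases concrete, below is a minimal PyTorch-style sketch of the general recipe, not the dissertation's actual implementation: the toy energy function, inference-network architecture, cost function, dimensions, and hyperparameters are all illustrative assumptions.

```python
import torch
import torch.nn as nn

class Energy(nn.Module):
    """Toy structured energy E(x, y); lower energy means a better output."""
    def __init__(self, x_dim, y_dim):
        super().__init__()
        self.local = nn.Linear(x_dim, y_dim)      # per-label scores from the input
        self.pair = nn.Bilinear(y_dim, y_dim, 1)  # simple label-interaction term

    def forward(self, x, y):
        # y is a relaxed (continuous) structured output in [0, 1]^y_dim
        return -(self.local(x) * y).sum(-1) + self.pair(y, y).squeeze(-1)

class InfNet(nn.Module):
    """Inference network A(x), trained to approximate argmin_y E(x, y)."""
    def __init__(self, x_dim, y_dim):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(x_dim, 64), nn.ReLU(),
                                 nn.Linear(64, y_dim))

    def forward(self, x):
        return torch.sigmoid(self.net(x))  # relaxed structured output

x_dim, y_dim = 16, 8
energy, infnet = Energy(x_dim, y_dim), InfNet(x_dim, y_dim)
opt_e = torch.optim.Adam(energy.parameters(), lr=1e-3)
opt_a = torch.optim.Adam(infnet.parameters(), lr=1e-3)

for step in range(200):
    x = torch.randn(32, x_dim)                      # stand-in for real inputs
    y_gold = (torch.randn(32, y_dim) > 0).float()   # stand-in gold outputs

    # (1) Inference-network step: push A(x) toward low-energy outputs,
    #     replacing explicit argmax search with a learned network.
    loss_a = energy(x, infnet(x)).mean()
    opt_a.zero_grad()
    loss_a.backward()
    opt_a.step()

    # (2) Energy step (margin-rescaled hinge): the gold output should have
    #     lower energy than the inference network's output by a margin equal
    #     to that output's cost. Alternating (1) and (2) is the adversarial
    #     game, simplified here to a single inference network.
    y_hat = infnet(x).detach()
    cost = (y_hat - y_gold).abs().sum(-1)           # hypothetical cost function
    loss_e = torch.relu(cost - energy(x, y_hat) + energy(x, y_gold)).mean()
    opt_e.zero_grad()
    loss_e.backward()
    opt_e.step()
```

Because the inference network replaces iterative argmax search (e.g., gradient descent or beam search over outputs) with a single feed-forward pass, test-time inference becomes much cheaper; this is the speed/accuracy/search-error trade-off the abstract refers to.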
Related papers
- Deep Learning Through A Telescoping Lens: A Simple Model Provides Empirical Insights On Grokking, Gradient Boosting & Beyond [61.18736646013446]
In pursuit of a deeper understanding of deep learning's surprising behaviors, we investigate the utility of a simple yet accurate model of a trained neural network.
Across three case studies, we illustrate how it can be applied to derive new empirical insights on a diverse range of prominent phenomena.
arXiv Detail & Related papers (2024-10-31T22:54:34Z)
- From Computation to Consumption: Exploring the Compute-Energy Link for Training and Testing Neural Networks for SED Systems [9.658615045493734]
We study several neural network architectures that are key components of sound event detection systems.
We measure the energy consumption for training and testing small to large architectures.
We establish complex relationships between the energy consumption, the number of floating-point operations, the number of parameters, and the GPU/memory utilization.
arXiv Detail & Related papers (2024-09-08T12:51:34Z)
- Learning Iterative Reasoning through Energy Diffusion [90.24765095498392]
We introduce iterative reasoning through energy diffusion (IRED), a novel framework for learning to reason across a variety of tasks.
IRED learns energy functions to represent the constraints between input conditions and desired outputs.
We show IRED outperforms existing methods in continuous-space reasoning, discrete-space reasoning, and planning tasks.
arXiv Detail & Related papers (2024-06-17T03:36:47Z)
- Physics of Language Models: Part 1, Learning Hierarchical Language Structures [51.68385617116854]
Transformer-based language models are effective but complex, and understanding their inner workings is a significant challenge.
We introduce a family of synthetic context-free grammars (CFGs) with hierarchical rules, capable of generating lengthy sentences; a toy example of such a grammar is sketched after this entry.
We demonstrate that generative models like GPT can accurately learn this CFG language and generate sentences based on it.
arXiv Detail & Related papers (2023-05-23T04:28:16Z)
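As a toy illustration of the kind of synthetic grammar the entry above describes (the grammars in the paper itself are larger and deeper; the rules, symbols, and vocabulary here are invented for this sketch):

```python
import random

# A toy synthetic context-free grammar: nonterminals map to lists of
# candidate right-hand sides. These particular rules are assumptions
# for illustration, not the paper's grammars.
GRAMMAR = {
    "S":   [["NP", "VP"]],
    "NP":  [["Det", "N"], ["Det", "N", "PP"]],
    "VP":  [["V", "NP"], ["V", "NP", "PP"]],
    "PP":  [["P", "NP"]],
    "Det": [["the"], ["a"]],
    "N":   [["model"], ["sentence"], ["rule"]],
    "V":   [["generates"], ["learns"]],
    "P":   [["with"], ["under"]],
}

def sample(symbol="S"):
    # Recursively expand a nonterminal by picking a random production;
    # anything not in GRAMMAR is a terminal and is emitted as-is.
    if symbol not in GRAMMAR:
        return [symbol]
    rhs = random.choice(GRAMMAR[symbol])
    return [tok for sym in rhs for tok in sample(sym)]

print(" ".join(sample()))  # e.g. "the model generates a sentence with a rule"
```

Repeated sampling yields nested phrases of varying length, giving the hierarchical structure a learner must recover.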
- Energy Transformer [64.22957136952725]
Our work combines aspects of three promising paradigms in machine learning, namely, attention mechanisms, energy-based models, and associative memory.
We propose a novel architecture, called the Energy Transformer (or ET for short), that uses a sequence of attention layers purposely designed to minimize a specifically engineered energy function; a generic sketch of this layer-as-energy-descent idea follows below.
arXiv Detail & Related papers (2023-02-14T18:51:22Z)
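The following is a minimal, generic sketch of that idea, where each "layer" is one gradient-descent step on a shared attention-style energy. The particular energy, dimensions, and step size are assumptions for illustration, not the energy engineered in the paper.

```python
import torch

def attention_energy(tokens, W_q, W_k):
    # An attention-style energy over all token representations:
    # E = -sum_i logsumexp_j (q_i . k_j). Its gradient with respect to
    # the tokens yields attention-like updates.
    q, k = tokens @ W_q, tokens @ W_k
    return -torch.logsumexp(q @ k.T, dim=-1).sum()

n_tokens, d = 10, 32
tokens = torch.randn(n_tokens, d, requires_grad=True)
W_q = torch.randn(d, d) / d ** 0.5
W_k = torch.randn(d, d) / d ** 0.5

# Each "layer" is one gradient-descent step on the same shared energy,
# so the forward pass descends a single energy landscape.
for _ in range(5):
    (grad,) = torch.autograd.grad(attention_energy(tokens, W_q, W_k), tokens)
    tokens = (tokens - 0.1 * grad).detach().requires_grad_(True)
```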
- Learning with latent group sparsity via heat flow dynamics on networks [5.076419064097734]
Group or cluster structure on explanatory variables in machine learning problems is a very general phenomenon.
We contribute an approach to learning under such group structure, that does not require prior information on the group identities.
We demonstrate a procedure to construct such a network based on the available data.
arXiv Detail & Related papers (2022-01-20T17:45:57Z)
- Constructing Neural Network-Based Models for Simulating Dynamical Systems [59.0861954179401]
Data-driven modeling is an alternative paradigm that seeks to learn an approximation of the dynamics of a system using observations of the true system.
This paper provides a survey of the different ways to construct models of dynamical systems using neural networks.
In addition to the basic overview, we review the related literature and outline the most significant challenges from numerical simulations that this modeling paradigm must overcome.
arXiv Detail & Related papers (2021-11-02T10:51:42Z)
- On Energy-Based Models with Overparametrized Shallow Neural Networks [44.74000986284978]
Energy-based models (EBMs) are a powerful framework for generative modeling.
In this work we focus on shallow neural networks.
We show that models trained in the so-called "active" regime provide a statistical advantage over their associated "lazy" or kernel regime.
arXiv Detail & Related papers (2021-04-15T15:34:58Z)
- Learning Discrete Energy-based Models via Auxiliary-variable Local Exploration [130.89746032163106]
We propose ALOE, a new algorithm for learning conditional and unconditional EBMs for discrete structured data.
We show that the energy function and sampler can be trained efficiently via a new variational form of power iteration.
We present an energy-model-guided fuzzer for software testing that achieves performance comparable to well-engineered fuzzing engines like libFuzzer.
arXiv Detail & Related papers (2020-11-10T19:31:29Z)