Energy Transformer
- URL: http://arxiv.org/abs/2302.07253v2
- Date: Wed, 1 Nov 2023 00:14:30 GMT
- Title: Energy Transformer
- Authors: Benjamin Hoover, Yuchen Liang, Bao Pham, Rameswar Panda, Hendrik
Strobelt, Duen Horng Chau, Mohammed J. Zaki, Dmitry Krotov
- Abstract summary: Our work combines aspects of three promising paradigms in machine learning, namely, attention mechanism, energy-based models, and associative memory.
We propose a novel architecture, called the Energy Transformer (or ET for short), that uses a sequence of attention layers that are purposely designed to minimize a specifically engineered energy function.
- Score: 64.22957136952725
- License: http://creativecommons.org/licenses/by-sa/4.0/
- Abstract: Our work combines aspects of three promising paradigms in machine learning,
namely, attention mechanism, energy-based models, and associative memory.
Attention is the power-house driving modern deep learning successes, but it
lacks clear theoretical foundations. Energy-based models allow a principled
approach to discriminative and generative tasks, but the design of the energy
functional is not straightforward. At the same time, Dense Associative Memory
models or Modern Hopfield Networks have a well-established theoretical
foundation, and allow an intuitive design of the energy function. We propose a
novel architecture, called the Energy Transformer (or ET for short), that uses
a sequence of attention layers that are purposely designed to minimize a
specifically engineered energy function, which is responsible for representing
the relationships between the tokens. In this work, we introduce the
theoretical foundations of ET, explore its empirical capabilities using the
image completion task, and obtain strong quantitative results on the graph
anomaly detection and graph classification tasks.
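For intuition, the sketch below illustrates the core idea stated in the abstract: token representations are refined by gradient descent on an engineered energy that combines an attention term with an associative-memory (Hopfield) term. This is a minimal illustrative sketch, not the authors' code; the exact energy terms, layer normalization, and parameter sharing used in the paper are not reproduced, and all names and shapes below are assumptions.

```python
# Minimal sketch (not the authors' implementation): token updates as gradient
# descent on a simplified energy, in the spirit of the Energy Transformer.
import jax
import jax.numpy as jnp

def energy(tokens, Wq, Wk, memories, beta=1.0):
    """Simplified energy: an attention term plus a Hopfield (associative memory) term."""
    q = tokens @ Wq                        # (N, d_head) queries
    k = tokens @ Wk                        # (N, d_head) keys
    scores = beta * (q @ k.T)              # (N, N) pairwise similarities
    # Attention energy: low when each token's query aligns with some key.
    e_att = -(1.0 / beta) * jnp.sum(jax.nn.logsumexp(scores, axis=1))
    # Hopfield energy: low when tokens align with stored memory patterns.
    hid = jax.nn.relu(tokens @ memories.T)  # (N, M) memory activations
    e_hop = -0.5 * jnp.sum(hid ** 2)
    return e_att + e_hop

def et_block(tokens, params, n_steps=12, step_size=0.1):
    """Iteratively refine token representations by descending the energy."""
    grad_fn = jax.grad(energy)              # gradient w.r.t. the tokens
    for _ in range(n_steps):
        tokens = tokens - step_size * grad_fn(tokens, *params)
    return tokens

# Toy usage: 8 tokens of dimension 16, 32 memory patterns.
key = jax.random.PRNGKey(0)
k1, k2, k3, k4 = jax.random.split(key, 4)
tokens = jax.random.normal(k1, (8, 16))
params = (jax.random.normal(k2, (16, 16)) * 0.1,   # Wq (assumed shape)
          jax.random.normal(k3, (16, 16)) * 0.1,   # Wk (assumed shape)
          jax.random.normal(k4, (32, 16)) * 0.1)   # memory patterns
refined = et_block(tokens, params)
```

Because the update is a descent step on a single scalar energy, the same block can be unrolled for image completion or graph tasks; the paper's actual update rule additionally normalizes the tokens at each step.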
Related papers
- Learning Iterative Reasoning through Energy Diffusion [90.24765095498392]
We introduce iterative reasoning through energy diffusion (IRED), a novel framework for learning to reason for a variety of tasks.
IRED learns energy functions to represent the constraints between input conditions and desired outputs.
We show IRED outperforms existing methods in continuous-space reasoning, discrete-space reasoning, and planning tasks.
arXiv Detail & Related papers (2024-06-17T03:36:47Z)
- A Proposed Quantum Hamiltonian Encoding Framework for Time Evolution Operator Design of Potential Energy Function [1.2277343096128712]
This research studies time evolution operators generated by potential energy functions, with applications spanning quantum chemistry and condensed matter physics.
The algorithms were implemented in simulators and on IBM quantum hardware to demonstrate their efficacy.
arXiv Detail & Related papers (2023-08-12T07:37:42Z)
- On Feature Diversity in Energy-based Models [98.78384185493624]
An energy-based model (EBM) is typically composed of one or more inner models that learn a combination of different features to produce an energy value for each input configuration.
We extend the probably approximately correct (PAC) theory of EBMs and analyze the effect of redundancy reduction on the performance of EBMs.
arXiv Detail & Related papers (2023-06-02T12:30:42Z)
- Joint Feature and Differentiable $k$-NN Graph Learning using Dirichlet Energy [103.74640329539389]
We propose a deep feature selection (FS) method that simultaneously conducts feature selection and differentiable $k$-NN graph learning.
We employ Optimal Transport theory to address the non-differentiability of learning $k$-NN graphs in neural networks.
We validate the effectiveness of our model with extensive experiments on both synthetic and real-world datasets.
arXiv Detail & Related papers (2023-05-21T08:15:55Z)
- Energy-frugal and Interpretable AI Hardware Design using Learning Automata [5.514795777097036]
A new machine learning algorithm, called the Tsetlin machine, has been proposed.
In this paper, we investigate methods of energy-frugal artificial intelligence hardware design.
We show that frugal resource allocation can provide decisive energy reduction while also achieving robust and interpretable learning.
arXiv Detail & Related papers (2023-05-19T15:11:18Z)
- Robust and Controllable Object-Centric Learning through Energy-based Models [95.68748828339059]
Ours is a conceptually simple and general approach to learning object-centric representations through an energy-based model.
We show that our approach can be easily integrated into existing architectures and can effectively extract high-quality object-centric representations.
arXiv Detail & Related papers (2022-10-11T15:11:15Z)
- Energy Consumption of Neural Networks on NVIDIA Edge Boards: an Empirical Model [6.809944967863927]
Recently, there has been a trend of shifting the execution of deep learning inference tasks toward the edge of the network, closer to the user, to reduce latency and preserve data privacy.
In this work, we profile the energy consumption of inference tasks on several modern edge nodes.
We then distill a simple, practical model that estimates the energy consumption of a given inference task on the considered boards.
arXiv Detail & Related papers (2022-10-04T14:12:59Z)
- Learning Energy Networks with Generalized Fenchel-Young Losses [34.46284877812228]
Energy-based models, a.k.a. energy networks, perform inference by optimizing an energy function.
We propose generalized Fenchel-Young losses, a natural loss construction for learning energy networks.
arXiv Detail & Related papers (2022-05-19T14:32:04Z)
- Attention Mechanism with Energy-Friendly Operations [61.58748425876866]
We rethink the attention mechanism from the perspective of energy consumption.
We build a novel attention model by replacing multiplications with either selective operations or additions (a generic multiplication-free scoring sketch appears after this list).
Empirical results on three machine translation tasks demonstrate that the proposed model achieves competitive accuracy.
arXiv Detail & Related papers (2022-04-28T08:50:09Z)
- Learning Energy-Based Approximate Inference Networks for Structured Applications in NLP [8.426855646402238]
The dissertation begins with a general introduction to energy-based models.
We propose a method in which we train a neural network to do argmax inference under a structured energy function.
We then develop ways of jointly learning energy functions and inference networks using an adversarial learning framework.
arXiv Detail & Related papers (2021-08-27T22:48:20Z)
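The "Attention Mechanism with Energy-Friendly Operations" entry above describes replacing multiplications in attention with additions or selective operations. The sketch below is a generic illustration of that idea, not the paper's actual operations: the query-key similarity is computed from additions, subtractions, and absolute values (a negative L1 distance), while value aggregation is left unchanged. All names and shapes are assumptions.

```python
# Illustrative sketch only: attention-style scoring without query-key multiplications.
import jax
import jax.numpy as jnp

def additive_attention(q, k, v):
    """Score with -sum|q - k| instead of dot products, then mix values with softmax weights."""
    # (N, N) scores built from subtractions and absolute values only.
    scores = -jnp.sum(jnp.abs(q[:, None, :] - k[None, :, :]), axis=-1)
    weights = jax.nn.softmax(scores, axis=-1)   # normalization still uses softmax
    return weights @ v                          # value mixing still uses multiplications

# Toy usage: 8 tokens of dimension 16.
key = jax.random.PRNGKey(0)
q, k, v = (jax.random.normal(kk, (8, 16)) for kk in jax.random.split(key, 3))
out = additive_attention(q, k, v)
```

Only the similarity computation is multiplication-free here; how the cited paper handles normalization and value aggregation is not specified in the summary above.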
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information provided (including all listed content) and is not responsible for any consequences arising from its use.