ButterflyFlow: Building Invertible Layers with Butterfly Matrices
- URL: http://arxiv.org/abs/2209.13774v1
- Date: Wed, 28 Sep 2022 01:58:18 GMT
- Title: ButterflyFlow: Building Invertible Layers with Butterfly Matrices
- Authors: Chenlin Meng, Linqi Zhou, Kristy Choi, Tri Dao, and Stefano Ermon
- Abstract summary: We propose a new family of invertible linear layers based on butterfly layers.
Based on our invertible butterfly layers, we construct a new class of normalizing flow models called ButterflyFlow.
- Score: 80.83142511616262
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Normalizing flows model complex probability distributions using maps obtained
by composing invertible layers. Special linear layers such as masked and 1x1
convolutions play a key role in existing architectures because they increase
expressive power while having tractable Jacobians and inverses. We propose a
new family of invertible linear layers based on butterfly layers, which are
known to theoretically capture complex linear structures including permutations
and periodicity, yet can be inverted efficiently. This representational power
is a key advantage of our approach, as such structures are common in many
real-world datasets. Based on our invertible butterfly layers, we construct a
new class of normalizing flow models called ButterflyFlow. Empirically, we
demonstrate that ButterflyFlows not only achieve strong density estimation
results on natural images such as MNIST, CIFAR-10, and ImageNet 32x32, but also
obtain significantly better log-likelihoods on structured datasets such as
galaxy images and MIMIC-III patient cohorts -- all while being more efficient
in terms of memory and computation than relevant baselines.
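To make the construction concrete, here is a minimal NumPy sketch of a single invertible butterfly factor: every coordinate is mixed with its partner at a fixed stride by a learnable 2x2 block, so applying the layer and its inverse costs O(n), and the log-Jacobian-determinant is just a sum over the 2x2 blocks. This is an illustrative sketch only, not the authors' implementation; the function names, the (blocks, stride, 2, 2) parameter layout, and the identity-biased initialization are assumptions made for the example. A full butterfly matrix composes log2(n) such factors with geometrically growing strides.

```python
import numpy as np

def apply_butterfly_factor(x, W, stride):
    """Apply one invertible butterfly factor to the last axis of x.

    x: array of shape (batch, n), with n divisible by 2 * stride.
    W: array of shape (n // (2 * stride), stride, 2, 2), one learnable 2x2
       block per coordinate pair (the classic FFT-style butterfly pattern:
       coordinate j is mixed with coordinate j + stride).
    Returns (y, logdet) with y = B x per sample and logdet = log|det B|.
    """
    batch, n = x.shape
    blocks = n // (2 * stride)
    xb = x.reshape(batch, blocks, 2, stride)      # split each block into two halves
    top, bot = xb[:, :, 0, :], xb[:, :, 1, :]
    a, b = W[..., 0, 0], W[..., 0, 1]             # each has shape (blocks, stride)
    c, d = W[..., 1, 0], W[..., 1, 1]
    y_top = a * top + b * bot                     # elementwise 2x2 mixing of the halves
    y_bot = c * top + d * bot
    y = np.stack([y_top, y_bot], axis=2).reshape(batch, n)
    # The Jacobian is block diagonal in 2x2 blocks, so the log-determinant is
    # the sum of log|det| over the 2x2 blocks -- O(n), not O(n^3).
    logdet = np.sum(np.log(np.abs(a * d - b * c)))
    return y, logdet

def invert_butterfly_factor(y, W, stride):
    """Invert the factor above by applying each 2x2 block's explicit inverse."""
    batch, n = y.shape
    blocks = n // (2 * stride)
    yb = y.reshape(batch, blocks, 2, stride)
    top, bot = yb[:, :, 0, :], yb[:, :, 1, :]
    a, b = W[..., 0, 0], W[..., 0, 1]
    c, d = W[..., 1, 0], W[..., 1, 1]
    det = a * d - b * c
    x_top = (d * top - b * bot) / det
    x_bot = (-c * top + a * bot) / det
    return np.stack([x_top, x_bot], axis=2).reshape(batch, n)

# Sanity check: one factor on n = 8 coordinates with stride 2.
rng = np.random.default_rng(0)
n, stride = 8, 2
x = rng.standard_normal((4, n))
W = 0.1 * rng.standard_normal((n // (2 * stride), stride, 2, 2)) + np.eye(2)
y, logdet = apply_butterfly_factor(x, W, stride)
assert np.allclose(invert_butterfly_factor(y, W, stride), x)
```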
Related papers
- Searching for Efficient Linear Layers over a Continuous Space of Structured Matrices [88.33936714942996]
We present a unifying framework that enables searching among all linear operators expressible via an Einstein summation.
We show that differences in the compute-optimal scaling laws are mostly governed by a small number of variables.
We find a Mixture-of-Experts (MoE) variant that learns an MoE in every single linear layer of the model, including the projections in the attention blocks.
arXiv Detail & Related papers (2024-10-03T00:44:50Z) - Compute Better Spent: Replacing Dense Layers with Structured Matrices [77.61728033234233]
We identify more efficient alternatives to dense matrices, as exemplified by the success of convolutional networks in the image domain.
We show that different structures often require drastically different initialization scales and learning rates, which are crucial to performance.
We propose a novel matrix family containing Monarch matrices, the Block Tensor-Train (BTT), which we show performs better than dense layers for the same compute on multiple tasks.
arXiv Detail & Related papers (2024-06-10T13:25:43Z) - Dimension Mixer: Group Mixing of Input Dimensions for Efficient Function Approximation [11.072628804821083]
The designs of CNNs, Transformers, and Fourier-Mixers motivated us to look for similarities and differences between them.
We found that these architectures can be interpreted through the lens of a general concept of dimension mixing.
In this work, we study group-wise sparse, non-linear, multi-layered and learnable mixing schemes of inputs and find that they are complementary to many standard neural architectures.
arXiv Detail & Related papers (2023-11-30T17:30:45Z) - Lite it fly: An All-Deformable-Butterfly Network [7.8460795568982435]
Most deep neural networks (DNNs) consist fundamentally of convolutional and/or fully connected layers.
The recently proposed deformable butterfly (DeBut) decomposes the filter matrix into generalized, butterfly-like factors.
This work reveals an intimate link between DeBut and a systematic hierarchy of depthwise and pointwise convolutions.
arXiv Detail & Related papers (2023-11-14T12:41:22Z) - Equivariant Architectures for Learning in Deep Weight Spaces [54.61765488960555]
We present a novel network architecture for learning in deep weight spaces.
It takes as input a concatenation of the weights and biases of a pre-trained model.
We show how these layers can be implemented using three basic operations.
arXiv Detail & Related papers (2023-01-30T10:50:33Z) - Deformable Butterfly: A Highly Structured and Sparse Linear Transform [5.695853802236908]
We introduce a new kind of linear transform named Deformable Butterfly (DeBut) that generalizes the conventional butterfly matrices.
It inherits the fine-to-coarse-grained learnable hierarchy of traditional butterflies, and when deployed in neural networks, the prominent structure and sparsity of a DeBut layer constitute a new approach to network compression.
arXiv Detail & Related papers (2022-03-25T10:20:50Z) - SPINE: Soft Piecewise Interpretable Neural Equations [0.0]
Fully connected networks are ubiquitous but uninterpretable.
This paper takes a novel approach to piecewise fits by using set operations on individual pieces (parts).
It can find a variety of applications where fully connected layers must be replaced by interpretable layers.
arXiv Detail & Related papers (2021-11-20T16:18:00Z) - Self Normalizing Flows [65.73510214694987]
We propose a flexible framework for training normalizing flows by replacing expensive terms in the gradient by learned approximate inverses at each layer.
This reduces the computational complexity of each layer's exact update from $\mathcal{O}(D^3)$ to $\mathcal{O}(D^2)$.
We show experimentally that such models are remarkably stable and optimize to similar data likelihood values as their exact gradient counterparts.
arXiv Detail & Related papers (2020-11-14T09:51:51Z) - Dual-constrained Deep Semi-Supervised Coupled Factorization Network with Enriched Prior [80.5637175255349]
We propose a new enriched prior based Dual-constrained Deep Semi-Supervised Coupled Factorization Network, called DS2CF-Net.
To extract hidden deep features, DS2CF-Net is modeled as a deep-structure and geometrical structure-constrained neural network.
Our network can obtain state-of-the-art performance for representation learning and clustering.
arXiv Detail & Related papers (2020-09-08T13:10:21Z) - Sparse Linear Networks with a Fixed Butterfly Structure: Theory and Practice [4.3400407844814985]
We propose to replace a dense linear layer in any neural network by an architecture based on the butterfly network.
In a collection of experiments, including supervised prediction on both NLP and vision data, we show that this replacement produces results that match, and at times outperform, existing well-known architectures (see the sketch after this list).
arXiv Detail & Related papers (2020-07-17T09:45:03Z)
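Tying the last entry back to the main paper, the self-contained sketch below (again an illustration under stated assumptions, not code from any of the papers above) builds a full n x n butterfly matrix as a product of log2(n) sparse factors, uses it in place of a dense matrix-vector product, and inverts it factor by factor. The helper butterfly_factor_dense, the identity-biased random 2x2 blocks, and the dense materialization of each factor (for readability only) are hypothetical choices.

```python
import numpy as np

def butterfly_factor_dense(n, stride, rng):
    """One n x n butterfly factor, materialized densely for readability.

    Coordinate j is coupled only with coordinate j + stride inside its block
    of size 2 * stride, so the factor has exactly 2 * n nonzero entries.
    """
    B = np.zeros((n, n))
    for block in range(0, n, 2 * stride):
        for p in range(stride):
            i, j = block + p, block + p + stride
            a, b, c, d = 0.1 * rng.standard_normal(4)
            B[i, i], B[i, j] = 1.0 + a, b        # bias towards the identity so the
            B[j, i], B[j, j] = c, 1.0 + d        # random factor stays invertible
    return B

rng = np.random.default_rng(0)
n = 16
# A full butterfly matrix is a product of log2(n) factors with strides
# 1, 2, 4, ..., n / 2: O(n log n) parameters instead of n^2 for a dense layer.
strides = [1, 2, 4, 8]
factors = [butterfly_factor_dense(n, s, rng) for s in strides]
W_butterfly = np.linalg.multi_dot(factors)

x = rng.standard_normal(n)
y = W_butterfly @ x
# Invert factor by factor; each factor only couples coordinates in 2x2 blocks,
# so a real implementation inverts it in O(n). Here we simply call solve on the
# small dense factor for clarity; the full n x n matrix is never inverted.
x_rec = y.copy()
for B in factors:               # W = F1 @ F2 @ F4 @ F8, so undo the stride-1 factor first
    x_rec = np.linalg.solve(B, x_rec)
assert np.allclose(x_rec, x)
print("nonzeros per factor:", int((factors[0] != 0).sum()), "of", n * n)
```

Counting the nonzeros shows where the savings come from: each factor stores 2n entries, so the whole product needs roughly 2n log2(n) parameters rather than the n^2 of the dense layer it replaces.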