Woodbury Transformations for Deep Generative Flows
- URL: http://arxiv.org/abs/2002.12229v3
- Date: Fri, 8 Jan 2021 15:22:26 GMT
- Title: Woodbury Transformations for Deep Generative Flows
- Authors: You Lu, Bert Huang
- Abstract summary: We introduce Woodbury transformations, which achieve efficient invertibility via the Woodbury matrix identity.
Woodbury transformations enable (1) high-dimensional interactions, (2) efficient sampling, and (3) efficient likelihood evaluation.
- Score: 17.062207075794205
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Normalizing flows are deep generative models that allow efficient likelihood
calculation and sampling. The core requirement for this advantage is that they
are constructed using functions that can be efficiently inverted and for which
the determinant of the function's Jacobian can be efficiently computed.
Researchers have introduced various such flow operations, but few of these
allow rich interactions among variables without incurring significant
computational costs. In this paper, we introduce Woodbury transformations,
which achieve efficient invertibility via the Woodbury matrix identity and
efficient determinant calculation via Sylvester's determinant identity. In
contrast with other operations used in state-of-the-art normalizing flows,
Woodbury transformations enable (1) high-dimensional interactions, (2)
efficient sampling, and (3) efficient likelihood evaluation. Other similar
operations, such as 1x1 convolutions, emerging convolutions, or periodic
convolutions, allow at most two of these three advantages. In our experiments on
multiple image datasets, we find that Woodbury transformations allow learning
of higher-likelihood models than other flow architectures while still enjoying
their efficiency advantages.
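As an illustration of the two identities named in the abstract, here is a minimal sketch in plain Python. It is not the authors' implementation. It assumes a generic low-rank linear map W = I + UV acting on a vector, with U of shape n x d and V of shape d x n; how the paper applies this to image data is not detailed in the abstract, and all function names below are hypothetical. The Woodbury identity gives W^{-1} = I - U (I_d + VU)^{-1} V, and Sylvester's identity gives det(I_n + UV) = det(I_d + VU), so only a d x d matrix is ever inverted.

```python
import math


def identity(n):
    return [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]


def add(A, B):
    return [[a + b for a, b in zip(ra, rb)] for ra, rb in zip(A, B)]


def matmul(A, B):
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)] for row in A]


def matvec(A, x):
    return [sum(a * b for a, b in zip(row, x)) for row in A]


def inv_and_det(M):
    """Gauss-Jordan elimination with partial pivoting: returns (inverse, det)."""
    n = len(M)
    A = [list(row) + e for row, e in zip(M, identity(n))]  # augmented [M | I]
    det = 1.0
    for c in range(n):
        p = max(range(c, n), key=lambda r: abs(A[r][c]))
        if abs(A[p][c]) < 1e-12:
            raise ValueError("matrix is singular")
        if p != c:
            A[c], A[p] = A[p], A[c]
            det = -det  # a row swap flips the sign of the determinant
        pivot = A[c][c]
        det *= pivot
        A[c] = [v / pivot for v in A[c]]
        for r in range(n):
            if r != c:
                f = A[r][c]
                A[r] = [v - f * w for v, w in zip(A[r], A[c])]
    return [row[n:] for row in A], det


def forward(x, U, V):
    """y = (I + U V) x, computed as x + U (V x) without forming the n x n matrix."""
    return [a + b for a, b in zip(x, matvec(U, matvec(V, x)))]


def inverse(y, U, V):
    """Woodbury identity: x = y - U (I_d + V U)^{-1} V y."""
    d = len(V)
    small_inv, _ = inv_and_det(add(identity(d), matmul(V, U)))  # d x d only
    return [a - b for a, b in zip(y, matvec(U, matvec(small_inv, matvec(V, y))))]


def log_abs_det(U, V):
    """Sylvester's identity: det(I_n + U V) = det(I_d + V U)."""
    d = len(V)
    _, det = inv_and_det(add(identity(d), matmul(V, U)))
    return math.log(abs(det))
```

Under these assumptions, inverse(forward(x, U, V), U, V) recovers x up to floating-point error, and log_abs_det(U, V) matches the log-determinant of the dense n x n matrix I + UV, while the only matrix that is inverted has size d x d.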
Related papers
- MemoryFormer: Minimize Transformer Computation by Removing Fully-Connected Layers [43.39466934693055]
We present MemoryFormer, a novel transformer architecture which significantly reduces the computational complexity (FLOPs) from a new perspective.
This is made possible by utilizing an alternative method for feature transformation to replace the linear projection of fully-connected layers.
We conduct extensive experiments on various benchmarks to demonstrate the effectiveness of the proposed model.
arXiv Detail & Related papers (2024-11-20T02:41:53Z)
- SIMformer: Single-Layer Vanilla Transformer Can Learn Free-Space Trajectory Similarity [11.354974227479355]
We propose a simple, yet accurate, fast, scalable model that only uses a single-layer vanilla transformer encoder as the feature extractor.
Our model significantly mitigates the curse of dimensionality issue and outperforms the state-of-the-arts in effectiveness, efficiency, and scalability.
arXiv Detail & Related papers (2024-10-18T17:30:17Z)
- Knowledge Composition using Task Vectors with Learned Anisotropic Scaling [51.4661186662329]
We introduce aTLAS, an algorithm that linearly combines parameter blocks with different learned coefficients, resulting in anisotropic scaling at the task vector level.
We show that such linear combinations explicitly exploit the low intrinsicity of pre-trained models, with only a few coefficients being the learnable parameters.
We demonstrate the effectiveness of our method in task arithmetic, few-shot recognition and test-time adaptation, with supervised or unsupervised objectives.
arXiv Detail & Related papers (2024-07-03T07:54:08Z)
- Safe Use of Neural Networks [0.0]
We use number based codes that can detect arithmetic errors in the network's processing steps.
One set of parities is obtained from a section's outputs while a second comparable set is developed directly from the original inputs.
We focus on using long numerically based convolutional codes because of the large size of data sets.
arXiv Detail & Related papers (2023-06-13T19:07:14Z)
- ButterflyFlow: Building Invertible Layers with Butterfly Matrices [80.83142511616262]
We propose a new family of invertible linear layers based on butterfly layers.
Based on our invertible butterfly layers, we construct a new class of normalizing flow models called ButterflyFlow.
arXiv Detail & Related papers (2022-09-28T01:58:18Z)
- ECO-TR: Efficient Correspondences Finding Via Coarse-to-Fine Refinement [80.94378602238432]
We propose an efficient structure named Correspondence Efficient Transformer (ECO-TR) by finding correspondences in a coarse-to-fine manner.
To achieve this, multiple transformer blocks are stage-wisely connected to gradually refine the predicted coordinates.
Experiments on various sparse and dense matching tasks demonstrate the superiority of our method in both efficiency and effectiveness against existing state-of-the-arts.
arXiv Detail & Related papers (2022-09-25T13:05:33Z)
- Building an Efficiency Pipeline: Commutativity and Cumulativeness of Efficiency Operators for Transformers [68.55472265775514]
We consider an efficiency method as an operator applied on a model.
In this paper, we study the plausibility of this idea, and the commutativity and cumulativeness of efficiency operators.
arXiv Detail & Related papers (2022-07-31T18:01:06Z)
- Hybrid Trilinear and Bilinear Programming for Aligning Partially Overlapping Point Sets [85.71360365315128]
In many applications, we need algorithms which can align partially overlapping point sets and are invariant to the corresponding transformations, building on the RPM algorithm.
We first show that the objective is a cubic bound function. We then utilize the convex envelopes of trilinear and bilinear monomial transformations to derive its lower bound.
We next develop a branch-and-bound (BnB) algorithm which only branches over the transformation variables and runs efficiently.
arXiv Detail & Related papers (2021-01-19T04:24:23Z)
- Self Normalizing Flows [65.73510214694987]
We propose a flexible framework for training normalizing flows by replacing expensive terms in the gradient by learned approximate inverses at each layer.
This reduces the computational complexity of each layer's exact update from $\mathcal{O}(D^3)$ to $\mathcal{O}(D^2)$.
We show experimentally that such models are remarkably stable and optimize to similar data likelihood values as their exact gradient counterparts.
arXiv Detail & Related papers (2020-11-14T09:51:51Z)