Kronecker Attention Networks
- URL: http://arxiv.org/abs/2007.08442v1
- Date: Thu, 16 Jul 2020 16:26:02 GMT
- Title: Kronecker Attention Networks
- Authors: Hongyang Gao, Zhengyang Wang, Shuiwang Ji
- Abstract summary: We develop Kronecker attention operators (KAOs) that operate on high-order tensor data directly.
Results show that our methods reduce the amount of required computational resources by a factor of hundreds.
- Score: 69.22257624495899
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Attention operators have been applied on both 1-D data like texts and
higher-order data such as images and videos. Use of attention operators on
high-order data requires flattening of the spatial or spatial-temporal
dimensions into a vector, which is assumed to follow a multivariate normal
distribution. This not only incurs excessive requirements on computational
resources, but also fails to preserve structures in data. In this work, we
propose to avoid flattening by assuming the data follow matrix-variate normal
distributions. Based on this new view, we develop Kronecker attention operators
(KAOs) that operate on high-order tensor data directly. More importantly, the
proposed KAOs lead to dramatic reductions in computational resources.
Experimental results show that our methods reduce the amount of required
computational resources by a factor of hundreds, with larger factors for
higher-dimensional and higher-order data. Results also show that networks with
KAOs outperform models without attention, while achieving performance competitive
with models that use the original attention operators.
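The following toy sketch illustrates the cost contrast described above; it is not the paper's exact KAO construction, only a comparison between attention on flattened spatial features (which forms an (H*W) x (H*W) map) and an axis-factored variant that only forms H x H and W x W maps. All shapes, helper names, and the use of mean-pooled row/column summaries are illustrative assumptions.
```python
# Illustrative sketch (not the authors' exact KAO): contrasts the cost of
# standard attention on flattened spatial features with an axis-factored
# variant that never builds the full (H*W) x (H*W) attention map.
import numpy as np

def softmax(x, axis=-1):
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def flattened_attention(feat):
    """Standard attention: flatten H x W into a single axis of length H*W."""
    H, W, C = feat.shape
    x = feat.reshape(H * W, C)                 # (HW, C)
    attn = softmax(x @ x.T / np.sqrt(C))       # (HW, HW) map -- quadratic in H*W
    return (attn @ x).reshape(H, W, C)

def axis_factored_attention(feat):
    """Factored variant: attend along rows and columns separately, so the
    largest attention maps are H x H and W x W instead of HW x HW."""
    H, W, C = feat.shape
    row_ctx = feat.mean(axis=1)                # (H, C) summaries over columns
    col_ctx = feat.mean(axis=0)                # (W, C) summaries over rows
    row_attn = softmax(row_ctx @ row_ctx.T / np.sqrt(C))   # (H, H)
    col_attn = softmax(col_ctx @ col_ctx.T / np.sqrt(C))   # (W, W)
    out = np.einsum('hk,kwc->hwc', row_attn, feat)         # mix along height
    out = np.einsum('wk,hkc->hwc', col_attn, out)          # mix along width
    return out

feat = np.random.randn(32, 32, 64)
print(flattened_attention(feat).shape)      # builds a 1024 x 1024 map
print(axis_factored_attention(feat).shape)  # builds only 32 x 32 maps
```
For a 32 x 32 feature map this replaces a single 1024 x 1024 attention map with two 32 x 32 maps, which is the kind of resource reduction the abstract refers to.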
Related papers
- Generative Expansion of Small Datasets: An Expansive Graph Approach [13.053285552524052]
We introduce an Expansive Synthesis model generating large-scale, information-rich datasets from minimal samples.
An autoencoder with self-attention layers and optimal transport refines distributional consistency.
Results show comparable performance, demonstrating the model's potential to augment training data effectively.
arXiv Detail & Related papers (2024-06-25T02:59:02Z)
- Loki: Low-Rank Keys for Efficient Sparse Attention [44.74682508879725]
We propose a novel sparse attention method that ranks and selects tokens in the KV-cache based on attention scores computed in low-dimensional space.
Our evaluations show that Loki is able to maintain the efficacy of the models better than other popular approximation methods.
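A minimal sketch of the mechanism summarized above, under assumptions (this is not Loki's actual implementation): cached keys are scored against the query in a low-dimensional projection, only the top-k tokens are retained, and exact attention is computed over that subset. The projection matrix P, the budget k, and all function names are hypothetical.
```python
# Hedged sketch: approximate attention scores in a low-rank key space,
# keep the top-k cached tokens, then run exact attention on that subset.
import numpy as np

def softmax(x):
    x = x - x.max()
    e = np.exp(x)
    return e / e.sum()

def sparse_attention_lowrank(q, K, V, P, k=32):
    """q: (d,) query; K, V: (n, d) cached keys/values; P: (d, r) projection."""
    scores_lr = (K @ P) @ (P.T @ q)                 # approximate scores in r dims
    top = np.argsort(scores_lr)[-k:]                # indices of most relevant tokens
    w = softmax(K[top] @ q / np.sqrt(K.shape[1]))   # exact scores on the subset only
    return w @ V[top]

d, n, r = 64, 1024, 8
rng = np.random.default_rng(0)
P = np.linalg.qr(rng.standard_normal((d, r)))[0]    # stand-in low-rank projection
q = rng.standard_normal(d)
K = rng.standard_normal((n, d))
V = rng.standard_normal((n, d))
print(sparse_attention_lowrank(q, K, V, P).shape)   # (64,)
```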
arXiv Detail & Related papers (2024-06-04T17:58:03Z)
- Gradient-Based Spectral Embeddings of Random Dot Product Graphs [7.612218105739107]
In this paper, we show how to better solve the embedding problem of the Random Dot Product Graph (RDPG).
We develop a novel feasible optimization method in the resulting manifold.
Our open-source algorithm implementations are scalable and, unlike existing approaches, are robust to missing edge data and can track slowly varying latent positions from streaming graphs.
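For intuition, here is a minimal sketch of gradient-based RDPG embedding assuming the standard objective of minimizing ||A - X X^T||_F^2 over latent positions X; the paper's feasible, manifold-constrained method is more involved, and all names and hyperparameters below are illustrative.
```python
# Minimal sketch (assumed objective): plain gradient descent on ||A - X X^T||_F^2.
import numpy as np

def rdpg_embed(A, d=2, steps=500, lr=1e-3, seed=0):
    rng = np.random.default_rng(seed)
    n = A.shape[0]
    X = 0.1 * rng.standard_normal((n, d))       # latent positions
    for _ in range(steps):
        grad = -4 * (A - X @ X.T) @ X           # gradient of the Frobenius loss (A symmetric)
        X -= lr * grad
    return X

A = (np.random.rand(50, 50) < 0.2).astype(float)
A = np.triu(A, 1); A = A + A.T                  # symmetric adjacency, no self-loops
print(rdpg_embed(A).shape)                      # (50, 2) latent positions
```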
arXiv Detail & Related papers (2023-07-25T21:09:55Z)
- Graph Kernel Neural Networks [53.91024360329517]
We propose to use graph kernels, i.e. kernel functions that compute an inner product on graphs, to extend the standard convolution operator to the graph domain.
This allows us to define an entirely structural model that does not require computing the embedding of the input graph.
Our architecture allows plugging in any type of graph kernel and has the added benefit of providing some interpretability.
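To make the "inner product on graphs" idea concrete, the toy sketch below evaluates a graph kernel as the inner product of explicit feature maps, using a simple degree histogram as the map; this is only an illustration of what a graph kernel computes, not the architecture proposed in the paper.
```python
# Toy graph kernel: inner product of explicit degree-histogram feature maps.
import numpy as np

def degree_histogram(adj, max_degree=10):
    """Explicit feature map: histogram of node degrees."""
    degrees = adj.sum(axis=1).astype(int)
    return np.bincount(np.clip(degrees, 0, max_degree), minlength=max_degree + 1)

def graph_kernel(adj_a, adj_b):
    """Kernel value = inner product of the two feature maps."""
    return float(degree_histogram(adj_a) @ degree_histogram(adj_b))

ring = np.roll(np.eye(6), 1, axis=1); ring = ring + ring.T      # 6-node cycle
star = np.zeros((6, 6)); star[0, 1:] = 1; star = star + star.T  # 6-node star
print(graph_kernel(ring, ring), graph_kernel(ring, star))
```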
arXiv Detail & Related papers (2021-12-14T14:48:08Z)
- Unsupervised Finetuning [80.58625921631506]
We propose two strategies for combining source and target data in unsupervised finetuning.
The motivation of the former strategy is to add a small portion of source data back to occupy their pretrained representation space.
The motivation of the latter strategy is to increase the data density and help learn more compact representation.
arXiv Detail & Related papers (2021-10-18T17:57:05Z)
- Augmented Tensor Decomposition with Stochastic Optimization [46.16865811396394]
Real-world tensor data are usually high-ordered and have large dimensions with millions or billions of entries.
It is expensive to decompose the whole tensor with traditional algorithms.
This paper proposes augmented tensor decomposition, which effectively incorporates data augmentations to boost downstream classification.
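As a rough sketch of stochastic tensor decomposition (not the paper's augmented method; the augmentation step is omitted and all names and hyperparameters are assumptions), the snippet below fits CP factors with gradient steps on randomly sampled entries instead of decomposing the whole tensor at once.
```python
# Sketch: stochastic (sampled-entry) gradient descent for a CP decomposition.
import numpy as np

def stochastic_cp(T, rank=4, steps=8000, batch=128, lr=0.01, seed=0):
    """Fit factors A, B, C so that T[i,j,k] ~ sum_r A[i,r] * B[j,r] * C[k,r]."""
    rng = np.random.default_rng(seed)
    I, J, K = T.shape
    A, B, C = (0.1 * rng.standard_normal((n, rank)) for n in (I, J, K))
    for _ in range(steps):
        i, j, k = (rng.integers(0, n, batch) for n in (I, J, K))
        err = np.sum(A[i] * B[j] * C[k], axis=1) - T[i, j, k]   # residuals on the sample
        gA = err[:, None] * B[j] * C[k]
        gB = err[:, None] * A[i] * C[k]
        gC = err[:, None] * A[i] * B[j]
        np.add.at(A, i, -lr * gA); np.add.at(B, j, -lr * gB); np.add.at(C, k, -lr * gC)
    return A, B, C

rng = np.random.default_rng(1)
U, V, W = (rng.standard_normal((30, 3)) for _ in range(3))
T = np.einsum('ir,jr,kr->ijk', U, V, W)                          # synthetic low-rank tensor
A, B, C = stochastic_cp(T)
recon = np.einsum('ir,jr,kr->ijk', A, B, C)
print(np.linalg.norm(T - recon) / np.linalg.norm(T))             # relative reconstruction error
```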
arXiv Detail & Related papers (2021-06-15T06:29:05Z)
- Rank-R FNN: A Tensor-Based Learning Model for High-Order Data Classification [69.26747803963907]
Rank-R Feedforward Neural Network (FNN) is a tensor-based nonlinear learning model that imposes Canonical/Polyadic decomposition on its parameters.
First, it handles inputs as multilinear arrays, bypassing the need for vectorization, and can thus fully exploit the structural information along every data dimension.
We establish the universal approximation and learnability properties of Rank-R FNN, and we validate its performance on real-world hyperspectral datasets.
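A toy sketch of the core idea, under assumptions (this is not the exact Rank-R FNN): a rank-R, CP-factorized weight produces a linear response directly from a 3-way input, so the input is never vectorized; factor shapes and names are illustrative.
```python
# Sketch: a rank-R (CP) factorized weight applied to a 3-way input without flattening.
import numpy as np

def rank_r_score(x, factors):
    """x: (I, J, K) input tensor; factors: list of (a_r, b_r, c_r) vectors.
    Returns sum_r <x, a_r outer b_r outer c_r>, computed mode by mode."""
    return sum(np.einsum('ijk,i,j,k->', x, a, b, c) for a, b, c in factors)

I, J, K, R = 8, 8, 5, 3
rng = np.random.default_rng(0)
factors = [(rng.standard_normal(I), rng.standard_normal(J), rng.standard_normal(K))
           for _ in range(R)]
x = rng.standard_normal((I, J, K))
score = rank_r_score(x, factors)
print(np.tanh(score))   # a nonlinearity would follow in an actual network layer
```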
arXiv Detail & Related papers (2021-04-11T16:37:32Z)
- Channelized Axial Attention for Semantic Segmentation [70.14921019774793]
We propose the Channelized Axial Attention (CAA) to seamlessly integrate channel attention and axial attention with reduced computational complexity.
Our CAA not only requires far fewer computational resources than other dual-attention models such as DANet, but also outperforms the state-of-the-art ResNet-101-based segmentation models on all tested datasets.
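As a loose illustration only (not the CAA operator itself), one way to combine channel attention with axial attention is to gate channels globally and then attend along a single spatial axis at a time, so no (H*W) x (H*W) map is ever formed; every shape and name below is an assumption.
```python
# Sketch: channel gating followed by attention along the height axis only.
import numpy as np

def softmax(x, axis=-1):
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def channel_then_axial(feat):
    """feat: (H, W, C). Channel gating, then per-column attention over rows."""
    H, W, C = feat.shape
    gate = softmax(feat.mean(axis=(0, 1)))                 # (C,) channel weights
    x = feat * gate                                        # channel attention as gating
    attn = softmax(np.einsum('hwc,kwc->hwk', x, x) / np.sqrt(C))  # (H, W, H) axial maps
    return np.einsum('hwk,kwc->hwc', attn, x)              # mix rows within each column

print(channel_then_axial(np.random.randn(16, 16, 8)).shape)  # (16, 16, 8)
```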
arXiv Detail & Related papers (2021-01-19T03:08:03Z)
- Block-Approximated Exponential Random Graphs [77.4792558024487]
An important challenge in the field of exponential random graphs (ERGs) is the fitting of non-trivial ERGs on large graphs.
We propose an approximation framework for such non-trivial ERGs that results in dyadic-independence (i.e., edge-independent) distributions.
Our methods are scalable to sparse graphs consisting of millions of nodes.
arXiv Detail & Related papers (2020-02-14T11:42:16Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of its content (including all information) and is not responsible for any consequences.