Kronecker Attention Networks
- URL: http://arxiv.org/abs/2007.08442v1
- Date: Thu, 16 Jul 2020 16:26:02 GMT
- Title: Kronecker Attention Networks
- Authors: Hongyang Gao, Zhengyang Wang, Shuiwang Ji
- Abstract summary: We develop Kronecker attention operators (KAOs) that operate on high-order tensor data directly.
Results show that our methods reduce the amount of required computational resources by a factor of hundreds.
- Score: 69.22257624495899
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Attention operators have been applied on both 1-D data like texts and
higher-order data such as images and videos. Use of attention operators on
high-order data requires flattening of the spatial or spatial-temporal
dimensions into a vector, which is assumed to follow a multivariate normal
distribution. This not only incurs excessive requirements on computational
resources, but also fails to preserve structures in data. In this work, we
propose to avoid flattening by assuming the data follow matrix-variate normal
distributions. Based on this new view, we develop Kronecker attention operators
(KAOs) that operate on high-order tensor data directly. More importantly, the
proposed KAOs lead to dramatic reductions in computational resources.
Experimental results show that our methods reduce the amount of required
computational resources by a factor of hundreds, with larger factors for
higher-dimensional and higher-order data. Results also show that networks with
KAOs outperform models without attention, while achieving performance competitive
with models that use the original attention operators.
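The following toy sketch illustrates the cost contrast described above; it is not the paper's exact KAO construction, only a comparison between attention on flattened spatial features (which forms an (H*W) x (H*W) map) and an axis-factored variant that only forms H x H and W x W maps. All shapes, helper names, and the use of mean-pooled row/column summaries are illustrative assumptions.
```python
# Illustrative sketch (not the authors' exact KAO): contrasts the cost of
# standard attention on flattened spatial features with an axis-factored
# variant that never builds the full (H*W) x (H*W) attention map.
import numpy as np

def softmax(x, axis=-1):
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def flattened_attention(feat):
    """Standard attention: flatten H x W into a single axis of length H*W."""
    H, W, C = feat.shape
    x = feat.reshape(H * W, C)                 # (HW, C)
    attn = softmax(x @ x.T / np.sqrt(C))       # (HW, HW) map -- quadratic in H*W
    return (attn @ x).reshape(H, W, C)

def axis_factored_attention(feat):
    """Factored variant: attend along rows and columns separately, so the
    largest attention maps are H x H and W x W instead of HW x HW."""
    H, W, C = feat.shape
    row_ctx = feat.mean(axis=1)                # (H, C) summaries over columns
    col_ctx = feat.mean(axis=0)                # (W, C) summaries over rows
    row_attn = softmax(row_ctx @ row_ctx.T / np.sqrt(C))   # (H, H)
    col_attn = softmax(col_ctx @ col_ctx.T / np.sqrt(C))   # (W, W)
    out = np.einsum('hk,kwc->hwc', row_attn, feat)         # mix along height
    out = np.einsum('wk,hkc->hwc', col_attn, out)          # mix along width
    return out

feat = np.random.randn(32, 32, 64)
print(flattened_attention(feat).shape)      # builds a 1024 x 1024 map
print(axis_factored_attention(feat).shape)  # builds only 32 x 32 maps
```
For a 32 x 32 feature map this replaces a single 1024 x 1024 attention map with two 32 x 32 maps, which is the kind of resource reduction the abstract refers to.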
Related papers
- Generative Expansion of Small Datasets: An Expansive Graph Approach [13.053285552524052]
We introduce an Expansive Synthesis model generating large-scale, information-rich datasets from minimal samples.
An autoencoder with self-attention layers and optimal transport refines distributional consistency.
Results show comparable performance, demonstrating the model's potential to augment training data effectively.
arXiv Detail & Related papers (2024-06-25T02:59:02Z)
- Loki: Low-Rank Keys for Efficient Sparse Attention [44.74682508879725]
We propose a novel sparse attention method that ranks and selects tokens in the KV-cache based on attention scores computed in low-dimensional space.
Our evaluations show that Loki is able to maintain the efficacy of the models better than other popular approximation methods.
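A minimal sketch of the mechanism summarized above, under assumptions (this is not Loki's actual implementation): cached keys are scored against the query in a low-dimensional projection, only the top-k tokens are retained, and exact attention is computed over that subset. The projection matrix P, the budget k, and all function names are hypothetical.
```python
# Hedged sketch: approximate attention scores in a low-rank key space,
# keep the top-k cached tokens, then run exact attention on that subset.
import numpy as np

def softmax(x):
    x = x - x.max()
    e = np.exp(x)
    return e / e.sum()

def sparse_attention_lowrank(q, K, V, P, k=32):
    """q: (d,) query; K, V: (n, d) cached keys/values; P: (d, r) projection."""
    scores_lr = (K @ P) @ (P.T @ q)                 # approximate scores in r dims
    top = np.argsort(scores_lr)[-k:]                # indices of most relevant tokens
    w = softmax(K[top] @ q / np.sqrt(K.shape[1]))   # exact scores on the subset only
    return w @ V[top]

d, n, r = 64, 1024, 8
rng = np.random.default_rng(0)
P = np.linalg.qr(rng.standard_normal((d, r)))[0]    # stand-in low-rank projection
q = rng.standard_normal(d)
K = rng.standard_normal((n, d))
V = rng.standard_normal((n, d))
print(sparse_attention_lowrank(q, K, V, P).shape)   # (64,)
```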
arXiv Detail & Related papers (2024-06-04T17:58:03Z)
- Gradient-Based Spectral Embeddings of Random Dot Product Graphs [7.612218105739107]
In this paper, we show how to better solve the embedding problem of the Random Dot Product Graph (RDPG).
We develop a novel feasible optimization method in the resulting manifold.
Our open-source algorithm implementations are scalable and, unlike existing approaches, are robust to missing edge data and can track slowly varying latent positions from streaming graphs.
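For intuition, here is a minimal sketch of gradient-based RDPG embedding assuming the standard objective of minimizing ||A - X X^T||_F^2 over latent positions X; the paper's feasible, manifold-constrained method is more involved, and all names and hyperparameters below are illustrative.
```python
# Minimal sketch (assumed objective): plain gradient descent on ||A - X X^T||_F^2.
import numpy as np

def rdpg_embed(A, d=2, steps=500, lr=1e-3, seed=0):
    rng = np.random.default_rng(seed)
    n = A.shape[0]
    X = 0.1 * rng.standard_normal((n, d))       # latent positions
    for _ in range(steps):
        grad = -4 * (A - X @ X.T) @ X           # gradient of the Frobenius loss (A symmetric)
        X -= lr * grad
    return X

A = (np.random.rand(50, 50) < 0.2).astype(float)
A = np.triu(A, 1); A = A + A.T                  # symmetric adjacency, no self-loops
print(rdpg_embed(A).shape)                      # (50, 2) latent positions
```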
arXiv Detail & Related papers (2023-07-25T21:09:55Z)
- Graph Kernel Neural Networks [53.91024360329517]
We propose to use graph kernels, i.e. kernel functions that compute an inner product on graphs, to extend the standard convolution operator to the graph domain.
This allows us to define an entirely structural model that does not require computing the embedding of the input graph.
Our architecture allows plugging in any type of graph kernel and has the added benefit of providing some interpretability.
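To make the "inner product on graphs" idea concrete, the toy sketch below evaluates a graph kernel as the inner product of explicit feature maps, using a simple degree histogram as the map; this is only an illustration of what a graph kernel computes, not the architecture proposed in the paper.
```python
# Toy graph kernel: inner product of explicit degree-histogram feature maps.
import numpy as np

def degree_histogram(adj, max_degree=10):
    """Explicit feature map: histogram of node degrees."""
    degrees = adj.sum(axis=1).astype(int)
    return np.bincount(np.clip(degrees, 0, max_degree), minlength=max_degree + 1)

def graph_kernel(adj_a, adj_b):
    """Kernel value = inner product of the two feature maps."""
    return float(degree_histogram(adj_a) @ degree_histogram(adj_b))

ring = np.roll(np.eye(6), 1, axis=1); ring = ring + ring.T      # 6-node cycle
star = np.zeros((6, 6)); star[0, 1:] = 1; star = star + star.T  # 6-node star
print(graph_kernel(ring, ring), graph_kernel(ring, star))
```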
arXiv Detail & Related papers (2021-12-14T14:48:08Z)
- Unsupervised Finetuning [80.58625921631506]
We propose two strategies for combining source and target data in unsupervised finetuning.
The motivation of the former strategy is to add a small portion of source data back to occupy their pretrained representation space.
The motivation of the latter strategy is to increase the data density and help learn more compact representation.
arXiv Detail & Related papers (2021-10-18T17:57:05Z)
- Augmented Tensor Decomposition with Stochastic Optimization [46.16865811396394]
Real-world tensor data are usually high-ordered and have large dimensions with millions or billions of entries.
It is expensive to decompose the whole tensor with traditional algorithms.
This paper proposes augmented tensor decomposition, which effectively incorporates data augmentations to boost downstream classification.
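As a rough sketch of stochastic tensor decomposition (not the paper's augmented method; the augmentation step is omitted and all names and hyperparameters are assumptions), the snippet below fits CP factors with gradient steps on randomly sampled entries instead of decomposing the whole tensor at once.
```python
# Sketch: stochastic (sampled-entry) gradient descent for a CP decomposition.
import numpy as np

def stochastic_cp(T, rank=4, steps=8000, batch=128, lr=0.01, seed=0):
    """Fit factors A, B, C so that T[i,j,k] ~ sum_r A[i,r] * B[j,r] * C[k,r]."""
    rng = np.random.default_rng(seed)
    I, J, K = T.shape
    A, B, C = (0.1 * rng.standard_normal((n, rank)) for n in (I, J, K))
    for _ in range(steps):
        i, j, k = (rng.integers(0, n, batch) for n in (I, J, K))
        err = np.sum(A[i] * B[j] * C[k], axis=1) - T[i, j, k]   # residuals on the sample
        gA = err[:, None] * B[j] * C[k]
        gB = err[:, None] * A[i] * C[k]
        gC = err[:, None] * A[i] * B[j]
        np.add.at(A, i, -lr * gA); np.add.at(B, j, -lr * gB); np.add.at(C, k, -lr * gC)
    return A, B, C

rng = np.random.default_rng(1)
U, V, W = (rng.standard_normal((30, 3)) for _ in range(3))
T = np.einsum('ir,jr,kr->ijk', U, V, W)                          # synthetic low-rank tensor
A, B, C = stochastic_cp(T)
recon = np.einsum('ir,jr,kr->ijk', A, B, C)
print(np.linalg.norm(T - recon) / np.linalg.norm(T))             # relative reconstruction error
```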
arXiv Detail & Related papers (2021-06-15T06:29:05Z)
- Rank-R FNN: A Tensor-Based Learning Model for High-Order Data Classification [69.26747803963907]
Rank-R Feedforward Neural Network (FNN) is a tensor-based nonlinear learning model that imposes Canonical/Polyadic decomposition on its parameters.
First, it handles inputs as multilinear arrays, bypassing the need for vectorization, and can thus fully exploit the structural information along every data dimension.
We establish the universal approximation and learnability properties of Rank-R FNN, and we validate its performance on real-world hyperspectral datasets.
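A toy sketch of the core idea, under assumptions (this is not the exact Rank-R FNN): a rank-R, CP-factorized weight produces a linear response directly from a 3-way input, so the input is never vectorized; factor shapes and names are illustrative.
```python
# Sketch: a rank-R (CP) factorized weight applied to a 3-way input without flattening.
import numpy as np

def rank_r_score(x, factors):
    """x: (I, J, K) input tensor; factors: list of (a_r, b_r, c_r) vectors.
    Returns sum_r <x, a_r outer b_r outer c_r>, computed mode by mode."""
    return sum(np.einsum('ijk,i,j,k->', x, a, b, c) for a, b, c in factors)

I, J, K, R = 8, 8, 5, 3
rng = np.random.default_rng(0)
factors = [(rng.standard_normal(I), rng.standard_normal(J), rng.standard_normal(K))
           for _ in range(R)]
x = rng.standard_normal((I, J, K))
score = rank_r_score(x, factors)
print(np.tanh(score))   # a nonlinearity would follow in an actual network layer
```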
arXiv Detail & Related papers (2021-04-11T16:37:32Z)
- Channelized Axial Attention for Semantic Segmentation [70.14921019774793]
We propose the Channelized Axial Attention (CAA) to seamlessly integrate channel attention and axial attention with reduced computational complexity.
Our CAA not only requires far fewer computational resources than other dual-attention models such as DANet, but also outperforms the state-of-the-art ResNet-101-based segmentation models on all tested datasets.
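As a loose illustration only (not the CAA operator itself), one way to combine channel attention with axial attention is to gate channels globally and then attend along a single spatial axis at a time, so no (H*W) x (H*W) map is ever formed; every shape and name below is an assumption.
```python
# Sketch: channel gating followed by attention along the height axis only.
import numpy as np

def softmax(x, axis=-1):
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def channel_then_axial(feat):
    """feat: (H, W, C). Channel gating, then per-column attention over rows."""
    H, W, C = feat.shape
    gate = softmax(feat.mean(axis=(0, 1)))                 # (C,) channel weights
    x = feat * gate                                        # channel attention as gating
    attn = softmax(np.einsum('hwc,kwc->hwk', x, x) / np.sqrt(C))  # (H, W, H) axial maps
    return np.einsum('hwk,kwc->hwc', attn, x)              # mix rows within each column

print(channel_then_axial(np.random.randn(16, 16, 8)).shape)  # (16, 16, 8)
```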
arXiv Detail & Related papers (2021-01-19T03:08:03Z)
- Block-Approximated Exponential Random Graphs [77.4792558024487]
An important challenge in the field of exponential random graphs (ERGs) is the fitting of non-trivial ERGs on large graphs.
We propose an approximation framework for such non-trivial ERGs that results in dyadic-independence (i.e., edge-independent) distributions.
Our methods are scalable to sparse graphs consisting of millions of nodes.
arXiv Detail & Related papers (2020-02-14T11:42:16Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of its content (including all information) and is not responsible for any consequences.