Attention improves concentration when learning node embeddings
- URL: http://arxiv.org/abs/2006.06834v1
- Date: Thu, 11 Jun 2020 21:21:12 GMT
- Title: Attention improves concentration when learning node embeddings
- Authors: Matthew Dippel, Adam Kiezun, Tanay Mehta, Ravi Sundaram, Srikanth Thirumalai, Akshar Varma
- Abstract summary: Given nodes labelled with search query text, we want to predict links to related queries that share products.
Experiments with a range of deep neural architectures show that simple feedforward networks with an attention mechanism perform best for learning embeddings.
We propose an analytically tractable model of query generation, AttEST, that views both products and the query text as vectors embedded in a latent space.
- Score: 1.2233362977312945
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: We consider the problem of predicting edges in a graph from node attributes
in an e-commerce setting. Specifically, given nodes labelled with search query
text, we want to predict links to related queries that share products.
Experiments with a range of deep neural architectures show that simple
feedforward networks with an attention mechanism perform best for learning
embeddings. The simplicity of these models allows us to explain the performance
of attention.
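The "simple feedforward network with an attention mechanism" can be pictured as attention pooling over token embeddings. Below is a minimal PyTorch sketch of that pattern; the layer sizes, module name, and projection head are illustrative assumptions, not the authors' exact architecture.

```python
# Minimal attention-pooling sketch (PyTorch). All layer sizes and the
# module name are illustrative assumptions, not the paper's exact model.
import torch
import torch.nn as nn

class AttentionPooledEncoder(nn.Module):
    def __init__(self, vocab_size: int, dim: int = 128):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, dim)
        self.score = nn.Linear(dim, 1)   # one scalar attention logit per token
        self.proj = nn.Linear(dim, dim)  # simple feedforward head

    def forward(self, token_ids: torch.Tensor) -> torch.Tensor:
        # token_ids: (batch, seq_len)
        x = self.embed(token_ids)                # (batch, seq, dim)
        w = torch.softmax(self.score(x), dim=1)  # attention weights over tokens
        pooled = (w * x).sum(dim=1)              # weighted average of token vectors
        return self.proj(pooled)                 # query embedding

# Usage: embed a toy batch of two 4-token queries.
enc = AttentionPooledEncoder(vocab_size=1000)
queries = torch.randint(0, 1000, (2, 4))
print(enc(queries).shape)  # torch.Size([2, 128])
```

Because the pooled query vector is a convex combination of token vectors, the attention weights are directly inspectable, which is what makes the comparison to BLUE weights later in the abstract possible.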
We propose an analytically tractable model of query generation, AttEST, that
views both products and the query text as vectors embedded in a latent space.
We prove (and empirically validate) that the point-wise mutual information
(PMI) matrix of the AttEST query text embeddings displays a low-rank behavior
analogous to that observed in word embeddings. This low-rank property allows us
to derive a loss function that maximizes the mutual information between related
queries, which is used to train an attention network to learn query embeddings.
This AttEST network beats traditional memory-based LSTM architectures by over
20% on F1 score. We justify this outperformance by showing that the weights
from the attention mechanism correlate strongly with the weights of the best
linear unbiased estimator (BLUE) for the product vectors, and conclude that
attention plays an important role in variance reduction.
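The low-rank PMI claim can be checked on any co-occurrence matrix. The NumPy sketch below uses random stand-in counts (an assumption; substitute real term-term counts from a query corpus) to show the standard PMI construction and a singular-value inspection, not the paper's AttEST-specific derivation.

```python
# PMI matrix construction and a low-rank check (NumPy). The co-occurrence
# counts here are random stand-ins; plug in real term-term counts from a
# query corpus to reproduce the kind of spectrum the paper describes.
import numpy as np

rng = np.random.default_rng(0)
counts = rng.poisson(2.0, size=(500, 500)).astype(float)
counts = counts + counts.T          # symmetric co-occurrence counts

total = counts.sum()
p_joint = counts / total            # empirical joint probabilities
p_marg = p_joint.sum(axis=1)        # marginals

# PMI(i, j) = log( p(i, j) / (p(i) * p(j)) ), with 0-count cells zeroed.
with np.errstate(divide="ignore"):
    pmi = np.log(p_joint) - np.log(np.outer(p_marg, p_marg))
pmi[~np.isfinite(pmi)] = 0.0

# Low-rank behavior shows up as a sharply decaying singular-value spectrum.
sv = np.linalg.svd(pmi, compute_uv=False)
k = 50
print(f"top-{k} singular values carry "
      f"{sv[:k].sum() / sv.sum():.1%} of the spectrum")
```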
Related papers
- Localized Gaussians as Self-Attention Weights for Point Clouds Correspondence [92.07601770031236]
We investigate semantically meaningful patterns in the attention heads of an encoder-only Transformer architecture.
We find that fixing the attention weights not only accelerates the training process but also enhances the stability of the optimization.
arXiv Detail & Related papers (2024-09-20T07:41:47Z)
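As a hedged illustration of the fixed-attention idea in the entry above, the NumPy sketch below computes self-attention weights for a point cloud directly from pairwise distances with a Gaussian kernel rather than learned query-key products; the bandwidth and shapes are assumptions for illustration.

```python
# Fixed Gaussian self-attention weights over a point cloud (NumPy sketch).
# The bandwidth sigma and the point/feature counts are assumptions.
import numpy as np

def gaussian_attention(points: np.ndarray, values: np.ndarray,
                       sigma: float = 0.5) -> np.ndarray:
    # points: (n, 3) coordinates; values: (n, d) per-point features.
    diff = points[:, None, :] - points[None, :, :]  # (n, n, 3)
    sq_dist = (diff ** 2).sum(-1)                   # squared pairwise distances
    logits = -sq_dist / (2.0 * sigma ** 2)          # Gaussian kernel logits
    weights = np.exp(logits)
    weights /= weights.sum(axis=1, keepdims=True)   # row-normalize (= softmax)
    return weights @ values                         # attend with fixed weights

pts = np.random.rand(64, 3)
feats = np.random.rand(64, 16)
print(gaussian_attention(pts, feats).shape)  # (64, 16)
```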
- Learning Feature Matching via Matchable Keypoint-Assisted Graph Neural Network [52.29330138835208]
Accurately matching local features between a pair of images is a challenging computer vision task.
Previous studies typically use attention-based graph neural networks (GNNs) with fully-connected graphs over keypoints within/across images.
We propose MaKeGNN, a sparse attention-based GNN architecture which bypasses non-repeatable keypoints and leverages matchable ones to guide message passing.
arXiv Detail & Related papers (2023-07-04T02:50:44Z)
- AU-aware graph convolutional network for Macro- and Micro-expression spotting [44.507747407072685]
We propose a graph-convolution-based network, the Action-Unit-aWare Graph Convolutional Network (AUW-GCN).
To inject prior information and to cope with the problem of small datasets, AU-related statistics are encoded into the network.
Our method consistently outperforms baseline approaches and achieves new SOTA performance on two benchmark datasets.
arXiv Detail & Related papers (2023-03-16T07:00:36Z)
- Revisiting Attention Weights as Explanations from an Information Theoretic Perspective [4.499369811647602]
Our findings indicate that attention mechanisms have the potential to function as a shortcut to model explanations when they are carefully combined with other model elements.
arXiv Detail & Related papers (2022-10-31T12:53:20Z)
- Beyond the Gates of Euclidean Space: Temporal-Discrimination-Fusions and Attention-based Graph Neural Network for Human Activity Recognition [5.600003119721707]
Human activity recognition (HAR) through wearable devices has received much interest due to its numerous applications in fitness tracking, wellness screening, and supported living.
Traditional deep learning (DL) methods have set the state-of-the-art performance in the HAR domain.
We propose an approach based on Graph Neural Networks (GNNs) for structuring the input representation and exploiting the relations among the samples.
arXiv Detail & Related papers (2022-06-10T03:04:23Z)
- Detecting Owner-member Relationship with Graph Convolution Network in Fisheye Camera System [9.665475078766017]
We propose an innovative relationship prediction method, DeepWORD, by designing a graph convolutional network (GCN).
Experiments show that the proposed method achieves state-of-the-art accuracy and real-time performance.
arXiv Detail & Related papers (2022-01-28T13:12:27Z)
- Siamese Attribute-missing Graph Auto-encoder [35.79233150253881]
We propose the Siamese Attribute-missing Graph Auto-encoder (SAGA).
First, we entangle the attribute embedding and structure embedding by introducing a siamese network structure to share the parameters learned by both processes.
Second, we introduce a K-nearest neighbor (KNN) and structural constraint enhanced learning mechanism to improve the quality of latent features of the missing attributes.
arXiv Detail & Related papers (2021-12-09T11:21:31Z)
- Towards Open-World Feature Extrapolation: An Inductive Graph Learning Approach [80.8446673089281]
We propose a new learning paradigm with graph representation and learning.
Our framework contains two modules: 1) a backbone network (e.g., feedforward neural nets) as a lower model takes features as input and outputs predicted labels; 2) a graph neural network as an upper model learns to extrapolate embeddings for new features via message passing over a feature-data graph built from observed data.
arXiv Detail & Related papers (2021-10-09T09:02:45Z)
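The two-module design in the entry above can be approximated compactly: one round of mean-style message passing over the feature-data bipartite graph already yields embeddings for unseen feature columns. The NumPy sketch below is a loose reconstruction under that assumption, not the paper's actual model.

```python
# Sketch of the two-module idea: a backbone consumes feature values, while
# a GNN-style mean aggregation over a feature-data bipartite graph embeds
# feature columns never seen in training. Names and sizes are assumptions.
import numpy as np

def extrapolate_feature_embeddings(X: np.ndarray, data_emb: np.ndarray) -> np.ndarray:
    # X: (num_rows, num_features) observed matrix; data_emb: (num_rows, d)
    # embeddings of data points. A new feature column is embedded as the
    # weighted mean of the data points in which it is active (one round
    # of message passing on the feature-data graph).
    weights = X / X.sum(axis=0, keepdims=True).clip(min=1e-9)  # column-normalize
    return weights.T @ data_emb                                # (num_features, d)

X = (np.random.rand(100, 12) > 0.7).astype(float)  # binary feature incidence
data_emb = np.random.rand(100, 16)
print(extrapolate_feature_embeddings(X, data_emb).shape)  # (12, 16)
```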
- Online Multiple Object Tracking with Cross-Task Synergy [120.70085565030628]
We propose a novel unified model with synergy between position prediction and embedding association.
The two tasks are linked by temporal-aware target attention and distractor attention, as well as an identity-aware memory aggregation model.
arXiv Detail & Related papers (2021-04-01T10:19:40Z)
- SparseBERT: Rethinking the Importance Analysis in Self-attention [107.68072039537311]
Transformer-based models are popular for natural language processing (NLP) tasks due to their powerful capacity.
Attention map visualization of a pre-trained model is one direct method for understanding the self-attention mechanism.
We propose a Differentiable Attention Mask (DAM) algorithm, which can also be applied to guide the design of SparseBERT.
arXiv Detail & Related papers (2021-02-25T14:13:44Z)
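As a rough sketch of what a differentiable attention mask can look like, the PyTorch snippet below gates each attention position with a trainable sigmoid and exposes an L1 sparsity penalty; this is a generic reconstruction from the summary above, not SparseBERT's actual DAM algorithm.

```python
# Generic differentiable attention mask sketch (PyTorch), inspired by the
# DAM summary above; the sigmoid relaxation and L1 penalty are assumptions.
import torch
import torch.nn as nn

class MaskedSelfAttention(nn.Module):
    def __init__(self, seq_len: int, dim: int):
        super().__init__()
        self.qkv = nn.Linear(dim, 3 * dim)
        self.mask_logits = nn.Parameter(torch.zeros(seq_len, seq_len))
        self.scale = dim ** -0.5

    def forward(self, x: torch.Tensor) -> tuple[torch.Tensor, torch.Tensor]:
        q, k, v = self.qkv(x).chunk(3, dim=-1)
        attn = torch.softmax(q @ k.transpose(-2, -1) * self.scale, dim=-1)
        gate = torch.sigmoid(self.mask_logits)  # soft, trainable 0/1 mask
        attn = attn * gate
        attn = attn / attn.sum(dim=-1, keepdim=True).clamp(min=1e-9)
        sparsity_penalty = gate.abs().mean()    # L1 term pushes gates to 0
        return attn @ v, sparsity_penalty

layer = MaskedSelfAttention(seq_len=8, dim=32)
out, penalty = layer(torch.randn(2, 8, 32))
print(out.shape, float(penalty))
```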
- A Trainable Optimal Transport Embedding for Feature Aggregation and its Relationship to Attention [96.77554122595578]
We introduce a parametrized representation of fixed size, which embeds and then aggregates elements from a given input set according to the optimal transport plan between the set and a trainable reference.
Our approach scales to large datasets and allows end-to-end training of the reference, while also providing a simple unsupervised learning mechanism with small computational cost.
arXiv Detail & Related papers (2020-06-22T08:35:58Z)
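To make the optimal-transport aggregation in the last entry concrete, here is a NumPy sketch that matches a variable-size input set to a fixed trainable reference with Sinkhorn iterations and pools via the transport plan; the epsilon, sizes, and iteration count are illustrative assumptions.

```python
# Optimal-transport pooling sketch (NumPy): aggregate a variable-size set
# against a fixed-size reference using a Sinkhorn transport plan. Sizes,
# epsilon, and the iteration count are illustrative assumptions.
import numpy as np

def sinkhorn_plan(cost: np.ndarray, eps: float = 0.1, iters: int = 50) -> np.ndarray:
    # Entropic-regularized OT between two uniform distributions.
    n, m = cost.shape
    K = np.exp(-cost / eps)
    u = np.ones(n) / n
    for _ in range(iters):
        v = (np.ones(m) / m) / (K.T @ u)  # alternate marginal scaling
        u = (np.ones(n) / n) / (K @ v)
    return u[:, None] * K * v[None, :]    # transport plan, rows sum to 1/n

def ot_pool(elements: np.ndarray, reference: np.ndarray) -> np.ndarray:
    # elements: (n, d) input set; reference: (m, d) trainable reference set.
    cost = ((elements[:, None, :] - reference[None, :, :]) ** 2).sum(-1)
    plan = sinkhorn_plan(cost)            # (n, m)
    # Each reference slot pools the input elements assigned to it.
    return plan.T @ elements              # fixed-size (m, d) representation

x = np.random.rand(17, 8)                 # a set of 17 elements
ref = np.random.rand(4, 8)                # 4 reference slots
print(ot_pool(x, ref).shape)              # (4, 8)
```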
This list is automatically generated from the titles and abstracts of the papers on this site.