Coneheads: Hierarchy Aware Attention
- URL: http://arxiv.org/abs/2306.00392v2
- Date: Mon, 4 Dec 2023 01:40:15 GMT
- Title: Coneheads: Hierarchy Aware Attention
- Authors: Albert Tseng, Tao Yu, Toni J.B. Liu, Christopher De Sa
- Abstract summary: We introduce cone attention, a drop-in replacement for dot product attention.
Cone attention associates two points by the depth of their lowest common ancestor in a hierarchy defined by hyperbolic cones.
We show that it improves task-level performance over dot product attention and other baselines.
- Score: 40.685504511826885
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Attention networks such as transformers have achieved state-of-the-art
performance in many domains. These networks rely heavily on the dot product
attention operator, which computes the similarity between two points by taking
their inner product. However, the inner product does not explicitly model the
complex structural properties of real world datasets, such as hierarchies
between data points. To remedy this, we introduce cone attention, a drop-in
replacement for dot product attention based on hyperbolic entailment cones.
Cone attention associates two points by the depth of their lowest common
ancestor in a hierarchy defined by hyperbolic cones, which intuitively measures
the divergence of two points and gives a hierarchy aware similarity score. We
test cone attention on a wide variety of models and tasks and show that it
improves task-level performance over dot product attention and other baselines,
and is able to match dot-product attention with significantly fewer parameters.
Our results suggest that cone attention is an effective way to capture
hierarchical relationships when calculating attention.
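The abstract's core claim — that the dot-product score can be swapped for a hierarchy-aware similarity while the rest of the attention pipeline stays unchanged — can be illustrated with a short sketch. The snippet below is only a hedged illustration of that drop-in structure, not the paper's method: in place of the cone-based lowest-common-ancestor depth it uses the negative Poincaré-ball distance between projected queries and keys, and the function names, projection radius, and temperature parameter are all assumptions.
```python
# Hedged sketch: standard attention with the dot-product score swapped for a
# hierarchy-sensitive similarity. The stand-in score is the negative Poincare
# ball distance, NOT the paper's cone-based lowest-common-ancestor depth.
import torch
import torch.nn.functional as F


def project_to_ball(x, radius=0.9):
    """Rescale vectors so they lie inside the open unit ball (assumed radius)."""
    norm = x.norm(dim=-1, keepdim=True).clamp(min=1e-6)
    return x * torch.clamp(radius / norm, max=1.0)


def poincare_distance(x, y):
    """Pairwise Poincare-ball distance between two batched point sets."""
    x2 = (x * x).sum(-1, keepdim=True)                    # (..., n, 1)
    y2 = (y * y).sum(-1, keepdim=True).transpose(-1, -2)  # (..., 1, m)
    sq = torch.cdist(x, y) ** 2                           # squared Euclidean
    return torch.acosh(1 + 2 * sq / ((1 - x2) * (1 - y2)))


def hierarchy_aware_attention(q, k, v, temperature=1.0):
    """Drop-in attention: -distance replaces q.k as the pre-softmax score."""
    scores = -poincare_distance(project_to_ball(q), project_to_ball(k))
    return F.softmax(scores / temperature, dim=-1) @ v


# Usage: same shapes as standard scaled dot-product attention.
q, k, v = (torch.randn(2, 8, 16) for _ in range(3))  # (batch, tokens, dim)
out = hierarchy_aware_attention(q, k, v)
print(out.shape)                                      # torch.Size([2, 8, 16])
```
Because only the score changes, the softmax weighting and output shapes match ordinary attention, which is what makes such a score usable as a drop-in replacement.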
Related papers
- Unifying Feature and Cost Aggregation with Transformers for Semantic and Visual Correspondence [51.54175067684008]
This paper introduces a Transformer-based integrative feature and cost aggregation network designed for dense matching tasks.
We first show that feature aggregation and cost aggregation exhibit distinct characteristics and reveal the potential for substantial benefits stemming from the judicious use of both aggregation processes.
Our framework is evaluated on standard benchmarks for semantic matching, and also applied to geometric matching, where we show that our approach achieves significant improvements compared to existing methods.
arXiv Detail & Related papers (2024-03-17T07:02:55Z)
- Learning Feature Matching via Matchable Keypoint-Assisted Graph Neural Network [52.29330138835208]
Accurately matching local features between a pair of images is a challenging computer vision task.
Previous studies typically use attention-based graph neural networks (GNNs) with fully connected graphs over keypoints within/across images.
We propose MaKeGNN, a sparse attention-based GNN architecture which bypasses non-repeatable keypoints and leverages matchable ones to guide message passing.
arXiv Detail & Related papers (2023-07-04T02:50:44Z)
- Integrative Feature and Cost Aggregation with Transformers for Dense Correspondence [63.868905184847954]
The current state of the art consists of Transformer-based approaches that focus on either feature descriptors or cost volume aggregation.
We propose a novel Transformer-based network that interleaves both forms of aggregations in a way that exploits their complementary information.
We evaluate the effectiveness of the proposed method on dense matching tasks and achieve state-of-the-art performance on all the major benchmarks.
arXiv Detail & Related papers (2022-09-19T03:33:35Z)
- Cell Attention Networks [25.72671436731666]
We introduce Cell Attention Networks (CANs), a neural architecture operating on data defined over the vertices of a graph.
CANs exploit the lower and upper neighborhoods, as encoded in the cell complex, to design two independent masked self-attention mechanisms.
The experimental results show that CAN is a low-complexity strategy that compares favorably with state-of-the-art results on graph-based learning tasks.
arXiv Detail & Related papers (2022-09-16T21:57:39Z)
- Augmenting Convolutional networks with attention-based aggregation [55.97184767391253]
We show how to augment any convolutional network with an attention-based global map to achieve non-local reasoning.
We plug this learned aggregation layer into a simple patch-based convolutional network parametrized by two parameters (width and depth).
It yields surprisingly competitive trade-offs between accuracy and complexity, in particular in terms of memory consumption.
arXiv Detail & Related papers (2021-12-27T14:05:41Z)
- An attention-driven hierarchical multi-scale representation for visual recognition [3.3302293148249125]
Convolutional Neural Networks (CNNs) have revolutionized the understanding of visual content.
We propose a method to capture high-level long-range dependencies by exploring Graph Convolutional Networks (GCNs).
Our approach is simple yet extremely effective in solving both the fine-grained and generic visual classification problems.
arXiv Detail & Related papers (2021-10-23T09:22:22Z)
- Leveraging redundancy in attention with Reuse Transformers [58.614198953733194]
Pairwise dot product-based attention allows Transformers to exchange information between tokens in an input-dependent way.
A typical Transformer model computes such pairwise attention scores repeatedly for the same sequence.
We propose a novel architecture that reuses attention scores computed in one layer in multiple subsequent layers; a brief sketch of this reuse pattern appears after this list.
arXiv Detail & Related papers (2021-10-13T16:08:02Z)
- Beyond Self-attention: External Attention using Two Linear Layers for Visual Tasks [34.32609892928909]
We propose a novel attention mechanism which we call external attention, based on two external, small, learnable, and shared memories.
Our method provides comparable or superior performance to the self-attention mechanism and some of its variants, with much lower computational and memory costs.
arXiv Detail & Related papers (2021-05-05T22:29:52Z)
- Attention improves concentration when learning node embeddings [1.2233362977312945]
Given nodes labelled with search query text, we want to predict links to related queries that share products.
Experiments with a range of deep neural architectures show that simple feedforward networks with an attention mechanism perform best for learning embeddings.
We propose an analytically tractable model of query generation, AttEST, that views both products and the query text as vectors embedded in a latent space.
arXiv Detail & Related papers (2020-06-11T21:21:12Z)
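The attention-score reuse described in the Reuse Transformers entry above lends itself to a similarly brief sketch. The code below is a hedged illustration of the general idea — compute a softmax attention map once and let later layers reuse it, re-projecting only the values — and not that paper's architecture; the class names and layer layout are assumptions.
```python
# Hedged sketch of attention-score reuse: the attention map from one layer is
# cached and reused by a later layer, which only applies a fresh value
# projection. Illustrative only; not the Reuse Transformers architecture.
import math
import torch
import torch.nn as nn
import torch.nn.functional as F


class ScoreComputingLayer(nn.Module):
    def __init__(self, dim):
        super().__init__()
        self.q, self.k, self.v = (nn.Linear(dim, dim) for _ in range(3))

    def forward(self, x):
        scores = self.q(x) @ self.k(x).transpose(-1, -2) / math.sqrt(x.size(-1))
        attn = F.softmax(scores, dim=-1)
        return attn @ self.v(x), attn     # return the map for reuse downstream


class ScoreReusingLayer(nn.Module):
    def __init__(self, dim):
        super().__init__()
        self.v = nn.Linear(dim, dim)      # no query/key projections needed

    def forward(self, x, attn):
        return attn @ self.v(x)           # reuse the cached attention map


x = torch.randn(2, 10, 32)
first, later = ScoreComputingLayer(32), ScoreReusingLayer(32)
h, attn = first(x)
h = later(h, attn)                        # later layer skips score computation
print(h.shape)                            # torch.Size([2, 10, 32])
```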
This list is automatically generated from the titles and abstracts of the papers on this site.