On the Learning Dynamics of Attention Networks
- URL: http://arxiv.org/abs/2307.13421v5
- Date: Thu, 12 Oct 2023 04:58:05 GMT
- Title: On the Learning Dynamics of Attention Networks
- Authors: Rahul Vashisht and Harish G. Ramaswamy
- Abstract summary: Attention models are learned by optimizing one of three standard loss functions that are variously called -- soft attention, hard attention, and latent variable marginal likelihood (LVML) attention.
We observe a unique signature of models learned using these paradigms and explain this as a consequence of the evolution of the classification model under gradient descent.
We propose a simple hybrid approach that combines the advantages of the different loss functions and demonstrate it on a collection of semi-synthetic and real-world datasets.
- Score: 0.7614628596146599
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Attention models are typically learned by optimizing one of three standard
loss functions that are variously called -- soft attention, hard attention, and
latent variable marginal likelihood (LVML) attention. All three paradigms are
motivated by the same goal of finding two models -- a 'focus' model that
'selects' the right segment of the input and a 'classification' model
that processes the selected segment into the target label. However, they differ
significantly in the way the selected segments are aggregated, resulting in
distinct dynamics and final results. We observe a unique signature of models
learned using these paradigms and explain this as a consequence of the
evolution of the classification model under gradient descent when the focus
model is fixed. We also analyze these paradigms in a simple setting and derive
closed-form expressions for the parameter trajectory under gradient flow. With
the soft attention loss, the focus model improves quickly at initialization and
splutters later on. On the other hand, hard attention loss behaves in the
opposite fashion. Based on our observations, we propose a simple hybrid
approach that combines the advantages of the different loss functions and
demonstrate it on a collection of semi-synthetic and real-world datasets.
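The abstract says the three paradigms differ in how the selected segments are aggregated, but this page does not spell the losses out. The sketch below assumes one common formulation: soft attention classifies the attention-weighted average of the segments, hard attention averages the per-segment losses under the focus distribution, and LVML takes the negative log of the attention-weighted likelihood. The linear classifier `W`, the raw `focus_scores`, and all function names are hypothetical choices made for illustration, not the authors' code.

```python
import numpy as np

def softmax(z):
    # Numerically stable softmax over a 1-D array.
    z = z - np.max(z)
    e = np.exp(z)
    return e / e.sum()

def class_probs(segment, W):
    # Hypothetical linear classifier: segment (d,) -> class probabilities (k,).
    return softmax(W @ segment)

def soft_attention_loss(segments, focus_scores, W, y):
    # Aggregate first: classify the attention-weighted average segment.
    a = softmax(focus_scores)
    pooled = a @ segments
    return -np.log(class_probs(pooled, W)[y])

def hard_attention_loss(segments, focus_scores, W, y):
    # Expected per-segment loss under the focus distribution.
    a = softmax(focus_scores)
    losses = np.array([-np.log(class_probs(s, W)[y]) for s in segments])
    return a @ losses

def lvml_loss(segments, focus_scores, W, y):
    # Negative log of the likelihood marginalized over the selected segment.
    a = softmax(focus_scores)
    probs = np.array([class_probs(s, W)[y] for s in segments])
    return -np.log(a @ probs)
```

Under these assumed definitions, Jensen's inequality implies that the LVML loss never exceeds the hard attention loss for the same parameters, and all three losses coincide when the focus distribution puts essentially all its mass on a single segment.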
Related papers
- Data-Driven Approaches for Modelling Target Behaviour [1.5495593104596401]
The performance of tracking algorithms depends on the chosen model assumptions regarding the target dynamics.
This paper provides a comparative study between three different methods that use machine learning to describe the underlying object motion.
arXiv Detail & Related papers (2024-10-14T14:18:27Z)
- Dynamic Feature Learning and Matching for Class-Incremental Learning [20.432575325147894]
Class-incremental learning (CIL) has emerged as a means to learn new classes without catastrophic forgetting of previous classes.
We propose the Dynamic Feature Learning and Matching (DFLM) model in this paper.
Our proposed model achieves significant performance improvements over existing methods.
arXiv Detail & Related papers (2024-05-14T12:17:19Z)
- Separating common from salient patterns with Contrastive Representation Learning [2.250968907999846]
Contrastive Analysis aims at separating common factors of variation between two datasets.
Current models based on Variational Auto-Encoders have shown poor performance in learning semantically-expressive representations.
We propose to leverage the ability of Contrastive Learning to learn semantically expressive representations well adapted for Contrastive Analysis.
arXiv Detail & Related papers (2024-02-19T08:17:13Z)
- Ensemble Modeling for Multimodal Visual Action Recognition [50.38638300332429]
We propose an ensemble modeling approach for multimodal action recognition.
We independently train individual modality models using a variant of focal loss tailored to handle the long-tailed distribution of the MECCANO [21] dataset.
arXiv Detail & Related papers (2023-08-10T08:43:20Z)
- Meta-tuning Loss Functions and Data Augmentation for Few-shot Object Detection [7.262048441360132]
Few-shot object detection is an emerging topic in the area of few-shot learning and object detection.
We propose a training scheme that allows learning inductive biases that can boost few-shot detection.
The proposed approach yields interpretable loss functions, as opposed to highly parametric and complex few-shot meta-models.
arXiv Detail & Related papers (2023-04-24T15:14:16Z)
- Diffusion Action Segmentation [63.061058214427085]
We propose a novel framework via denoising diffusion models, which shares the same inherent spirit of such iterative refinement.
In this framework, action predictions are iteratively generated from random noise with input video features as conditions.
arXiv Detail & Related papers (2023-03-31T10:53:24Z)
- ContraFeat: Contrasting Deep Features for Semantic Discovery [102.4163768995288]
StyleGAN has shown strong potential for disentangled semantic control.
Existing semantic discovery methods on StyleGAN rely on manual selection of modified latent layers to obtain satisfactory manipulation results.
We propose a model that automates this process and achieves state-of-the-art semantic discovery performance.
arXiv Detail & Related papers (2022-12-14T15:22:13Z)
- Towards Robust and Adaptive Motion Forecasting: A Causal Representation Perspective [72.55093886515824]
We introduce a causal formalism of motion forecasting, which casts the problem as a dynamic process with three groups of latent variables.
We devise a modular architecture that factorizes the representations of invariant mechanisms and style confounders to approximate a causal graph.
Experimental results on synthetic and real datasets show that our three proposed components significantly improve the robustness and reusability of the learned motion representations.
arXiv Detail & Related papers (2021-11-29T18:59:09Z)
- Instance-Level Relative Saliency Ranking with Graph Reasoning [126.09138829920627]
We present a novel unified model to segment salient instances and infer relative saliency rank order.
A novel loss function is also proposed to effectively train the saliency ranking branch.
Experimental results demonstrate that our proposed model is more effective than previous methods.
arXiv Detail & Related papers (2021-07-08T13:10:42Z)
- Unleashing the Power of Contrastive Self-Supervised Visual Models via Contrast-Regularized Fine-Tuning [94.35586521144117]
We investigate whether applying contrastive learning to fine-tuning would bring further benefits.
We propose Contrast-regularized tuning (Core-tuning), a novel approach for fine-tuning contrastive self-supervised visual models.
arXiv Detail & Related papers (2021-02-12T16:31:24Z)