MILA: Multi-Task Learning from Videos via Efficient Inter-Frame
Attention
- URL: http://arxiv.org/abs/2002.07362v3
- Date: Sun, 10 Oct 2021 23:18:15 GMT
- Title: MILA: Multi-Task Learning from Videos via Efficient Inter-Frame
Attention
- Authors: Donghyun Kim, Tian Lan, Chuhang Zou, Ning Xu, Bryan A. Plummer, Stan
Sclaroff, Jayan Eledath, Gerard Medioni
- Abstract summary: We present a new approach for multi-task learning from videos via efficient inter-frame local attention (MILA).
Our approach contains a novel inter-frame attention module which allows learning of task-specific attention across frames.
We also propose an effective adversarial learning strategy to encourage the slow and fast networks to learn similar features.
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Prior work in multi-task learning has mainly focused on predictions on a
single image. In this work, we present a new approach for multi-task learning
from videos via efficient inter-frame local attention (MILA). Our approach
contains a novel inter-frame attention module which allows learning of
task-specific attention across frames. We embed the attention module in a
"slow-fast" architecture, where the slower network runs on sparsely sampled
keyframes and the lightweight shallow network runs on non-keyframes at a high
frame rate. We also propose an effective adversarial learning strategy to
encourage the slow and fast networks to learn similar features. Our approach
ensures low-latency multi-task learning while maintaining high quality
predictions. Experiments show competitive accuracy compared to state-of-the-art
on two multi-task learning benchmarks while reducing the number of floating
point operations (FLOPs) by up to 70%. In addition, our attention-based
feature propagation method (ILA) outperforms prior work in terms of task
accuracy while also reducing FLOPs by up to 90%.
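The core idea of the inter-frame attention module can be illustrated with a minimal NumPy sketch. This is not the authors' implementation; the function name, the single-head formulation, and the fixed local window are illustrative assumptions. Features of a non-keyframe (produced by the fast, shallow network) act as queries that attend over a local spatial window of the last keyframe's features (produced by the slow network), so that high-quality keyframe features are propagated to non-keyframes at low cost.

```python
import numpy as np

def softmax(x, axis=-1):
    """Numerically stable softmax along the given axis."""
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def inter_frame_local_attention(query_feat, key_feat, window=3):
    """Propagate keyframe features to a non-keyframe via local attention.

    query_feat: (C, H, W) features of the current non-keyframe (fast network).
    key_feat:   (C, H, W) features of the last keyframe (slow network).
    For each spatial location, the query attends over a window x window
    neighbourhood of the keyframe features; the output is the
    attention-weighted sum of those keyframe features.
    """
    C, H, W = query_feat.shape
    r = window // 2
    # Edge-pad the keyframe features so every location has a full window.
    padded = np.pad(key_feat, ((0, 0), (r, r), (r, r)), mode="edge")
    out = np.empty_like(query_feat)
    scale = 1.0 / np.sqrt(C)  # standard scaled dot-product attention
    for i in range(H):
        for j in range(W):
            q = query_feat[:, i, j]                                   # (C,)
            patch = padded[:, i:i + window, j:j + window].reshape(C, -1)  # (C, w*w)
            attn = softmax((q @ patch) * scale)                       # (w*w,)
            out[:, i, j] = patch @ attn   # weighted sum of keyframe features
    return out
```

With `window=1` the module degenerates to simply copying the co-located keyframe feature; larger windows let the non-keyframe compensate for motion between frames, which is what makes the attention "local" rather than global (and keeps its cost linear in the window size).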
Related papers
- Underlying Semantic Diffusion for Effective and Efficient In-Context Learning [113.4003355229632]
Underlying Semantic Diffusion (US-Diffusion) is an enhanced diffusion model that boosts underlying semantics learning, computational efficiency, and in-context learning capabilities.
We present a Feedback-Aided Learning (FAL) framework, which leverages feedback signals to guide the model in capturing semantic details.
We also propose a plug-and-play Efficient Sampling Strategy (ESS) for dense sampling at time steps with high noise levels.
arXiv Detail & Related papers (2025-03-06T03:06:22Z)
- Intra-task Mutual Attention based Vision Transformer for Few-Shot Learning [12.5354658533836]
Humans possess a remarkable ability to accurately classify new, unseen images after being exposed to only a few examples.
For artificial neural network models, determining the most relevant features for distinguishing between two images with limited samples presents a challenge.
We propose an intra-task mutual attention method for few-shot learning, that involves splitting the support and query samples into patches.
arXiv Detail & Related papers (2024-05-06T02:02:57Z)
- Cross-Task Affinity Learning for Multitask Dense Scene Predictions [5.939164722752263]
Multitask learning (MTL) has become prominent for its ability to predict multiple tasks jointly.
We introduce the Cross-Task Affinity Learning (CTAL) module, a lightweight framework that enhances task refinement in multitask networks.
Our results demonstrate state-of-the-art MTL performance for both CNN and transformer backbones, using significantly fewer parameters than single-task learning.
arXiv Detail & Related papers (2024-01-20T05:31:47Z)
- Learning from One Continuous Video Stream [70.30084026960819]
We introduce a framework for online learning from a single continuous video stream.
This poses great challenges given the high correlation between consecutive video frames.
We employ pixel-to-pixel modelling as a practical and flexible way to switch between pre-training and single-stream evaluation.
arXiv Detail & Related papers (2023-12-01T14:03:30Z)
- MetaMorphosis: Task-oriented Privacy Cognizant Feature Generation for Multi-task Learning [6.056197449765416]
This paper proposes a novel deep learning-based privacy-cognizant feature generation process called MetaMorphosis.
We show that MetaMorphosis outperforms recent adversarial learning and universal feature generation methods by guaranteeing privacy requirements.
arXiv Detail & Related papers (2023-05-13T01:59:07Z)
- Medusa: Universal Feature Learning via Attentional Multitasking [65.94499390875046]
Recent approaches to multi-task learning have focused on modelling connections between tasks at the decoder level.
We argue that MTL is a stepping stone towards universal feature learning (UFL), which is the ability to learn generic features that can be applied to new tasks without retraining.
We show the effectiveness of Medusa in UFL (+13.18% improvement) while maintaining MTL performance and being 25% more efficient than previous approaches.
arXiv Detail & Related papers (2022-04-12T10:52:28Z)
- ASCNet: Self-supervised Video Representation Learning with Appearance-Speed Consistency [62.38914747727636]
We study self-supervised video representation learning, which is a challenging task due to 1) a lack of labels for explicit supervision and 2) unstructured and noisy visual information.
Existing methods mainly use a contrastive loss with video clips as instances, learning visual representations by discriminating instances from each other.
In this paper, we observe that the consistency between positive samples is the key to learning robust video representations.
arXiv Detail & Related papers (2021-06-04T08:44:50Z)
- Collaborative Distillation in the Parameter and Spectrum Domains for Video Action Recognition [79.60708268515293]
This paper explores how to train small and efficient networks for action recognition.
We propose two distillation strategies in the frequency domain: feature-spectrum distillation and parameter-distribution distillation.
Our method can achieve higher performance than state-of-the-art methods with the same backbone.
arXiv Detail & Related papers (2020-09-15T07:29:57Z)
- Video Moment Retrieval via Natural Language Queries [7.611718124254329]
We propose a novel method for video moment retrieval (VMR) that achieves state-of-the-art (SOTA) performance on R@1 metrics.
Our model has a simple architecture, which enables faster training and inference while maintaining accuracy.
arXiv Detail & Related papers (2020-09-04T22:06:34Z)
- Attentive Feature Reuse for Multi Task Meta learning [17.8055398673228]
We develop new algorithms for simultaneous learning of multiple tasks.
We propose an attention mechanism to dynamically specialize the network, at runtime, for each task.
Our method improves performance on new, previously unseen environments.
arXiv Detail & Related papers (2020-06-12T19:33:11Z)
- Multi-Task Learning for Dense Prediction Tasks: A Survey [87.66280582034838]
Multi-task learning (MTL) techniques have shown promising results with respect to performance, computation, and memory footprint.
We provide a well-rounded view on state-of-the-art deep learning approaches for MTL in computer vision.
arXiv Detail & Related papers (2020-04-28T09:15:50Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information it presents and is not responsible for any consequences arising from its use.