Soft-Landing Strategy for Alleviating the Task Discrepancy Problem in
Temporal Action Localization Tasks
- URL: http://arxiv.org/abs/2211.06023v1
- Date: Fri, 11 Nov 2022 06:27:22 GMT
- Authors: Hyolim Kang, Hanjung Kim, Joungbin An, Minsu Cho, Seon Joo Kim
- Abstract summary: We introduce Soft-Landing (SoLa) strategy to bridge the transferability gap between the pretrained encoder and the downstream tasks.
Our method effectively alleviates the task discrepancy problem with remarkable computational efficiency.
- Score: 46.94537691205153
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Temporal Action Localization (TAL) methods typically operate on top of
feature sequences from a frozen snippet encoder that is pretrained with the
Trimmed Action Classification (TAC) tasks, resulting in a task discrepancy
problem. While existing TAL methods mitigate this issue either by retraining
the encoder with a pretext task or by end-to-end fine-tuning, they commonly
require an overload of high memory and computation. In this work, we introduce
the Soft-Landing (SoLa) strategy, an efficient yet effective framework to bridge
the transferability gap between the pretrained encoder and the downstream tasks
by incorporating a light-weight neural network, i.e., a SoLa module, on top of
the frozen encoder. We also propose an unsupervised training scheme for the
SoLa module; it learns with inter-frame Similarity Matching that uses the frame
interval as its supervisory signal, eliminating the need for temporal
annotations. Experimental evaluation on various benchmarks for downstream TAL
tasks shows that our method effectively alleviates the task discrepancy problem
with remarkable computational efficiency.
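The abstract describes the SoLa module as a lightweight network placed on top of frozen snippet features, trained by inter-frame Similarity Matching with the frame interval as the supervisory signal. Below is a minimal, hypothetical sketch of that idea: a single linear layer stands in for the SoLa module, and the similarity target is assumed to decay linearly with the frame interval. The actual architecture and target schedule in the paper may differ.

```python
import numpy as np

rng = np.random.default_rng(0)

def sola_module(features, W):
    """Hypothetical SoLa module: one linear layer with tanh on frozen features."""
    return np.tanh(features @ W)

def similarity_matching_loss(z, max_gap=4):
    """Unsupervised Similarity Matching sketch: the cosine similarity between
    frames t and t+k is pushed toward a target that decays with interval k,
    so the frame interval itself acts as the supervisory signal."""
    z = z / np.linalg.norm(z, axis=1, keepdims=True)  # unit-normalize features
    loss, T = 0.0, len(z)
    for k in range(1, max_gap + 1):
        sim = np.sum(z[:-k] * z[k:], axis=1)   # cosine similarity at interval k
        target = 1.0 - k / max_gap             # assumed linear decay schedule
        loss += np.mean((sim - target) ** 2)
    return loss / max_gap

T, D, H = 16, 32, 8
frozen_feats = rng.normal(size=(T, D))   # features from a frozen TAC encoder
W = rng.normal(size=(D, H)) * 0.1        # trainable SoLa parameters
loss = similarity_matching_loss(sola_module(frozen_feats, W))
```

Because only `W` would be updated during training, the memory and compute cost is far below retraining or fine-tuning the video encoder itself, which is the efficiency argument the abstract makes.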
Related papers
- Temporally Consistent Unbalanced Optimal Transport for Unsupervised Action Segmentation [31.622109513774635]
We propose a novel approach to the action segmentation task for long, untrimmed videos.
By encoding a temporal consistency prior into a Gromov-Wasserstein problem, we are able to decode a temporally consistent segmentation.
Our method does not require knowing the action order for a video to attain temporal consistency.
arXiv Detail & Related papers (2024-04-01T22:53:47Z)
- Scaling Learning based Policy Optimization for Temporal Tasks via Dropout [4.421486904657393]
We introduce a model-based approach for training feedback controllers for an autonomous agent operating in a highly nonlinear environment.
We show how this learning problem is similar to training recurrent neural networks (RNNs), where the number of recurrent units is proportional to the temporal horizon of the agent's task objectives.
We introduce a novel gradient approximation algorithm based on the idea of dropout or gradient sampling.
arXiv Detail & Related papers (2024-03-23T12:53:51Z)
- Robust Stochastically-Descending Unrolled Networks [85.6993263983062]
Deep unrolling is an emerging learning-to-optimize method that unrolls a truncated iterative algorithm in the layers of a trainable neural network.
We show that convergence guarantees and generalizability of the unrolled networks are still open theoretical problems.
We numerically assess unrolled architectures trained under the proposed constraints in two different applications.
arXiv Detail & Related papers (2023-12-25T18:51:23Z)
- Task Arithmetic with LoRA for Continual Learning [0.0]
We propose a novel method to continually train vision models using low-rank adaptation and task arithmetic.
When aided with a small memory of 10 samples per class, our method achieves performance close to full-set finetuning.
arXiv Detail & Related papers (2023-11-04T15:12:24Z)
- ForkMerge: Mitigating Negative Transfer in Auxiliary-Task Learning [59.08197876733052]
Auxiliary-Task Learning (ATL) aims to improve the performance of the target task by leveraging the knowledge obtained from related tasks.
Sometimes, learning multiple tasks simultaneously results in lower accuracy than learning only the target task, a phenomenon known as negative transfer.
ForkMerge is a novel approach that periodically forks the model into multiple branches and automatically searches for the varying task weights.
arXiv Detail & Related papers (2023-01-30T02:27:02Z)
- ReAct: Temporal Action Detection with Relational Queries [84.76646044604055]
This work aims at advancing temporal action detection (TAD) using an encoder-decoder framework with action queries.
We first propose a relational attention mechanism in the decoder, which guides the attention among queries based on their relations.
Lastly, we propose to predict the localization quality of each action query at inference in order to distinguish high-quality queries.
arXiv Detail & Related papers (2022-07-14T17:46:37Z)
- Semi-Supervised Temporal Action Detection with Proposal-Free Masking [134.26292288193298]
We propose a novel Semi-supervised Temporal action detection model based on PropOsal-free Temporal mask (SPOT).
SPOT outperforms state-of-the-art alternatives, often by a large margin.
arXiv Detail & Related papers (2022-07-14T16:58:47Z)
- Low-Fidelity End-to-End Video Encoder Pre-training for Temporal Action Localization [96.73647162960842]
TAL is a fundamental yet challenging task in video understanding.
Existing TAL methods rely on pre-training a video encoder through action classification supervision.
We introduce a novel low-fidelity end-to-end (LoFi) video encoder pre-training method.
arXiv Detail & Related papers (2021-03-28T22:18:14Z)
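The last entry's low-fidelity (LoFi) idea, pre-training a video encoder end-to-end on reduced-resolution input so the whole pipeline fits in memory, can be illustrated with a toy sketch. The stride values and clip shape below are purely illustrative assumptions, not the paper's configuration.

```python
import numpy as np

rng = np.random.default_rng(0)

def lofi_view(clip, t_stride=2, s_stride=2):
    """Hypothetical low-fidelity view: subsample a clip in time and space
    so that end-to-end encoder pre-training becomes memory-feasible."""
    return clip[::t_stride, ::s_stride, ::s_stride]

clip = rng.normal(size=(16, 112, 112))  # T x H x W clip (channels omitted)
low = lofi_view(clip)                   # shape (8, 56, 56): 8x fewer values
```

The full-fidelity encoder is then used downstream on frozen features, while only the cheap low-fidelity pass is trained end-to-end.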
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of its content (including all information) and is not responsible for any consequences.