Soft-Landing Strategy for Alleviating the Task Discrepancy Problem in
Temporal Action Localization Tasks
- URL: http://arxiv.org/abs/2211.06023v1
- Date: Fri, 11 Nov 2022 06:27:22 GMT
- Authors: Hyolim Kang, Hanjung Kim, Joungbin An, Minsu Cho, Seon Joo Kim
- Abstract summary: We introduce Soft-Landing (SoLa) strategy to bridge the transferability gap between the pretrained encoder and the downstream tasks.
Our method effectively alleviates the task discrepancy problem with remarkable computational efficiency.
- Score: 46.94537691205153
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Temporal Action Localization (TAL) methods typically operate on top of
feature sequences from a frozen snippet encoder that is pretrained with the
Trimmed Action Classification (TAC) tasks, resulting in a task discrepancy
problem. While existing TAL methods mitigate this issue either by retraining
the encoder with a pretext task or by end-to-end fine-tuning, they commonly
require an overload of high memory and computation. In this work, we introduce
the Soft-Landing (SoLa) strategy, an efficient yet effective framework to bridge
the transferability gap between the pretrained encoder and the downstream tasks
by incorporating a light-weight neural network, i.e., a SoLa module, on top of
the frozen encoder. We also propose an unsupervised training scheme for the
SoLa module; it learns with inter-frame Similarity Matching that uses the frame
interval as its supervisory signal, eliminating the need for temporal
annotations. Experimental evaluation on various benchmarks for downstream TAL
tasks shows that our method effectively alleviates the task discrepancy problem
with remarkable computational efficiency.
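The abstract describes the SoLa module as a lightweight network placed on top of frozen snippet features, trained by inter-frame Similarity Matching with the frame interval as the supervisory signal. Below is a minimal, hypothetical sketch of that idea: a single linear layer stands in for the SoLa module, and the similarity target is assumed to decay linearly with the frame interval. The actual architecture and target schedule in the paper may differ.

```python
import numpy as np

rng = np.random.default_rng(0)

def sola_module(features, W):
    """Hypothetical SoLa module: one linear layer with tanh on frozen features."""
    return np.tanh(features @ W)

def similarity_matching_loss(z, max_gap=4):
    """Unsupervised Similarity Matching sketch: the cosine similarity between
    frames t and t+k is pushed toward a target that decays with interval k,
    so the frame interval itself acts as the supervisory signal."""
    z = z / np.linalg.norm(z, axis=1, keepdims=True)  # unit-normalize features
    loss, T = 0.0, len(z)
    for k in range(1, max_gap + 1):
        sim = np.sum(z[:-k] * z[k:], axis=1)   # cosine similarity at interval k
        target = 1.0 - k / max_gap             # assumed linear decay schedule
        loss += np.mean((sim - target) ** 2)
    return loss / max_gap

T, D, H = 16, 32, 8
frozen_feats = rng.normal(size=(T, D))   # features from a frozen TAC encoder
W = rng.normal(size=(D, H)) * 0.1        # trainable SoLa parameters
loss = similarity_matching_loss(sola_module(frozen_feats, W))
```

Because only `W` would be updated during training, the memory and compute cost is far below retraining or fine-tuning the video encoder itself, which is the efficiency argument the abstract makes.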
Related papers
- Temporally Consistent Unbalanced Optimal Transport for Unsupervised Action Segmentation [31.622109513774635]
We propose a novel approach to the action segmentation task for long, untrimmed videos.
By encoding a temporal consistency prior into a Gromov-Wasserstein problem, we are able to decode a temporally consistent segmentation.
Our method does not require knowing the action order for a video to attain temporal consistency.
arXiv Detail & Related papers (2024-04-01T22:53:47Z)
- Scaling Learning based Policy Optimization for Temporal Tasks via Dropout [4.421486904657393]
We introduce a model-based approach for training feedback controllers for an autonomous agent operating in a highly nonlinear environment.
We show how this learning problem is similar to training recurrent neural networks (RNNs), where the number of recurrent units is proportional to the temporal horizon of the agent's task objectives.
We introduce a novel gradient approximation algorithm based on the idea of dropout or gradient sampling.
arXiv Detail & Related papers (2024-03-23T12:53:51Z)
- Robust Stochastically-Descending Unrolled Networks [85.6993263983062]
Deep unrolling is an emerging learning-to-optimize method that unrolls a truncated iterative algorithm in the layers of a trainable neural network.
We show that convergence guarantees and generalizability of the unrolled networks are still open theoretical problems.
We numerically assess unrolled architectures trained under the proposed constraints in two different applications.
arXiv Detail & Related papers (2023-12-25T18:51:23Z)
- Task Arithmetic with LoRA for Continual Learning [0.0]
We propose a novel method to continually train vision models using low-rank adaptation and task arithmetic.
When aided with a small memory of 10 samples per class, our method achieves performance close to full-set finetuning.
arXiv Detail & Related papers (2023-11-04T15:12:24Z)
- ForkMerge: Mitigating Negative Transfer in Auxiliary-Task Learning [59.08197876733052]
Auxiliary-Task Learning (ATL) aims to improve the performance of the target task by leveraging the knowledge obtained from related tasks.
Sometimes, learning multiple tasks simultaneously results in lower accuracy than learning only the target task, a phenomenon known as negative transfer.
ForkMerge is a novel approach that periodically forks the model into multiple branches and automatically searches for the varying task weights.
arXiv Detail & Related papers (2023-01-30T02:27:02Z)
- ReAct: Temporal Action Detection with Relational Queries [84.76646044604055]
This work aims at advancing temporal action detection (TAD) using an encoder-decoder framework with action queries.
We first propose a relational attention mechanism in the decoder, which guides the attention among queries based on their relations.
Lastly, we propose to predict the localization quality of each action query at inference in order to distinguish high-quality queries.
arXiv Detail & Related papers (2022-07-14T17:46:37Z)
- Semi-Supervised Temporal Action Detection with Proposal-Free Masking [134.26292288193298]
We propose a novel Semi-supervised Temporal action detection model based on PropOsal-free Temporal mask (SPOT).
SPOT outperforms state-of-the-art alternatives, often by a large margin.
arXiv Detail & Related papers (2022-07-14T16:58:47Z)
- Low-Fidelity End-to-End Video Encoder Pre-training for Temporal Action Localization [96.73647162960842]
TAL is a fundamental yet challenging task in video understanding.
Existing TAL methods rely on pre-training a video encoder through action classification supervision.
We introduce a novel low-fidelity end-to-end (LoFi) video encoder pre-training method.
arXiv Detail & Related papers (2021-03-28T22:18:14Z)
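The last entry's low-fidelity (LoFi) idea, pre-training a video encoder end-to-end on reduced-resolution input so the whole pipeline fits in memory, can be illustrated with a toy sketch. The stride values and clip shape below are purely illustrative assumptions, not the paper's configuration.

```python
import numpy as np

rng = np.random.default_rng(0)

def lofi_view(clip, t_stride=2, s_stride=2):
    """Hypothetical low-fidelity view: subsample a clip in time and space
    so that end-to-end encoder pre-training becomes memory-feasible."""
    return clip[::t_stride, ::s_stride, ::s_stride]

clip = rng.normal(size=(16, 112, 112))  # T x H x W clip (channels omitted)
low = lofi_view(clip)                   # shape (8, 56, 56): 8x fewer values
```

The full-fidelity encoder is then used downstream on frozen features, while only the cheap low-fidelity pass is trained end-to-end.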
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of its content (including all information) and is not responsible for any consequences.