HAT: History-Augmented Anchor Transformer for Online Temporal Action Localization
- URL: http://arxiv.org/abs/2408.06437v1
- Date: Mon, 12 Aug 2024 18:29:48 GMT
- Title: HAT: History-Augmented Anchor Transformer for Online Temporal Action Localization
- Authors: Sakib Reza, Yuexi Zhang, Mohsen Moghaddam, Octavia Camps
- Abstract summary: We introduce the History-Augmented Anchor Transformer (HAT) Framework for OnTAL.
By integrating historical context, our framework enhances the synergy between long-term and short-term information.
We evaluate our model on both procedural egocentric (PREGO) datasets and standard non-PREGO OnTAL datasets.
- Score: 3.187381965457262
- License: http://creativecommons.org/licenses/by-sa/4.0/
- Abstract: Online video understanding often relies on individual frames, leading to frame-by-frame predictions. Recent advancements, such as Online Temporal Action Localization (OnTAL), extend this approach to instance-level predictions. However, existing methods mainly focus on short-term context, neglecting historical information. To address this, we introduce the History-Augmented Anchor Transformer (HAT) Framework for OnTAL. By integrating historical context, our framework enhances the synergy between long-term and short-term information, improving the quality of anchor features crucial for classification and localization. We evaluate our model on both procedural egocentric (PREGO) datasets (EGTEA and EPIC) and standard non-PREGO OnTAL datasets (THUMOS and MUSES). Results show that our model outperforms state-of-the-art approaches significantly on PREGO datasets and achieves comparable or slightly superior performance on non-PREGO datasets, underscoring the importance of leveraging long-term history, especially in procedural and egocentric action scenarios. Code is available at: https://github.com/sakibreza/ECCV24-HAT/
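The linked repository holds the authoritative implementation. As a purely hypothetical sketch of the idea the abstract describes (refining short-term anchor features by attending over compressed long-term history features), assuming nothing about HAT's actual layers, feature dimensions, or training setup:

```python
import numpy as np

def softmax(x, axis=-1):
    """Numerically stable softmax."""
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def history_augmented_anchors(anchors, history, d=64, seed=0):
    """Illustrative cross-attention: short-term anchor features (queries)
    attend over long-term history features (keys/values), then fuse via a
    residual connection. Shapes: anchors (A, d), history (H, d) -> (A, d).
    The random projections stand in for learned weights."""
    rng = np.random.default_rng(seed)
    Wq = rng.standard_normal((d, d)) / np.sqrt(d)  # hypothetical learned projections
    Wk = rng.standard_normal((d, d)) / np.sqrt(d)
    Wv = rng.standard_normal((d, d)) / np.sqrt(d)
    q, k, v = anchors @ Wq, history @ Wk, history @ Wv
    attn = softmax(q @ k.T / np.sqrt(d))           # (A, H) attention weights
    return anchors + attn @ v                      # history-augmented anchors

refined = history_augmented_anchors(np.zeros((5, 64)), np.ones((20, 64)))
print(refined.shape)  # (5, 64)
```

Each refined anchor feature would then feed the classification and localization heads; this sketch only conveys the long-term/short-term fusion pattern, not the paper's architecture.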
Related papers
- ONSEP: A Novel Online Neural-Symbolic Framework for Event Prediction Based on Large Language Model [10.137013634329582]
We introduce the Online Neural-Symbolic Event Prediction framework.
ONSEP incorporates dynamic causal rule mining and dual history augmented generation.
Our framework demonstrates notable performance enhancements across diverse datasets.
arXiv Detail & Related papers (2024-08-14T22:28:19Z) - Self-Augmented Preference Optimization: Off-Policy Paradigms for Language Model Alignment [104.18002641195442]
We introduce Self-Augmented Preference Optimization (SAPO), an effective and scalable training paradigm that does not require existing paired data.
Building on the self-play concept, which autonomously generates negative responses, we further incorporate an off-policy learning pipeline to enhance data exploration and exploitation.
arXiv Detail & Related papers (2024-05-31T14:21:04Z) - Exploring the Limits of Historical Information for Temporal Knowledge Graph Extrapolation [59.417443739208146]
We propose a new event forecasting model based on a novel training framework of historical contrastive learning.
CENET learns both historical and non-historical dependencies to distinguish the most likely entities.
We evaluate our proposed model on five benchmark graphs.
arXiv Detail & Related papers (2023-08-29T03:26:38Z) - Span-Selective Linear Attention Transformers for Effective and Robust Schema-Guided Dialogue State Tracking [7.176787451868171]
We introduce SPLAT, a novel architecture which achieves better generalization and efficiency than prior approaches.
We demonstrate the effectiveness of our model on the Schema-Guided Dialogue (SGD) and MultiWOZ datasets.
arXiv Detail & Related papers (2023-06-15T17:59:31Z) - Temporal Knowledge Graph Reasoning with Historical Contrastive Learning [24.492458924487863]
We propose a new event forecasting model called Contrastive Event Network (CENET)
CENET learns both historical and non-historical dependencies to distinguish the entities that best match the given query.
During the inference process, CENET employs a mask-based strategy to generate the final results.
arXiv Detail & Related papers (2022-11-20T08:32:59Z) - SimOn: A Simple Framework for Online Temporal Action Localization [51.27476730635852]
We propose a framework, termed SimOn, that learns to predict action instances using the popular Transformer architecture.
Experimental results on the THUMOS14 and ActivityNet1.3 datasets show that our model remarkably outperforms the previous methods.
arXiv Detail & Related papers (2022-11-08T04:50:54Z) - Learning from Temporal Spatial Cubism for Cross-Dataset Skeleton-based Action Recognition [88.34182299496074]
Action labels are only available on a source dataset, but unavailable on a target dataset in the training stage.
We utilize a self-supervision scheme to reduce the domain shift between two skeleton-based action datasets.
By segmenting and permuting temporal segments or human body parts, we design two self-supervised learning classification tasks.
arXiv Detail & Related papers (2022-07-17T07:05:39Z) - Beyond Transfer Learning: Co-finetuning for Action Localisation [64.07196901012153]
We propose co-finetuning: simultaneously training a single model on multiple "upstream" and "downstream" tasks.
We demonstrate that co-finetuning outperforms traditional transfer learning when using the same total amount of data.
We also show how we can easily extend our approach to multiple "upstream" datasets to further improve performance.
arXiv Detail & Related papers (2022-07-08T10:25:47Z) - A Closer Look at Debiased Temporal Sentence Grounding in Videos: Dataset, Metric, and Approach [53.727460222955266]
Temporal Sentence Grounding in Videos (TSGV) aims to ground a natural language sentence in an untrimmed video.
Recent studies have found that current benchmark datasets may have obvious moment annotation biases.
We introduce a new evaluation metric "dR@n,IoU@m" that discounts the basic recall scores to alleviate the inflating evaluation caused by biased datasets.
arXiv Detail & Related papers (2022-03-10T08:58:18Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this information and is not responsible for any consequences arising from its use.