Sequence-to-Sequence Modeling for Action Identification at High Temporal
Resolution
- URL: http://arxiv.org/abs/2111.02521v1
- Date: Wed, 3 Nov 2021 21:06:36 GMT
- Title: Sequence-to-Sequence Modeling for Action Identification at High Temporal
Resolution
- Authors: Aakash Kaku, Kangning Liu, Avinash Parnandi, Haresh Rengaraj
Rajamohan, Kannan Venkataramanan, Anita Venkatesan, Audre Wirtanen, Natasha
Pandit, Heidi Schambra, Carlos Fernandez-Granda
- Abstract summary: We introduce a new action-recognition benchmark that includes subtle short-duration actions labeled at a high temporal resolution.
We show that current state-of-the-art models based on segmentation produce noisy predictions when applied to these data.
We propose a novel approach for high-resolution action identification, inspired by speech-recognition techniques.
- Score: 9.902223920743872
- License: http://creativecommons.org/licenses/by-nc-sa/4.0/
- Abstract: Automatic action identification from video and kinematic data is an important
machine learning problem with applications ranging from robotics to smart
health. Most existing works focus on identifying coarse actions such as
running, climbing, or cutting a vegetable, which have relatively long
durations. This is an important limitation for applications that require the
identification of subtle motions at high temporal resolution. For example, in
stroke recovery, quantifying rehabilitation dose requires differentiating
motions with sub-second durations. Our goal is to bridge this gap. To this end,
we introduce a large-scale, multimodal dataset, StrokeRehab, as a new
action-recognition benchmark that includes subtle short-duration actions
labeled at a high temporal resolution. These short-duration actions are called
functional primitives, and consist of reaches, transports, repositions,
stabilizations, and idles. The dataset consists of high-quality Inertial
Measurement Unit sensors and video data of 41 stroke-impaired patients
performing activities of daily living like feeding, brushing teeth, etc. We
show that current state-of-the-art models based on segmentation produce noisy
predictions when applied to these data, which often leads to overcounting of
actions. To address this, we propose a novel approach for high-resolution
action identification, inspired by speech-recognition techniques, which is
based on a sequence-to-sequence model that directly predicts the sequence of
actions. This approach outperforms current state-of-the-art methods on the
StrokeRehab dataset, as well as on the standard benchmark datasets 50Salads,
Breakfast, and Jigsaws.
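A minimal sketch (not the authors' code) of the overcounting problem the abstract describes: when actions are recovered from per-frame labels by collapsing runs of identical labels, a single flickered frame at an action boundary spuriously doubles the action count, whereas a sequence-to-sequence model emits the action tokens directly. The `reach`/`transport` labels follow the functional primitives defined above.

```python
from itertools import groupby

def actions_from_frames(frame_labels):
    """Segmentation-style decoding: collapse runs of identical frame labels
    into a sequence of action tokens."""
    return [label for label, _ in groupby(frame_labels)]

# Ground truth: one reach followed by one transport (sub-second primitives).
clean_frames = ["reach"] * 10 + ["transport"] * 10
print(actions_from_frames(clean_frames))   # ['reach', 'transport']

# A noisy per-frame segmenter that flickers once at the boundary
# doubles the apparent number of actions:
noisy_frames = ["reach"] * 9 + ["transport", "reach"] + ["transport"] * 9
print(actions_from_frames(noisy_frames))   # ['reach', 'transport', 'reach', 'transport']
```

A sequence-to-sequence model sidesteps this failure mode because, as in speech recognition, it predicts the token sequence `["reach", "transport"]` directly rather than deriving it from noisy frame-level output.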
Related papers
- Finding the DeepDream for Time Series: Activation Maximization for Univariate Time Series [10.388704631887496]
We introduce Sequence Dreaming, a technique that adapts Activation Maximization to analyze sequential information.
We visualize the temporal dynamics and patterns most influential in model decision-making processes.
arXiv Detail & Related papers (2024-08-20T08:09:44Z)
- Coherent Temporal Synthesis for Incremental Action Segmentation [42.46228728930902]
This paper presents the first exploration of video data replay techniques for incremental action segmentation.
We propose a Temporally Coherent Action model, which represents actions using a generative model instead of storing individual frames.
In a 10-task incremental setup on the Breakfast dataset, our approach achieves accuracy gains of up to 22% over the baselines.
arXiv Detail & Related papers (2024-03-10T06:07:06Z)
- Tapestry of Time and Actions: Modeling Human Activity Sequences using Temporal Point Process Flows [9.571588145356277]
We present ProActive, a framework for modeling the continuous-time distribution of actions in an activity sequence.
ProActive addresses three high-impact problems -- next action prediction, sequence-goal prediction, and end-to-end sequence generation.
arXiv Detail & Related papers (2023-07-13T19:17:54Z)
- Learning Sequence Representations by Non-local Recurrent Neural Memory [61.65105481899744]
We propose a Non-local Recurrent Neural Memory (NRNM) for supervised sequence representation learning.
Our model captures long-range dependencies and distills latent high-level features.
Our model compares favorably against other state-of-the-art methods specifically designed for each of these sequence applications.
arXiv Detail & Related papers (2022-07-20T07:26:15Z)
- ProActive: Self-Attentive Temporal Point Process Flows for Activity Sequences [9.571588145356277]
ProActive is a framework for modeling the continuous-time distribution of actions in an activity sequence.
It addresses next action prediction, sequence-goal prediction, and end-to-end sequence generation.
arXiv Detail & Related papers (2022-06-10T16:30:55Z)
- AntPivot: Livestream Highlight Detection via Hierarchical Attention Mechanism [64.70568612993416]
We formulate a new task, Livestream Highlight Detection, analyze its difficulties, and propose a novel architecture, AntPivot, to solve this problem.
We construct a fully-annotated dataset AntHighlight to instantiate this task and evaluate the performance of our model.
arXiv Detail & Related papers (2022-06-10T05:58:11Z)
- Video Action Detection: Analysing Limitations and Challenges [70.01260415234127]
We analyze existing datasets on video action detection and discuss their limitations.
We perform a bias study that analyzes a key property differentiating videos from static images: the temporal aspect.
Such extreme experiments show the existence of biases that have crept into existing methods despite careful modeling.
arXiv Detail & Related papers (2022-04-17T00:42:14Z)
- A Prospective Study on Sequence-Driven Temporal Sampling and Ego-Motion Compensation for Action Recognition in the EPIC-Kitchens Dataset [68.8204255655161]
Action recognition is one of the most challenging research fields in computer vision.
Sequences recorded under ego-motion have become especially relevant.
The proposed method copes with this by estimating the ego-motion, or camera motion.
arXiv Detail & Related papers (2020-08-26T14:44:45Z)
- Symmetric Dilated Convolution for Surgical Gesture Recognition [10.699258974625073]
We propose a novel temporal convolutional architecture to automatically detect and segment surgical gestures.
We devise our method with a symmetric dilation structure bridged by a self-attention module to encode and decode the long-term temporal patterns.
We validate our approach on a fundamental robotic suturing task from the JIGSAWS dataset.
arXiv Detail & Related papers (2020-07-13T13:34:48Z)
- MS-TCN++: Multi-Stage Temporal Convolutional Network for Action Segmentation [87.16030562892537]
We propose a multi-stage architecture for the temporal action segmentation task.
The first stage generates an initial prediction that is refined by the next ones.
Our models achieve state-of-the-art results on three datasets.
arXiv Detail & Related papers (2020-06-16T14:50:47Z)
- Convolutional Tensor-Train LSTM for Spatio-temporal Learning [116.24172387469994]
We propose a higher-order LSTM model that can efficiently learn long-term correlations in the video sequence.
This is accomplished through a novel tensor train module that performs prediction by combining convolutional features across time.
Our results achieve state-of-the-art performance in a wide range of applications and datasets.
arXiv Detail & Related papers (2020-02-21T05:00:01Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of its content (including all information) and is not responsible for any consequences.