Hand Guided High Resolution Feature Enhancement for Fine-Grained Atomic
Action Segmentation within Complex Human Assemblies
- URL: http://arxiv.org/abs/2211.13694v1
- Date: Thu, 24 Nov 2022 16:19:22 GMT
- Title: Hand Guided High Resolution Feature Enhancement for Fine-Grained Atomic
Action Segmentation within Complex Human Assemblies
- Authors: Matthew Kent Myers, Nick Wright, Stephen McGough, Nicholas Martin
- Abstract summary: We present a novel hand-location-guided, high-resolution feature-enhanced model.
We also propose a simple yet effective method of deploying offline-trained action recognition models for real-time action segmentation.
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Due to the rapid temporal and fine-grained nature of complex human assembly
atomic actions, traditional action segmentation approaches requiring the
spatial (and often temporal) down-sampling of video frames often lose the vital
fine-grained spatial and temporal information required for accurate
classification within the manufacturing domain. In order to fully utilise
higher-resolution video data (often collected within the manufacturing domain)
and facilitate the real-time, accurate action segmentation required for
human-robot collaboration, we present a novel hand-location-guided,
high-resolution feature-enhanced model. We also propose a simple yet effective
method of deploying offline-trained action recognition models for real-time
action segmentation on temporally short, fine-grained actions, through the use
of surround sampling while training and temporally aware label cleaning at
inference. We evaluate our model on a novel action segmentation dataset
containing 24 (+background) atomic actions from video data of a real-world
robotics assembly production line. We show that both high-resolution hand
features and traditional frame-wide features improve fine-grained atomic action
classification, and that through temporally aware label cleaning our model is
capable of surpassing similar encoder/decoder methods while allowing for
real-time classification.
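The abstract describes, but does not detail, the two deployment ideas. As a rough illustration, the Python sketch below shows one plausible reading: surround sampling draws a symmetric window of frames around each labelled frame during training, and label cleaning suppresses short prediction flickers at inference. The window size, the run-length rule, and all function names are assumptions, not the authors' implementation.

```python
def surround_sample(center, half_window, num_frames):
    """Training-time surround sampling (window size is an assumption):
    take frames either side of the labelled centre frame so temporally
    short atomic actions keep their surrounding context."""
    lo = max(0, center - half_window)
    hi = min(num_frames - 1, center + half_window)
    return list(range(lo, hi + 1))

def clean_labels(per_frame_preds, min_run=3):
    """Inference-time temporally aware label cleaning (the run-length rule
    is an assumption): only accept a label once it has persisted for
    `min_run` consecutive frames, so brief flickers are suppressed."""
    cleaned, run_label, run_len, accepted = [], None, 0, None
    for pred in per_frame_preds:
        run_len = run_len + 1 if pred == run_label else 1
        run_label = pred
        if run_len >= min_run:
            accepted = run_label
        cleaned.append(accepted if accepted is not None else pred)
    return cleaned

# Example: a two-frame flicker of class 7 is absorbed by the surrounding run
print(clean_labels([2, 2, 2, 7, 7, 2, 2, 2, 5, 5, 5, 5]))
# -> [2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 5, 5]
```

Note that a run-length rule like this delays label switches by min_run - 1 frames, a latency trade-off any real-time deployment would need to tune.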
Related papers
- Coherent Temporal Synthesis for Incremental Action Segmentation [42.46228728930902]
This paper presents the first exploration of video data replay techniques for incremental action segmentation.
We propose a Temporally Coherent Action model, which represents actions using a generative model instead of storing individual frames.
In a 10-task incremental setup on the Breakfast dataset, our approach achieves accuracy gains of up to 22% compared to the baselines.
arXiv Detail & Related papers (2024-03-10T06:07:06Z)
- Leaping Into Memories: Space-Time Deep Feature Synthesis [93.10032043225362]
We propose LEAPS, an architecture-independent method for synthesizing videos from internal models.
We quantitatively and qualitatively evaluate the applicability of LEAPS by inverting a range of convolutional and attention-based architectures on Kinetics-400.
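The summary does not describe the LEAPS objective itself, but synthesizing video from a frozen model is commonly cast as input optimisation. The PyTorch sketch below shows generic model inversion by gradient descent on the input with a temporal-smoothness prior; it is an assumed stand-in for this family of methods, not the LEAPS algorithm.

```python
import torch
import torch.nn.functional as F

def invert_video(model, target_class, shape=(1, 3, 16, 112, 112),
                 steps=200, lr=0.1, tv_weight=0.05):
    """Optimise a random video so the frozen model predicts `target_class`.

    Generic model-inversion sketch (an assumption, not the LEAPS method):
    minimise classification loss plus a temporal-smoothness regulariser.
    """
    model.eval()
    video = torch.randn(shape, requires_grad=True)  # (N, C, T, H, W)
    opt = torch.optim.Adam([video], lr=lr)
    target = torch.tensor([target_class])
    for _ in range(steps):
        opt.zero_grad()
        logits = model(video)                                  # (1, classes)
        smooth = (video[:, :, 1:] - video[:, :, :-1]).abs().mean()
        loss = F.cross_entropy(logits, target) + tv_weight * smooth
        loss.backward()
        opt.step()
    return video.detach()
```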
arXiv Detail & Related papers (2023-03-17T12:55:22Z)
- MAPLE: Masked Pseudo-Labeling autoEncoder for Semi-supervised Point Cloud Action Recognition [160.49403075559158]
We propose a Masked Pseudo-Labeling autoEncoder (MAPLE) framework for point cloud action recognition.
In particular, we design a novel and efficient Decoupled spatial-temporal TransFormer (DestFormer) as the backbone of MAPLE.
MAPLE achieves superior results on three public benchmarks and outperforms the state-of-the-art method by 8.08% accuracy on the MSR-Action3D dataset.
arXiv Detail & Related papers (2022-09-01T12:32:40Z)
- Distortion-Aware Network Pruning and Feature Reuse for Real-time Video Segmentation [49.17930380106643]
We propose a novel framework to speed up any architecture with skip-connections for real-time vision tasks.
Specifically, at the arrival of each frame, we transform the features from the previous frame to reuse them at specific spatial bins.
We then perform partial computation of the backbone network on the regions of the current frame that capture temporal differences between the current and previous frames.
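The per-bin reuse idea can be sketched as follows. This is a minimal NumPy illustration under assumed shapes and thresholds: features from the previous frame are kept for bins with little pixel change, and fresh features overwrite only the changed bins (a full backbone pass stands in for true partial computation).

```python
import numpy as np

def changed_bins(prev_frame, cur_frame, grid=4, thresh=12.0):
    """Flag spatial bins whose mean absolute pixel difference between
    consecutive frames exceeds `thresh` (grid size and threshold are
    illustrative assumptions)."""
    h, w = cur_frame.shape[:2]
    bh, bw = h // grid, w // grid
    flags = np.zeros((grid, grid), dtype=bool)
    for i in range(grid):
        for j in range(grid):
            a = prev_frame[i * bh:(i + 1) * bh, j * bw:(j + 1) * bw]
            b = cur_frame[i * bh:(i + 1) * bh, j * bw:(j + 1) * bw]
            flags[i, j] = np.abs(a.astype(np.float32)
                                 - b.astype(np.float32)).mean() > thresh
    return flags

def reuse_or_recompute(prev_feats, prev_frame, cur_frame, backbone, grid=4):
    """Reuse last frame's feature map except in bins that changed."""
    flags = changed_bins(prev_frame, cur_frame, grid)
    feats = prev_feats.copy()
    fh, fw = feats.shape[0] // grid, feats.shape[1] // grid
    if flags.any():
        fresh = backbone(cur_frame)  # recompute once, copy only changed bins
        for i, j in zip(*np.nonzero(flags)):
            feats[i * fh:(i + 1) * fh, j * fw:(j + 1) * fw] = \
                fresh[i * fh:(i + 1) * fh, j * fw:(j + 1) * fw]
    return feats
```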
arXiv Detail & Related papers (2022-06-20T07:20:02Z)
- A Spatio-Temporal Multilayer Perceptron for Gesture Recognition [70.34489104710366]
We propose a multilayer state-weighted perceptron for gesture recognition in the context of autonomous vehicles.
An evaluation on the TCG and Drive&Act datasets is provided to showcase the promising performance of our approach.
We deploy our model to our autonomous vehicle to show its real-time capability and stable execution.
arXiv Detail & Related papers (2022-04-25T08:42:47Z)
- Sequence-to-Sequence Modeling for Action Identification at High Temporal Resolution [9.902223920743872]
We introduce a new action-recognition benchmark that includes subtle short-duration actions labeled at a high temporal resolution.
We show that current state-of-the-art models based on segmentation produce noisy predictions when applied to these data.
We propose a novel approach for high-resolution action identification, inspired by speech-recognition techniques.
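The summary only says the approach is speech-inspired; one standard speech-recognition technique for turning noisy per-frame scores into a clean action sequence is CTC-style greedy decoding, sketched below. This is a generic illustration, not necessarily the paper's exact decoder.

```python
import numpy as np

def ctc_greedy_decode(frame_logits, blank=0):
    """Greedy CTC-style decoding borrowed from speech recognition (a
    generic sketch; the paper's decoder is an assumption here):
    argmax per frame, collapse consecutive repeats, drop blanks."""
    path = np.argmax(frame_logits, axis=1)
    decoded, prev = [], None
    for label in path:
        if label != prev and label != blank:
            decoded.append(int(label))
        prev = label
    return decoded

# Example: 6 frames, 3 classes (0 = blank); repeats collapse to one action
logits = np.array([[5, 1, 0], [5, 1, 0], [0, 6, 1],
                   [0, 6, 1], [0, 6, 1], [1, 0, 7]], dtype=float)
print(ctc_greedy_decode(logits))  # -> [1, 2]
```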
arXiv Detail & Related papers (2021-11-03T21:06:36Z)
- Efficient Global-Local Memory for Real-time Instrument Segmentation of Robotic Surgical Video [53.14186293442669]
We identify two important clues for surgical instrument perception, including local temporal dependency from adjacent frames and global semantic correlation in long-range duration.
We propose a novel dual-memory network (DMNet) to relate both global and local-temporal knowledge.
Our method largely outperforms the state-of-the-art works on segmentation accuracy while maintaining a real-time speed.
arXiv Detail & Related papers (2021-09-28T10:10:14Z)
- Semi-Supervised Few-Shot Atomic Action Recognition [59.587738451616495]
We propose a novel model for semi-supervised few-shot atomic action recognition.
Our model features unsupervised and contrastive video embedding, loose action alignment, multi-head feature comparison, and attention-based aggregation.
Experiments show that our model can attain high accuracy on representative atomic action datasets, outperforming the respective state-of-the-art classification accuracies achieved under full supervision.
arXiv Detail & Related papers (2020-11-17T03:59:05Z)
- Memory Group Sampling Based Online Action Recognition Using Kinetic Skeleton Features [4.674689979981502]
We propose three core ideas to handle the online action recognition problem.
First, we combine the spatial and temporal skeleton features to depict the actions.
Second, we propose a memory group sampling method to combine the previous action frames and current action frames.
Third, an improved 1D CNN network is employed for training and testing using the features from sampled frames.
arXiv Detail & Related papers (2020-11-01T16:43:08Z)
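The memory group sampling step might look like the following Python sketch. The group count and per-group sampling policy are assumptions for illustration, not the paper's specification.

```python
import numpy as np

def memory_group_sample(memory_frames, current_frames, num_groups=8, rng=None):
    """Memory group sampling, as a rough sketch: split the buffered past
    frames into `num_groups` equal groups, draw one frame per group, and
    append the current frames so both long-term context and recent motion
    are represented in the clip fed to the 1D CNN."""
    rng = rng or np.random.default_rng()
    groups = np.array_split(np.arange(len(memory_frames)), num_groups)
    picks = [memory_frames[rng.choice(g)] for g in groups if len(g) > 0]
    return picks + list(current_frames)

# Example: 64 buffered skeleton frames plus the 8 most recent frames
memory = [f"past_{i}" for i in range(64)]
current = [f"now_{i}" for i in range(8)]
clip = memory_group_sample(memory, current)
print(len(clip))  # 8 sampled memory frames + 8 current frames = 16
```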
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the content (including all information) and is not responsible for any consequences.