CaSAR: Contact-aware Skeletal Action Recognition
- URL: http://arxiv.org/abs/2309.10001v1
- Date: Sun, 17 Sep 2023 09:42:40 GMT
- Title: CaSAR: Contact-aware Skeletal Action Recognition
- Authors: Junan Lin, Zhichao Sun, Enjie Cao, Taein Kwon, Mahdi Rad, Marc Pollefeys
- Abstract summary: We present a new framework called Contact-aware Skeletal Action Recognition (CaSAR).
CaSAR uses novel representations of hand-object interaction that encompass spatial information.
Our framework is able to learn how the hands touch or stay away from the objects for each frame of the action sequence, and use this information to predict the action class.
- Score: 47.249908147135855
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Skeletal action recognition from an egocentric view is important for
applications such as interfaces in AR/VR glasses and human-robot interaction,
where the device has limited resources. Most of the existing skeletal action
recognition approaches use 3D coordinates of hand joints and 8-corner
rectangular bounding boxes of objects as inputs, but they do not capture how
the hands and objects interact with each other within the spatial context. In
this paper, we present a new framework called Contact-aware Skeletal Action
Recognition (CaSAR). It uses novel representations of hand-object interaction
that encompass spatial information: 1) contact points where the hand joints
meet the objects, 2) distant points where the hand joints are far away from the
object and barely involved in the current action. Our framework is able to
learn how the hands touch or stay away from the objects for each frame of the
action sequence, and use this information to predict the action class. We
demonstrate that our approach achieves state-of-the-art accuracies of 91.3% and
98.4% on two public datasets, H2O and FPHA, respectively.
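The contact-aware representation described in the abstract lends itself to a simple per-frame computation. The sketch below is only an illustration of the idea, not the paper's implementation: the function name, the distance threshold, and the use of the 8-corner bounding box as the object representation are assumptions made for this example.

```python
# Hedged illustration only; names, threshold, and object representation are assumptions.
import numpy as np

def contact_aware_points(hand_joints, object_corners, contact_thresh=0.02):
    """Split hand joints into contact points and distant points for one frame.

    hand_joints    : (J, 3) array of 3D hand-joint coordinates (metres).
    object_corners : (8, 3) array, the 8-corner object bounding box used as input.
    contact_thresh : distance (metres) below which a joint counts as "in contact"
                     (an assumed value, not taken from the paper).
    """
    # Distance from every joint to every box corner, keeping the nearest corner.
    d = np.linalg.norm(hand_joints[:, None, :] - object_corners[None, :, :], axis=-1)
    nearest = d.min(axis=1)                      # (J,) joint-to-object distance

    contact_mask = nearest < contact_thresh      # joints touching or near the object
    contact_points = hand_joints[contact_mask]   # 1) contact points
    distant_points = hand_joints[~contact_mask]  # 2) distant points, barely involved
    return contact_points, distant_points, contact_mask

# Per-frame masks could then be stacked over the sequence and fed, together with
# the joint coordinates, to a sequence classifier that predicts the action class.
```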
Related papers
- BimArt: A Unified Approach for the Synthesis of 3D Bimanual Interaction with Articulated Objects [70.20706475051347]
BimArt is a novel generative approach for synthesizing 3D bimanual hand interactions with articulated objects.
We first generate distance-based contact maps conditioned on the object trajectory with an articulation-aware feature representation.
The learned contact prior is then used to guide our hand motion generator, producing diverse and realistic bimanual motions for object movement and articulation.
arXiv Detail & Related papers (2024-12-06T14:23:56Z)
- GEARS: Local Geometry-aware Hand-object Interaction Synthesis [38.75942505771009]
We introduce a novel joint-centered sensor designed to reason about local object geometry near potential interaction regions.
As an important step towards mitigating the learning complexity, we transform the points from the global frame to a template hand frame and use a shared module to process sensor features of each individual joint (a minimal sketch of this frame transform appears after this list).
This is followed by a perceptual-temporal transformer network aimed at capturing correlation among the joints in different dimensions.
arXiv Detail & Related papers (2024-04-02T09:18:52Z)
- InterTracker: Discovering and Tracking General Objects Interacting with Hands in the Wild [40.489171608114574]
Existing methods rely on frame-based detectors to locate interacting objects.
We propose to leverage hand-object interaction to track interactive objects.
Our proposed method outperforms the state-of-the-art methods.
arXiv Detail & Related papers (2023-08-06T09:09:17Z)
- ARCTIC: A Dataset for Dexterous Bimanual Hand-Object Manipulation [68.80339307258835]
ARCTIC is a dataset of two hands that dexterously manipulate objects.
It contains 2.1M video frames paired with accurate 3D hand meshes and detailed, dynamic contact information.
arXiv Detail & Related papers (2022-04-28T17:23:59Z)
- Watch It Move: Unsupervised Discovery of 3D Joints for Re-Posing of Articulated Objects [73.23249640099516]
We learn both the appearance and the structure of previously unseen articulated objects by observing them move from multiple views.
Our insight is that adjacent parts that move relative to each other must be connected by a joint.
We show that our method works for different structures, from quadrupeds, to single-arm robots, to humans.
arXiv Detail & Related papers (2021-12-21T16:37:48Z)
- Skeleton-Based Mutually Assisted Interacted Object Localization and Human Action Recognition [111.87412719773889]
We propose a joint learning framework for "interacted object localization" and "human action recognition" based on skeleton data.
Our method achieves the best or competitive performance compared with state-of-the-art methods for human action recognition.
arXiv Detail & Related papers (2021-10-28T10:09:34Z)
- H2O: Two Hands Manipulating Objects for First Person Interaction Recognition [70.46638409156772]
We present a comprehensive framework for egocentric interaction recognition using markerless 3D annotations of two hands manipulating objects.
Our method produces annotations of the 3D pose of two hands and the 6D pose of the manipulated objects, along with their interaction labels for each frame.
Our dataset, called H2O (2 Hands and Objects), provides synchronized multi-view RGB-D images, interaction labels, object classes, ground-truth 3D poses for left & right hands, 6D object poses, ground-truth camera poses, object meshes and scene point clouds.
arXiv Detail & Related papers (2021-04-22T17:10:42Z)
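The GEARS entry above summarizes a joint-centered sensor that re-expresses nearby object geometry in a template hand frame before per-joint processing. The minimal sketch below illustrates that idea under stated assumptions: the sensing radius, the frame built from a wrist origin and a rotation matrix, and all names are invented for illustration and do not come from the paper.

```python
# Hedged sketch of a joint-centered, hand-frame point sensor in the spirit of the
# GEARS summary above; radius, frame construction, and names are assumptions.
import numpy as np

def points_in_hand_frame(object_points, wrist_pos, hand_rot, joint_pos, radius=0.05):
    """Gather object points near one joint and express them in the hand's local frame.

    object_points : (N, 3) object surface points in the global frame.
    wrist_pos     : (3,) hand-frame origin (e.g. the wrist) in the global frame.
    hand_rot      : (3, 3) rotation from the global frame to the hand (template) frame.
    joint_pos     : (3,) position of the joint whose neighbourhood is sensed.
    radius        : sensing radius in metres (assumed value).
    """
    # Keep only points within the sensing radius of this joint.
    near = object_points[np.linalg.norm(object_points - joint_pos, axis=1) < radius]
    # Re-express those points relative to the hand frame so the representation is
    # invariant to the hand's global pose.
    return (hand_rot @ (near - wrist_pos).T).T
```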