MotionInput v2.0 supporting DirectX: A modular library of open-source
gesture-based machine learning and computer vision methods for interacting
and controlling existing software with a webcam
- URL: http://arxiv.org/abs/2108.04357v1
- Date: Tue, 10 Aug 2021 08:23:21 GMT
- Title: MotionInput v2.0 supporting DirectX: A modular library of open-source
gesture-based machine learning and computer vision methods for interacting
and controlling existing software with a webcam
- Authors: Ashild Kummen, Guanlin Li, Ali Hassan, Teodora Ganeva, Qianying Lu,
Robert Shaw, Chenuka Ratwatte, Yang Zou, Lu Han, Emil Almazov, Sheena Visram,
Andrew Taylor, Neil J Sebire, Lee Stott, Yvonne Rogers, Graham Roberts, Dean
Mohamedally
- Abstract summary: MotionInput v2.0 maps human motion gestures to input operations for existing applications and games.
Three use case areas assisted the development of the modules: creativity software, office and clinical software, and gaming software.
- Score: 11.120698968989108
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Touchless computer interaction has become an important consideration during
the COVID-19 pandemic period. Despite progress in machine learning and computer
vision that allows for advanced gesture recognition, an integrated collection
of such open-source methods and a user-customisable approach to utilising them
in a low-cost solution for touchless interaction in existing software is still
missing. In this paper, we introduce the MotionInput v2.0 application. This
application utilises published open-source libraries and additional gesture
definitions developed to take the video stream from a standard RGB webcam as
input. It then maps human motion gestures to input operations for existing
applications and games. The user can choose their own preferred way of
interacting from a series of motion types, including single and bi-modal hand
gesturing, full-body repetitive or extremities-based exercises, head and facial
movements, eye tracking, and combinations of the above. We also introduce a
series of bespoke gesture recognition classifications as DirectInput triggers,
including gestures for idle states, auto calibration, depth capture from a 2D
RGB webcam stream, and tracking of facial motions such as mouth motions,
winking, and head direction with rotation. Three use case areas assisted the
development of the modules: creativity software, office and clinical software,
and gaming software. A collection of open-source libraries has been integrated
and provides a layer of modular gesture mapping on top of existing mouse and
keyboard controls in Windows via DirectX. With webcams integrated into most
laptops and desktop computers, touchless computing becomes more widely
available with MotionInput v2.0, in a federated and locally processed manner.
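The abstract describes mapping webcam-detected gestures to input events that DirectInput-based Windows applications can consume. The sketch below is a rough illustration of that idea, not the authors' implementation: it pairs a hand-landmark detector (MediaPipe Hands, one of several open-source options) with a ctypes wrapper around the Windows SendInput API, injecting a keyboard scan-code event whenever a simple pinch gesture is detected. The choice of library, gesture, threshold, and key are all illustrative assumptions.

```python
# Minimal sketch (assumptions noted above): pinch gesture on an RGB webcam -> space-bar
# scan-code event via SendInput, which DirectInput-based software can read.
import ctypes
import cv2
import mediapipe as mp

SPACE_SCANCODE = 0x39            # hardware scan code for the space bar
KEYEVENTF_SCANCODE = 0x0008
KEYEVENTF_KEYUP = 0x0002

class KEYBDINPUT(ctypes.Structure):
    _fields_ = [("wVk", ctypes.c_ushort), ("wScan", ctypes.c_ushort),
                ("dwFlags", ctypes.c_ulong), ("time", ctypes.c_ulong),
                ("dwExtraInfo", ctypes.POINTER(ctypes.c_ulong))]

class INPUT(ctypes.Structure):
    # Trailing padding brings the struct up to the size of the full INPUT union
    # (40 bytes on 64-bit Windows), which SendInput checks via cbSize.
    _fields_ = [("type", ctypes.c_ulong), ("ki", KEYBDINPUT),
                ("padding", ctypes.c_ubyte * 8)]

def send_scancode(scan, key_up=False):
    """Inject a hardware-style key event so DirectInput applications see it."""
    flags = KEYEVENTF_SCANCODE | (KEYEVENTF_KEYUP if key_up else 0)
    inp = INPUT(type=1, ki=KEYBDINPUT(0, scan, flags, 0, None))  # type 1 = INPUT_KEYBOARD
    ctypes.windll.user32.SendInput(1, ctypes.byref(inp), ctypes.sizeof(INPUT))

hands = mp.solutions.hands.Hands(max_num_hands=1)
cap = cv2.VideoCapture(0)        # standard RGB webcam
pressed = False
while cap.isOpened():
    ok, frame = cap.read()
    if not ok:
        break
    result = hands.process(cv2.cvtColor(frame, cv2.COLOR_BGR2RGB))
    pinch = False
    if result.multi_hand_landmarks:
        lm = result.multi_hand_landmarks[0].landmark
        # "Pinch": thumb tip (landmark 4) close to index fingertip (landmark 8)
        # in normalised image coordinates; 0.05 is an arbitrary threshold.
        pinch = abs(lm[4].x - lm[8].x) + abs(lm[4].y - lm[8].y) < 0.05
    if pinch and not pressed:
        send_scancode(SPACE_SCANCODE)                 # key down on gesture onset
        pressed = True
    elif not pinch and pressed:
        send_scancode(SPACE_SCANCODE, key_up=True)    # key up when gesture ends
        pressed = False
cap.release()
```

Injecting scan codes through SendInput, rather than posting virtual-key messages, is the usual way to reach software that reads input through DirectInput, which is consistent with the abstract's mention of DirectInput triggers layered over existing mouse and keyboard controls.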
Related papers
- Image Conductor: Precision Control for Interactive Video Synthesis [90.2353794019393]
Filmmaking and animation production often require sophisticated techniques for coordinating camera transitions and object movements.
Image Conductor is a method for precise control of camera transitions and object movements to generate video assets from a single image.
arXiv Detail & Related papers (2024-06-21T17:55:05Z)
- FreeMotion: MoCap-Free Human Motion Synthesis with Multimodal Large Language Models [19.09048969615117]
We explore open-set human motion synthesis using natural language instructions as user control signals based on MLLMs.
Our method can achieve general human motion synthesis for many downstream tasks.
arXiv Detail & Related papers (2024-06-15T21:10:37Z)
- Animate Your Motion: Turning Still Images into Dynamic Videos [58.63109848837741]
We introduce Scene and Motion Conditional Diffusion (SMCD), a novel methodology for managing multimodal inputs.
SMCD incorporates a recognized motion conditioning module and investigates various approaches to integrate scene conditions.
Our design significantly enhances video quality, motion precision, and semantic coherence.
arXiv Detail & Related papers (2024-03-15T10:36:24Z)
- GestureGPT: Toward Zero-Shot Free-Form Hand Gesture Understanding with Large Language Model Agents [35.48323584634582]
We introduce GestureGPT, a free-form hand gesture understanding framework that mimics human gesture understanding procedures.
Our framework leverages multiple Large Language Model agents to manage and synthesize gesture and context information.
We validated our framework offline under two real-world scenarios: smart home control and online video streaming.
arXiv Detail & Related papers (2023-10-19T15:17:34Z)
- The Gesture Authoring Space: Authoring Customised Hand Gestures for Grasping Virtual Objects in Immersive Virtual Environments [81.5101473684021]
This work proposes a hand gesture authoring tool for object specific grab gestures allowing virtual objects to be grabbed as in the real world.
The presented solution uses template matching for gesture recognition and requires no technical knowledge to design and create custom tailored hand gestures.
The study showed that gestures created with the proposed approach are perceived by users as a more natural input modality than the others.
arXiv Detail & Related papers (2022-07-03T18:33:33Z)
- Muscle Vision: Real Time Keypoint Based Pose Classification of Physical Exercises [52.77024349608834]
3D human pose recognition extrapolated from video has advanced to the point of enabling real-time software applications.
We propose a new machine learning pipeline and web interface that performs human pose recognition on a live video feed to detect when common exercises are performed and classify them accordingly.
arXiv Detail & Related papers (2022-03-23T00:55:07Z)
- Click to Move: Controlling Video Generation with Sparse Motion [30.437648200928603]
Click to Move (C2M) is a novel framework for video generation where the user can control the motion of the synthesized video through mouse clicks.
Our model receives as input an initial frame, its corresponding segmentation map and the sparse motion vectors encoding the input provided by the user.
It outputs a plausible video sequence starting from the given frame and with a motion that is consistent with user input.
arXiv Detail & Related papers (2021-08-19T17:33:13Z)
- SHREC 2021: Track on Skeleton-based Hand Gesture Recognition in the Wild [62.450907796261646]
Recognition of hand gestures can be performed directly from the stream of hand skeletons estimated by software.
Despite the recent advancements in gesture and action recognition from skeletons, it is unclear how well the current state-of-the-art techniques can perform in a real-world scenario.
This paper presents the results of the SHREC 2021: Track on Skeleton-based Hand Gesture Recognition in the Wild contest.
arXiv Detail & Related papers (2021-06-21T10:57:49Z)
- Unmasking Communication Partners: A Low-Cost AI Solution for Digitally Removing Head-Mounted Displays in VR-Based Telepresence [62.997667081978825]
Face-to-face conversation in Virtual Reality (VR) is a challenge when participants wear head-mounted displays (HMDs).
Past research has shown that high-fidelity face reconstruction with personal avatars in VR is possible under laboratory conditions with high-cost hardware.
We propose one of the first low-cost systems for this task which uses only open source, free software and affordable hardware.
arXiv Detail & Related papers (2020-11-06T23:17:12Z)
- Gestop : Customizable Gesture Control of Computer Systems [0.3553493344868413]
Gestop is a framework that learns to detect gestures from demonstrations and is customizable by end-users.
It enables users to interact in real-time with computers having only RGB cameras, using gestures.
arXiv Detail & Related papers (2020-10-25T19:13:01Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the information presented and is not responsible for any consequences of its use.