MotionInput v2.0 supporting DirectX: A modular library of open-source
gesture-based machine learning and computer vision methods for interacting
and controlling existing software with a webcam
- URL: http://arxiv.org/abs/2108.04357v1
- Date: Tue, 10 Aug 2021 08:23:21 GMT
- Title: MotionInput v2.0 supporting DirectX: A modular library of open-source
gesture-based machine learning and computer vision methods for interacting
and controlling existing software with a webcam
- Authors: Ashild Kummen, Guanlin Li, Ali Hassan, Teodora Ganeva, Qianying Lu,
Robert Shaw, Chenuka Ratwatte, Yang Zou, Lu Han, Emil Almazov, Sheena Visram,
Andrew Taylor, Neil J Sebire, Lee Stott, Yvonne Rogers, Graham Roberts, Dean
Mohamedally
- Abstract summary: MotionInput v2.0 maps human motion gestures to input operations for existing applications and games.
Three use case areas assisted the development of the modules: creativity software, office and clinical software, and gaming software.
- Score: 11.120698968989108
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Touchless computer interaction has become an important consideration during
the COVID-19 pandemic period. Despite progress in machine learning and computer
vision that allows for advanced gesture recognition, an integrated collection
of such open-source methods and a user-customisable approach to utilising them
in a low-cost solution for touchless interaction in existing software is still
missing. In this paper, we introduce the MotionInput v2.0 application. This
application utilises published open-source libraries and additional gesture
definitions developed to take the video stream from a standard RGB webcam as
input. It then maps human motion gestures to input operations for existing
applications and games. The user can choose their own preferred way of
interacting from a series of motion types, including single and bi-modal hand
gesturing, full-body repetitive or extremities-based exercises, head and facial
movements, eye tracking, and combinations of the above. We also introduce a
series of bespoke gesture recognition classifications as DirectInput triggers,
including gestures for idle states, auto calibration, depth capture from a 2D
RGB webcam stream, and tracking of facial motions such as mouth motions,
winking, and head direction with rotation. Three use case areas assisted the
development of the modules: creativity software, office and clinical software,
and gaming software. A collection of open-source libraries has been integrated
and provides a layer of modular gesture mapping on top of existing mouse and
keyboard controls in Windows via DirectX. With webcams integrated into most
laptops and desktop computers, touchless computing becomes more widely
available with MotionInput v2.0, in a federated and locally processed manner.
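The abstract describes mapping webcam-detected gestures to input events that DirectInput-based Windows applications can consume. The sketch below is a rough illustration of that idea, not the authors' implementation: it pairs a hand-landmark detector (MediaPipe Hands, one of several open-source options) with a ctypes wrapper around the Windows SendInput API, injecting a keyboard scan-code event whenever a simple pinch gesture is detected. The choice of library, gesture, threshold, and key are all illustrative assumptions.

```python
# Minimal sketch (assumptions noted above): pinch gesture on an RGB webcam -> space-bar
# scan-code event via SendInput, which DirectInput-based software can read.
import ctypes
import cv2
import mediapipe as mp

SPACE_SCANCODE = 0x39            # hardware scan code for the space bar
KEYEVENTF_SCANCODE = 0x0008
KEYEVENTF_KEYUP = 0x0002

class KEYBDINPUT(ctypes.Structure):
    _fields_ = [("wVk", ctypes.c_ushort), ("wScan", ctypes.c_ushort),
                ("dwFlags", ctypes.c_ulong), ("time", ctypes.c_ulong),
                ("dwExtraInfo", ctypes.POINTER(ctypes.c_ulong))]

class INPUT(ctypes.Structure):
    # Trailing padding brings the struct up to the size of the full INPUT union
    # (40 bytes on 64-bit Windows), which SendInput checks via cbSize.
    _fields_ = [("type", ctypes.c_ulong), ("ki", KEYBDINPUT),
                ("padding", ctypes.c_ubyte * 8)]

def send_scancode(scan, key_up=False):
    """Inject a hardware-style key event so DirectInput applications see it."""
    flags = KEYEVENTF_SCANCODE | (KEYEVENTF_KEYUP if key_up else 0)
    inp = INPUT(type=1, ki=KEYBDINPUT(0, scan, flags, 0, None))  # type 1 = INPUT_KEYBOARD
    ctypes.windll.user32.SendInput(1, ctypes.byref(inp), ctypes.sizeof(INPUT))

hands = mp.solutions.hands.Hands(max_num_hands=1)
cap = cv2.VideoCapture(0)        # standard RGB webcam
pressed = False
while cap.isOpened():
    ok, frame = cap.read()
    if not ok:
        break
    result = hands.process(cv2.cvtColor(frame, cv2.COLOR_BGR2RGB))
    pinch = False
    if result.multi_hand_landmarks:
        lm = result.multi_hand_landmarks[0].landmark
        # "Pinch": thumb tip (landmark 4) close to index fingertip (landmark 8)
        # in normalised image coordinates; 0.05 is an arbitrary threshold.
        pinch = abs(lm[4].x - lm[8].x) + abs(lm[4].y - lm[8].y) < 0.05
    if pinch and not pressed:
        send_scancode(SPACE_SCANCODE)                 # key down on gesture onset
        pressed = True
    elif not pinch and pressed:
        send_scancode(SPACE_SCANCODE, key_up=True)    # key up when gesture ends
        pressed = False
cap.release()
```

Injecting scan codes through SendInput, rather than posting virtual-key messages, is the usual way to reach software that reads input through DirectInput, which is consistent with the abstract's mention of DirectInput triggers layered over existing mouse and keyboard controls.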
Related papers
- Image Conductor: Precision Control for Interactive Video Synthesis [90.2353794019393]
Filmmaking and animation production often require sophisticated techniques for coordinating camera transitions and object movements.
Image Conductor is a method for precise control of camera transitions and object movements to generate video assets from a single image.
arXiv Detail & Related papers (2024-06-21T17:55:05Z)
- FreeMotion: MoCap-Free Human Motion Synthesis with Multimodal Large Language Models [19.09048969615117]
We explore open-set human motion synthesis using natural language instructions as user control signals based on MLLMs.
Our method can achieve general human motion synthesis for many downstream tasks.
arXiv Detail & Related papers (2024-06-15T21:10:37Z)
- Animate Your Motion: Turning Still Images into Dynamic Videos [58.63109848837741]
We introduce Scene and Motion Conditional Diffusion (SMCD), a novel methodology for managing multimodal inputs.
SMCD incorporates a recognized motion conditioning module and investigates various approaches to integrate scene conditions.
Our design significantly enhances video quality, motion precision, and semantic coherence.
arXiv Detail & Related papers (2024-03-15T10:36:24Z)
- GestureGPT: Toward Zero-Shot Free-Form Hand Gesture Understanding with Large Language Model Agents [35.48323584634582]
We introduce GestureGPT, a free-form hand gesture understanding framework that mimics human gesture understanding procedures.
Our framework leverages multiple Large Language Model agents to manage and synthesize gesture and context information.
We validated our framework offline under two real-world scenarios: smart home control and online video streaming.
arXiv Detail & Related papers (2023-10-19T15:17:34Z)
- The Gesture Authoring Space: Authoring Customised Hand Gestures for Grasping Virtual Objects in Immersive Virtual Environments [81.5101473684021]
This work proposes a hand gesture authoring tool for object specific grab gestures allowing virtual objects to be grabbed as in the real world.
The presented solution uses template matching for gesture recognition and requires no technical knowledge to design and create custom tailored hand gestures.
The study showed that gestures created with the proposed approach are perceived by users as a more natural input modality than the others.
arXiv Detail & Related papers (2022-07-03T18:33:33Z)
- Muscle Vision: Real Time Keypoint Based Pose Classification of Physical Exercises [52.77024349608834]
3D human pose recognition extrapolated from video has advanced to the point of enabling real-time software applications.
We propose a new machine learning pipeline and web interface that performs human pose recognition on a live video feed to detect when common exercises are performed and classify them accordingly.
arXiv Detail & Related papers (2022-03-23T00:55:07Z)
- Click to Move: Controlling Video Generation with Sparse Motion [30.437648200928603]
Click to Move (C2M) is a novel framework for video generation where the user can control the motion of the synthesized video through mouse clicks.
Our model receives as input an initial frame, its corresponding segmentation map and the sparse motion vectors encoding the input provided by the user.
It outputs a plausible video sequence starting from the given frame and with a motion that is consistent with user input.
arXiv Detail & Related papers (2021-08-19T17:33:13Z)
- SHREC 2021: Track on Skeleton-based Hand Gesture Recognition in the Wild [62.450907796261646]
Recognition of hand gestures can be performed directly from the stream of hand skeletons estimated by software.
Despite the recent advancements in gesture and action recognition from skeletons, it is unclear how well the current state-of-the-art techniques can perform in a real-world scenario.
This paper presents the results of the SHREC 2021: Track on Skeleton-based Hand Gesture Recognition in the Wild contest.
arXiv Detail & Related papers (2021-06-21T10:57:49Z)
- Unmasking Communication Partners: A Low-Cost AI Solution for Digitally Removing Head-Mounted Displays in VR-Based Telepresence [62.997667081978825]
Face-to-face conversation in Virtual Reality (VR) is a challenge when participants wear head-mounted displays (HMDs).
Past research has shown that high-fidelity face reconstruction with personal avatars in VR is possible under laboratory conditions with high-cost hardware.
We propose one of the first low-cost systems for this task which uses only open source, free software and affordable hardware.
arXiv Detail & Related papers (2020-11-06T23:17:12Z)
- Gestop : Customizable Gesture Control of Computer Systems [0.3553493344868413]
Gestop is a framework that learns to detect gestures from demonstrations and is customizable by end-users.
It enables users to interact in real-time with computers having only RGB cameras, using gestures.
arXiv Detail & Related papers (2020-10-25T19:13:01Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the information presented and is not responsible for any consequences of its use.