Forward and Inverse models in HCI: Physical simulation and deep learning for inferring 3D finger pose
- URL: http://arxiv.org/abs/2109.03366v1
- Date: Tue, 7 Sep 2021 23:11:21 GMT
- Title: Forward and Inverse models in HCI: Physical simulation and deep learning for inferring 3D finger pose
- Authors: Roderick Murray-Smith, John H. Williamson, Andrew Ramsay, Francesco Tonolini, Simon Rogers, Antoine Loriette
- Abstract summary: We use machine learning to develop data-driven models to infer position, pose and sensor readings.
We combine a Conditional Variational Autoencoder with domain expertise/models and experimentally collected data.
- Score: 2.8952292379640636
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: We outline the role of forward and inverse modelling approaches in the design
of human--computer interaction systems. Causal, forward models tend to be
easier to specify and simulate, but HCI requires solutions of the inverse
problem. We infer finger 3D position $(x,y,z)$ and pose (pitch and yaw) on a
mobile device using capacitive sensors which can sense the finger up to 5 cm
above the screen. We use machine learning to develop data-driven models to
infer position, pose and sensor readings, based on training data from: 1. data
generated by robots, 2. data from electrostatic simulators, and 3. human-generated
data. Machine-learned emulation is used to accelerate the electrostatic
simulation by a factor of millions. We combine a Conditional
Variational Autoencoder with domain expertise/models and experimentally collected
data. We compare forward and inverse model approaches to direct inference of
finger pose. The combination gives the most accurate reported results on
inferring 3D position and pose with a capacitive sensor on a mobile device.
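As a concrete illustration of the inverse-modelling step, the sketch below implements a Conditional Variational Autoencoder whose decoder proposes finger states conditioned on a capacitive frame. This is a minimal sketch, not the authors' architecture: the 16x16 grid, the 5-D state layout, all layer widths, the latent size, and every name are illustrative assumptions.

```python
# Minimal sketch (not the authors' released code) of a CVAE for the
# inverse problem: inferring a finger state from a capacitive frame.
# Grid size, state layout, layer widths, latent size and all names are
# illustrative assumptions.
import torch
import torch.nn as nn

SENSOR_DIM = 16 * 16  # assumed flattened capacitive grid
POSE_DIM = 5          # assumed state: (x, y, z, pitch, yaw)
LATENT_DIM = 8        # illustrative latent size

class CVAE(nn.Module):
    def __init__(self):
        super().__init__()
        # Encoder q(z | pose, sensors), used only during training.
        self.encoder = nn.Sequential(
            nn.Linear(POSE_DIM + SENSOR_DIM, 256), nn.ReLU(),
            nn.Linear(256, 2 * LATENT_DIM),  # -> (mu, log_var)
        )
        # Decoder p(pose | z, sensors): proposes poses given a sensor frame.
        self.decoder = nn.Sequential(
            nn.Linear(LATENT_DIM + SENSOR_DIM, 256), nn.ReLU(),
            nn.Linear(256, POSE_DIM),
        )

    def forward(self, pose, sensors):
        mu, log_var = self.encoder(torch.cat([pose, sensors], -1)).chunk(2, -1)
        z = mu + torch.randn_like(mu) * (0.5 * log_var).exp()  # reparameterise
        return self.decoder(torch.cat([z, sensors], -1)), mu, log_var

def elbo_loss(recon, pose, mu, log_var):
    # Squared-error reconstruction plus KL to the unit-Gaussian prior.
    rec = ((recon - pose) ** 2).sum(-1).mean()
    kl = -0.5 * (1 + log_var - mu.pow(2) - log_var.exp()).sum(-1).mean()
    return rec + kl

# At test time, sample candidate poses with z ~ N(0, I):
model = CVAE()
frame = torch.rand(1, SENSOR_DIM)  # stand-in for one capacitive frame
z = torch.randn(32, LATENT_DIM)
candidates = model.decoder(torch.cat([z, frame.expand(32, -1)], -1))
```

In the paper, the inverse model is combined with forward models and with a machine-learned emulator of the electrostatic simulator; one natural use of such an emulator would be to re-score decoder samples by how closely the emulated sensor frame for each candidate pose matches the observed one.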
Related papers
- CameraHMR: Aligning People with Perspective [54.05758012879385]
We address the challenge of accurate 3D human pose and shape estimation from monocular images.
Existing training datasets containing real images with pseudo ground truth (pGT) use SMPLify to fit SMPL to sparse 2D joint locations.
We make two contributions that improve pGT accuracy.
arXiv Detail & Related papers (2024-11-12T19:12:12Z)
- VR-based generation of photorealistic synthetic data for training hand-object tracking models [0.0]
"blender-hoisynth" is an interactive synthetic data generator based on the Blender software.
It is possible for users to interact with objects via virtual hands using standard Virtual Reality hardware.
We replace large parts of the training data in the well-known DexYCB dataset with hoisynth data and train a state-of-the-art HOI reconstruction model with it.
arXiv Detail & Related papers (2024-01-31T14:32:56Z) - Reconfigurable Data Glove for Reconstructing Physical and Virtual Grasps [100.72245315180433]
We present a reconfigurable data glove design to capture different modes of human hand-object interactions.
The glove operates in three modes for various downstream tasks with distinct features.
We evaluate the system's three modes by (i) recording hand gestures and associated forces, (ii) improving manipulation fluency in VR, and (iii) producing realistic simulation effects of various tool uses.
arXiv Detail & Related papers (2023-01-14T05:35:50Z)
- Decanus to Legatus: Synthetic training for 2D-3D human pose lifting [26.108023246654646]
We propose an algorithm to generate infinite 3D synthetic human poses (Legatus) from a 3D pose distribution based on 10 initial handcrafted 3D poses (Decanus).
Our results show that we can achieve 3D pose estimation performance comparable to methods using real data from specialized datasets but in a zero-shot setup, showing the potential of our framework.
arXiv Detail & Related papers (2022-10-05T13:10:19Z)
- Towards Multimodal Multitask Scene Understanding Models for Indoor Mobile Agents [49.904531485843464]
In this paper, we discuss the main challenge: insufficient, or even no, labeled data for real-world indoor environments.
We describe MMISM (Multi-modality input Multi-task output Indoor Scene understanding Model) to tackle the above challenges.
MMISM considers RGB images as well as sparse Lidar points as inputs and 3D object detection, depth completion, human pose estimation, and semantic segmentation as output tasks.
We show that MMISM performs on par with or even better than single-task models (a toy shared-backbone, multi-head sketch follows this list).
arXiv Detail & Related papers (2022-09-27T04:49:19Z)
- T3VIP: Transformation-based 3D Video Prediction [49.178585201673364]
We propose a 3D video prediction (T3VIP) approach that explicitly models the 3D motion by decomposing a scene into its object parts.
Our model is fully unsupervised and captures the nature of the real world; observational cues in the image and point-cloud domains constitute its learning signals.
To the best of our knowledge, our model is the first generative model that provides an RGB-D video prediction of the future for a static camera.
arXiv Detail & Related papers (2022-09-19T15:01:09Z)
- CROMOSim: A Deep Learning-based Cross-modality Inertial Measurement Simulator [7.50015216403068]
Inertial measurement unit (IMU) data has been utilized in monitoring and assessment of human mobility.
To mitigate the data scarcity problem, we design CROMOSim, a cross-modality sensor simulator.
It simulates high-fidelity virtual IMU sensor data from motion capture systems or monocular RGB cameras (a minimal finite-difference version of this idea is sketched after this list).
arXiv Detail & Related papers (2022-02-21T22:30:43Z)
- Adapted Human Pose: Monocular 3D Human Pose Estimation with Zero Real 3D Pose Data [14.719976311208502]
Training vs. test data domain gaps often negatively affect model performance.
We present our adapted human pose (AHuP) approach that addresses adaptation problems in both appearance and pose spaces.
AHuP is built around the practical assumption that in real applications, data from the target domain may be inaccessible or only available in limited form.
arXiv Detail & Related papers (2021-05-23T01:20:40Z)
- Yet it moves: Learning from Generic Motions to Generate IMU data from YouTube videos [5.008235182488304]
We show how we can train a regression model on generic motions for both accelerometer and gyro signals to generate synthetic IMU data.
We demonstrate that systems trained on simulated data generated by our regression model can come to within around 10% of the mean F1 score of a system trained on real sensor data.
arXiv Detail & Related papers (2020-11-23T18:16:46Z)
- Physics-Based Dexterous Manipulations with Estimated Hand Poses and Residual Reinforcement Learning [52.37106940303246]
We learn a model that maps noisy input hand poses to target virtual poses.
The agent is trained in a residual setting by using a model-free hybrid RL+IL approach (the residual-control idea is sketched after this list).
We test our framework in two applications that use hand pose estimates for dexterous manipulations: hand-object interactions in VR and hand-object motion reconstruction in-the-wild.
arXiv Detail & Related papers (2020-08-07T17:34:28Z)
- Combining Implicit Function Learning and Parametric Models for 3D Human Reconstruction [123.62341095156611]
Implicit functions represented as deep learning approximations are powerful for reconstructing 3D surfaces.
Such features are essential in building flexible models for both computer graphics and computer vision.
We present a methodology that combines detail-rich implicit functions and parametric representations.
arXiv Detail & Related papers (2020-07-22T13:46:14Z)
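The shared-backbone, multi-head pattern referenced in the MMISM entry above can be sketched as follows. This is an illustrative toy, not MMISM's architecture: it assumes the sparse Lidar points arrive projected into a one-channel depth map, and all channel counts, head shapes, and names are my assumptions.

```python
# Illustrative toy (not MMISM's design): one shared encoder over fused
# RGB + sparse-depth input, with a lightweight head per output task.
import torch
import torch.nn as nn

class MultiTaskNet(nn.Module):
    def __init__(self):
        super().__init__()
        # 3 RGB channels + 1 sparse depth channel from projected Lidar points.
        self.backbone = nn.Sequential(
            nn.Conv2d(4, 32, 3, padding=1), nn.ReLU(),
            nn.Conv2d(32, 64, 3, padding=1), nn.ReLU(),
        )
        self.heads = nn.ModuleDict({
            "detection": nn.Conv2d(64, 6, 1),      # toy box/objectness maps
            "depth": nn.Conv2d(64, 1, 1),          # depth completion
            "pose": nn.Conv2d(64, 17, 1),          # joint heatmaps
            "segmentation": nn.Conv2d(64, 21, 1),  # class logits
        })

    def forward(self, rgb, sparse_depth):
        feats = self.backbone(torch.cat([rgb, sparse_depth], dim=1))
        return {name: head(feats) for name, head in self.heads.items()}

net = MultiTaskNet()
out = net(torch.rand(1, 3, 64, 64), torch.rand(1, 1, 64, 64))
```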
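The cross-modality IMU simulation referenced in the CROMOSim entry reduces, at its simplest, to finite differences: differentiate a tracked position trace twice for the accelerometer, once for the gyroscope, and account for gravity. CROMOSim itself is learned and far higher-fidelity; the sampling rate, the yaw-only orientation model, and all names below are my assumptions.

```python
# Hedged sketch (not CROMOSim itself): virtual IMU readings from a
# motion-capture position trace via finite differences. A real pipeline
# would rotate world-frame quantities into the sensor frame using full
# tracked orientation; here orientation is reduced to a yaw angle.
import numpy as np

def simulate_imu(positions, yaws, fs=100.0, g=9.81):
    """positions: (T, 3) world-frame metres; yaws: (T,) radians; fs: Hz."""
    dt = 1.0 / fs
    vel = np.gradient(positions, dt, axis=0)   # m/s
    acc_world = np.gradient(vel, dt, axis=0)   # m/s^2
    acc_world[:, 2] += g                       # accelerometer senses gravity
    # Rotate world-frame acceleration into a body frame rotating about z.
    cos, sin = np.cos(yaws), np.sin(yaws)
    acc_body = np.stack([
        cos * acc_world[:, 0] + sin * acc_world[:, 1],
        -sin * acc_world[:, 0] + cos * acc_world[:, 1],
        acc_world[:, 2],
    ], axis=1)
    gyro_z = np.gradient(yaws, dt)             # rad/s about body z
    return acc_body, gyro_z

# Toy usage: a circular path of 1 m radius at 0.5 Hz, sampled at 100 Hz.
t = np.arange(0, 4, 0.01)
pos = np.stack([np.cos(np.pi * t), np.sin(np.pi * t), np.zeros_like(t)], axis=1)
acc, gyro = simulate_imu(pos, yaws=np.pi * t + np.pi / 2)
```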
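Finally, the residual setting referenced in the physics-based manipulation entry: a learned network adds a small correction on top of a hand-crafted base controller driven by the noisy pose estimate. A minimal sketch, assuming a 21-joint hand observation and 20 actuators; the scale factor and all names are illustrative, not the paper's system.

```python
# Hedged sketch of residual control: learned correction + base controller.
import torch
import torch.nn as nn

OBS_DIM = 21 * 3  # assumed: 21 hand joints, 3-D each
ACT_DIM = 20      # assumed actuator count

class ResidualPolicy(nn.Module):
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(OBS_DIM, 128), nn.ReLU(),
            nn.Linear(128, ACT_DIM), nn.Tanh(),
        )
        self.scale = 0.1  # keep corrections small relative to the base action

    def forward(self, obs, base_action):
        return base_action + self.scale * self.net(obs)

def base_controller(obs):
    # Stand-in for a hand-crafted controller (e.g. PD tracking of the
    # estimated pose); returns a zero command here for brevity.
    return torch.zeros(obs.shape[0], ACT_DIM)

policy = ResidualPolicy()
obs = torch.randn(1, OBS_DIM)               # noisy estimated hand pose
action = policy(obs, base_controller(obs))  # residual-corrected command
```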
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the listed content (including all information) and is not responsible for any consequences of its use.