Forward and Inverse models in HCI: Physical simulation and deep learning for inferring 3D finger pose
- URL: http://arxiv.org/abs/2109.03366v1
- Date: Tue, 7 Sep 2021 23:11:21 GMT
- Title: Forward and Inverse models in HCI: Physical simulation and deep learning for inferring 3D finger pose
- Authors: Roderick Murray-Smith, John H. Williamson, Andrew Ramsay, Francesco Tonolini, Simon Rogers, Antoine Loriette
- Abstract summary: We use machine learning to develop data-driven models to infer position, pose and sensor readings.
We combine a Conditional Variational Autoencoder with domain expertise/models and experimentally collected data.
- Score: 2.8952292379640636
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: We outline the role of forward and inverse modelling approaches in the design
of human--computer interaction systems. Causal, forward models tend to be
easier to specify and simulate, but HCI requires solutions of the inverse
problem. We infer finger 3D position $(x,y,z)$ and pose (pitch and yaw) on a
mobile device using capacitive sensors which can sense the finger up to 5 cm
above the screen. We use machine learning to develop data-driven models to
infer position, pose and sensor readings, based on training data from: 1. data
generated by robots, 2. data from electrostatic simulators, and 3. human-generated
data. Machine-learned emulation is used to accelerate the electrostatic
simulation by a factor of millions. We combine a Conditional
Variational Autoencoder with domain expertise/models and experimentally collected
data. We compare forward and inverse model approaches to direct inference of
finger pose. The combination gives the most accurate reported results on
inferring 3D position and pose with a capacitive sensor on a mobile device.
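As a concrete illustration of the inverse-modelling step, the sketch below implements a Conditional Variational Autoencoder whose decoder proposes finger states conditioned on a capacitive frame. This is a minimal sketch, not the authors' architecture: the 16x16 grid, the 5-D state layout, all layer widths, the latent size, and every name are illustrative assumptions.

```python
# Minimal sketch (not the authors' released code) of a CVAE for the
# inverse problem: inferring a finger state from a capacitive frame.
# Grid size, state layout, layer widths, latent size and all names are
# illustrative assumptions.
import torch
import torch.nn as nn

SENSOR_DIM = 16 * 16  # assumed flattened capacitive grid
POSE_DIM = 5          # assumed state: (x, y, z, pitch, yaw)
LATENT_DIM = 8        # illustrative latent size

class CVAE(nn.Module):
    def __init__(self):
        super().__init__()
        # Encoder q(z | pose, sensors), used only during training.
        self.encoder = nn.Sequential(
            nn.Linear(POSE_DIM + SENSOR_DIM, 256), nn.ReLU(),
            nn.Linear(256, 2 * LATENT_DIM),  # -> (mu, log_var)
        )
        # Decoder p(pose | z, sensors): proposes poses given a sensor frame.
        self.decoder = nn.Sequential(
            nn.Linear(LATENT_DIM + SENSOR_DIM, 256), nn.ReLU(),
            nn.Linear(256, POSE_DIM),
        )

    def forward(self, pose, sensors):
        mu, log_var = self.encoder(torch.cat([pose, sensors], -1)).chunk(2, -1)
        z = mu + torch.randn_like(mu) * (0.5 * log_var).exp()  # reparameterise
        return self.decoder(torch.cat([z, sensors], -1)), mu, log_var

def elbo_loss(recon, pose, mu, log_var):
    # Squared-error reconstruction plus KL to the unit-Gaussian prior.
    rec = ((recon - pose) ** 2).sum(-1).mean()
    kl = -0.5 * (1 + log_var - mu.pow(2) - log_var.exp()).sum(-1).mean()
    return rec + kl

# At test time, sample candidate poses with z ~ N(0, I):
model = CVAE()
frame = torch.rand(1, SENSOR_DIM)  # stand-in for one capacitive frame
z = torch.randn(32, LATENT_DIM)
candidates = model.decoder(torch.cat([z, frame.expand(32, -1)], -1))
```

In the paper, the inverse model is combined with forward models and with a machine-learned emulator of the electrostatic simulator; one natural use of such an emulator would be to re-score decoder samples by how closely the emulated sensor frame for each candidate pose matches the observed one.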
Related papers
- CameraHMR: Aligning People with Perspective [54.05758012879385]
We address the challenge of accurate 3D human pose and shape estimation from monocular images.
Existing training datasets containing real images with pseudo ground truth (pGT) use SMPLify to fit SMPL to sparse 2D joint locations.
We make two contributions that improve pGT accuracy.
arXiv Detail & Related papers (2024-11-12T19:12:12Z)
- VR-based generation of photorealistic synthetic data for training hand-object tracking models [0.0]
"blender-hoisynth" is an interactive synthetic data generator based on the Blender software.
It is possible for users to interact with objects via virtual hands using standard Virtual Reality hardware.
We replace large parts of the training data in the well-known DexYCB dataset with hoisynth data and train a state-of-the-art HOI reconstruction model with it.
arXiv Detail & Related papers (2024-01-31T14:32:56Z) - Reconfigurable Data Glove for Reconstructing Physical and Virtual Grasps [100.72245315180433]
We present a reconfigurable data glove design to capture different modes of human hand-object interactions.
The glove operates in three modes for various downstream tasks with distinct features.
We evaluate the system's three modes by (i) recording hand gestures and associated forces, (ii) improving manipulation fluency in VR, and (iii) producing realistic simulation effects of various tool uses.
arXiv Detail & Related papers (2023-01-14T05:35:50Z)
- Decanus to Legatus: Synthetic training for 2D-3D human pose lifting [26.108023246654646]
We propose an algorithm to generate infinite 3D synthetic human poses (Legatus) from a 3D pose distribution based on 10 initial handcrafted 3D poses (Decanus).
Our results show that we can achieve 3D pose estimation performance comparable to methods using real data from specialized datasets but in a zero-shot setup, showing the potential of our framework.
arXiv Detail & Related papers (2022-10-05T13:10:19Z)
- Towards Multimodal Multitask Scene Understanding Models for Indoor Mobile Agents [49.904531485843464]
In this paper, we discuss the main challenge: insufficient, or even no, labeled data for real-world indoor environments.
We describe MMISM (Multi-modality input Multi-task output Indoor Scene understanding Model) to tackle the above challenges.
MMISM considers RGB images as well as sparse Lidar points as inputs and 3D object detection, depth completion, human pose estimation, and semantic segmentation as output tasks.
We show that MMISM performs on par with or even better than single-task models (a toy shared-backbone, multi-head sketch follows this list).
arXiv Detail & Related papers (2022-09-27T04:49:19Z)
- T3VIP: Transformation-based 3D Video Prediction [49.178585201673364]
We propose a 3D video prediction (T3VIP) approach that explicitly models the 3D motion by decomposing a scene into its object parts.
Our model is fully unsupervised and captures the nature of the real world; observational cues in the image and point-cloud domains constitute its learning signals.
To the best of our knowledge, our model is the first generative model that provides an RGB-D video prediction of the future for a static camera.
arXiv Detail & Related papers (2022-09-19T15:01:09Z)
- CROMOSim: A Deep Learning-based Cross-modality Inertial Measurement Simulator [7.50015216403068]
Inertial measurement unit (IMU) data has been utilized in monitoring and assessment of human mobility.
To mitigate the data scarcity problem, we design CROMOSim, a cross-modality sensor simulator.
It simulates high-fidelity virtual IMU sensor data from motion capture systems or monocular RGB cameras (a minimal finite-difference version of this idea is sketched after this list).
arXiv Detail & Related papers (2022-02-21T22:30:43Z)
- Adapted Human Pose: Monocular 3D Human Pose Estimation with Zero Real 3D Pose Data [14.719976311208502]
Training vs. test data domain gaps often negatively affect model performance.
We present our adapted human pose (AHuP) approach that addresses adaptation problems in both appearance and pose spaces.
AHuP is built around the practical assumption that in real applications, data from the target domain may be inaccessible or only available in limited form.
arXiv Detail & Related papers (2021-05-23T01:20:40Z)
- Yet it moves: Learning from Generic Motions to Generate IMU data from YouTube videos [5.008235182488304]
We show how we can train a regression model on generic motions for both accelerometer and gyro signals to generate synthetic IMU data.
We demonstrate that systems trained on simulated data generated by our regression model can come to within around 10% of the mean F1 score of a system trained on real sensor data.
arXiv Detail & Related papers (2020-11-23T18:16:46Z)
- Physics-Based Dexterous Manipulations with Estimated Hand Poses and Residual Reinforcement Learning [52.37106940303246]
We learn a model that maps noisy input hand poses to target virtual poses.
The agent is trained in a residual setting by using a model-free hybrid RL+IL approach (the residual-control idea is sketched after this list).
We test our framework in two applications that use hand pose estimates for dexterous manipulations: hand-object interactions in VR and hand-object motion reconstruction in-the-wild.
arXiv Detail & Related papers (2020-08-07T17:34:28Z)
- Combining Implicit Function Learning and Parametric Models for 3D Human Reconstruction [123.62341095156611]
Implicit functions represented as deep learning approximations are powerful for reconstructing 3D surfaces.
Such features are essential in building flexible models for both computer graphics and computer vision.
We present a methodology that combines detail-rich implicit functions and parametric representations.
arXiv Detail & Related papers (2020-07-22T13:46:14Z)
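The shared-backbone, multi-head pattern referenced in the MMISM entry above can be sketched as follows. This is an illustrative toy, not MMISM's architecture: it assumes the sparse Lidar points arrive projected into a one-channel depth map, and all channel counts, head shapes, and names are my assumptions.

```python
# Illustrative toy (not MMISM's design): one shared encoder over fused
# RGB + sparse-depth input, with a lightweight head per output task.
import torch
import torch.nn as nn

class MultiTaskNet(nn.Module):
    def __init__(self):
        super().__init__()
        # 3 RGB channels + 1 sparse depth channel from projected Lidar points.
        self.backbone = nn.Sequential(
            nn.Conv2d(4, 32, 3, padding=1), nn.ReLU(),
            nn.Conv2d(32, 64, 3, padding=1), nn.ReLU(),
        )
        self.heads = nn.ModuleDict({
            "detection": nn.Conv2d(64, 6, 1),      # toy box/objectness maps
            "depth": nn.Conv2d(64, 1, 1),          # depth completion
            "pose": nn.Conv2d(64, 17, 1),          # joint heatmaps
            "segmentation": nn.Conv2d(64, 21, 1),  # class logits
        })

    def forward(self, rgb, sparse_depth):
        feats = self.backbone(torch.cat([rgb, sparse_depth], dim=1))
        return {name: head(feats) for name, head in self.heads.items()}

net = MultiTaskNet()
out = net(torch.rand(1, 3, 64, 64), torch.rand(1, 1, 64, 64))
```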
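The cross-modality IMU simulation referenced in the CROMOSim entry reduces, at its simplest, to finite differences: differentiate a tracked position trace twice for the accelerometer, once for the gyroscope, and account for gravity. CROMOSim itself is learned and far higher-fidelity; the sampling rate, the yaw-only orientation model, and all names below are my assumptions.

```python
# Hedged sketch (not CROMOSim itself): virtual IMU readings from a
# motion-capture position trace via finite differences. A real pipeline
# would rotate world-frame quantities into the sensor frame using full
# tracked orientation; here orientation is reduced to a yaw angle.
import numpy as np

def simulate_imu(positions, yaws, fs=100.0, g=9.81):
    """positions: (T, 3) world-frame metres; yaws: (T,) radians; fs: Hz."""
    dt = 1.0 / fs
    vel = np.gradient(positions, dt, axis=0)   # m/s
    acc_world = np.gradient(vel, dt, axis=0)   # m/s^2
    acc_world[:, 2] += g                       # accelerometer senses gravity
    # Rotate world-frame acceleration into a body frame rotating about z.
    cos, sin = np.cos(yaws), np.sin(yaws)
    acc_body = np.stack([
        cos * acc_world[:, 0] + sin * acc_world[:, 1],
        -sin * acc_world[:, 0] + cos * acc_world[:, 1],
        acc_world[:, 2],
    ], axis=1)
    gyro_z = np.gradient(yaws, dt)             # rad/s about body z
    return acc_body, gyro_z

# Toy usage: a circular path of 1 m radius at 0.5 Hz, sampled at 100 Hz.
t = np.arange(0, 4, 0.01)
pos = np.stack([np.cos(np.pi * t), np.sin(np.pi * t), np.zeros_like(t)], axis=1)
acc, gyro = simulate_imu(pos, yaws=np.pi * t + np.pi / 2)
```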
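Finally, the residual setting referenced in the physics-based manipulation entry: a learned network adds a small correction on top of a hand-crafted base controller driven by the noisy pose estimate. A minimal sketch, assuming a 21-joint hand observation and 20 actuators; the scale factor and all names are illustrative, not the paper's system.

```python
# Hedged sketch of residual control: learned correction + base controller.
import torch
import torch.nn as nn

OBS_DIM = 21 * 3  # assumed: 21 hand joints, 3-D each
ACT_DIM = 20      # assumed actuator count

class ResidualPolicy(nn.Module):
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(OBS_DIM, 128), nn.ReLU(),
            nn.Linear(128, ACT_DIM), nn.Tanh(),
        )
        self.scale = 0.1  # keep corrections small relative to the base action

    def forward(self, obs, base_action):
        return base_action + self.scale * self.net(obs)

def base_controller(obs):
    # Stand-in for a hand-crafted controller (e.g. PD tracking of the
    # estimated pose); returns a zero command here for brevity.
    return torch.zeros(obs.shape[0], ACT_DIM)

policy = ResidualPolicy()
obs = torch.randn(1, OBS_DIM)               # noisy estimated hand pose
action = policy(obs, base_controller(obs))  # residual-corrected command
```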
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the listed content (including all information) and is not responsible for any consequences of its use.