The Collection of a Human Robot Collaboration Dataset for Cooperative Assembly in Glovebox Environments
- URL: http://arxiv.org/abs/2407.14649v1
- Date: Fri, 19 Jul 2024 19:56:53 GMT
- Title: The Collection of a Human Robot Collaboration Dataset for Cooperative Assembly in Glovebox Environments
- Authors: Shivansh Sharma, Mathew Huang, Sanat Nair, Alan Wen, Christina Petlowany, Juston Moore, Selma Wanna, Mitch Pryor
- Abstract summary: Industry 4.0 introduced AI as a transformative solution for modernizing manufacturing processes. Its successor, Industry 5.0, envisions humans as collaborators and experts guiding these AI-driven solutions.
New techniques require algorithms capable of safe, real-time identification of human positions in a scene, particularly their hands, during collaborative assembly.
This dataset provides 1200 challenging examples to build applications toward hand and glove segmentation in industrial human-robot collaboration scenarios.
- Score: 2.30069810310356
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Industry 4.0 introduced AI as a transformative solution for modernizing manufacturing processes. Its successor, Industry 5.0, envisions humans as collaborators and experts guiding these AI-driven manufacturing solutions. Developing these techniques necessitates algorithms capable of safe, real-time identification of human positions in a scene, particularly their hands, during collaborative assembly. Although substantial efforts have curated datasets for hand segmentation, most focus on residential or commercial domains. Existing datasets targeting industrial settings predominantly rely on synthetic data, which we demonstrate does not effectively transfer to real-world operations. Moreover, these datasets lack uncertainty estimations critical for safe collaboration. Addressing these gaps, we present HAGS: Hand and Glove Segmentation Dataset. This dataset provides 1200 challenging examples to build applications toward hand and glove segmentation in industrial human-robot collaboration scenarios as well as assess out-of-distribution images, constructed via green screen augmentations, to determine ML-classifier robustness. We study state-of-the-art, real-time segmentation models to evaluate existing methods. Our dataset and baselines are publicly available: https://dataverse.tdl.org/dataset.xhtml?persistentId=doi:10.18738/T8/85R7KQ and https://github.com/UTNuclearRoboticsPublic/assembly_glovebox_dataset.
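The out-of-distribution split described in the abstract is constructed from green-screen captures. As a rough, hypothetical sketch of that idea (not the authors' actual augmentation pipeline: the OpenCV-based chroma keying, HSV thresholds, and file names below are assumptions), one could composite a novel background behind a green-screen frame like this:

```python
import cv2
import numpy as np

# Hypothetical chroma-key compositing sketch for building out-of-distribution
# test images from green-screen captures. Thresholds and file names are
# illustrative only; the HAGS construction procedure may differ.

def composite_green_screen(frame_bgr: np.ndarray, background_bgr: np.ndarray,
                           lower_hsv=(35, 80, 80), upper_hsv=(85, 255, 255)) -> np.ndarray:
    """Replace green-screen pixels in frame_bgr with pixels from background_bgr."""
    h, w = frame_bgr.shape[:2]
    background = cv2.resize(background_bgr, (w, h))
    hsv = cv2.cvtColor(frame_bgr, cv2.COLOR_BGR2HSV)
    green_mask = cv2.inRange(hsv, np.array(lower_hsv, dtype=np.uint8),
                             np.array(upper_hsv, dtype=np.uint8))
    green_mask = cv2.medianBlur(green_mask, 5)          # suppress speckle in the mask
    foreground_mask = cv2.bitwise_not(green_mask)
    fg = cv2.bitwise_and(frame_bgr, frame_bgr, mask=foreground_mask)
    bg = cv2.bitwise_and(background, background, mask=green_mask)
    return cv2.add(fg, bg)

if __name__ == "__main__":
    frame = cv2.imread("glovebox_frame.png")            # illustrative file names
    novel_bg = cv2.imread("novel_background.png")
    cv2.imwrite("ood_image.png", composite_green_screen(frame, novel_bg))
```

Segmentation models trained on in-distribution backgrounds can then be evaluated on such composites, which is the role the paper assigns to its green-screen-augmented images when probing classifier robustness.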
Related papers
- Language Supervised Human Action Recognition with Salient Fusion: Construction Worker Action Recognition as a Use Case [8.26451988845854]
We introduce a novel approach to Human Action Recognition (HAR) based on skeleton and visual cues.
We employ learnable prompts for the language model conditioned on the skeleton modality to optimize feature representation.
We introduce a new dataset tailored for real-world robotic applications in construction sites, featuring visual, skeleton, and depth data modalities.
arXiv Detail & Related papers (2024-10-02T19:10:23Z)
- Efficient Data Collection for Robotic Manipulation via Compositional Generalization [70.76782930312746]
We show that policies can compose environmental factors from their data to succeed when encountering unseen factor combinations.
We propose better in-domain data collection strategies that exploit composition.
We provide videos at http://iliad.stanford.edu/robot-data-comp/.
arXiv Detail & Related papers (2024-03-08T07:15:38Z)
- Learning Human Action Recognition Representations Without Real Humans [66.61527869763819]
We present a benchmark that leverages real-world videos with humans removed and synthetic data containing virtual humans to pre-train a model.
We then evaluate the transferability of the representation learned on this data to a diverse set of downstream action recognition benchmarks.
Our approach outperforms previous baselines by up to 5%.
arXiv Detail & Related papers (2023-11-10T18:38:14Z)
- Exploiting Multimodal Synthetic Data for Egocentric Human-Object Interaction Detection in an Industrial Scenario [14.188006024550257]
EgoISM-HOI is a new multimodal dataset composed of synthetic EHOI images in an industrial environment with rich annotations of hands and objects.
Our study shows that exploiting synthetic data to pre-train the proposed method significantly improves performance when tested on real-world data.
To support research in this field, we publicly release the datasets, source code, and pre-trained models at https://iplab.dmi.unict.it/egoism-hoi.
arXiv Detail & Related papers (2023-06-21T09:56:55Z)
- TSGM: A Flexible Framework for Generative Modeling of Synthetic Time Series [61.436361263605114]
Time series data are often scarce or highly sensitive, which precludes the sharing of data between researchers and industrial organizations.
We introduce Time Series Generative Modeling (TSGM), an open-source framework for the generative modeling of synthetic time series.
arXiv Detail & Related papers (2023-05-19T10:11:21Z)
- Synthetic-to-Real Domain Adaptation for Action Recognition: A Dataset and Baseline Performances [76.34037366117234]
We introduce a new dataset called Robot Control Gestures (RoCoG-v2).
The dataset is composed of both real and synthetic videos from seven gesture classes.
We present results using state-of-the-art action recognition and domain adaptation algorithms.
arXiv Detail & Related papers (2023-03-17T23:23:55Z)
- COVERED, CollabOratiVE Robot Environment Dataset for 3D Semantic Segmentation [39.64058995273062]
This work develops a new dataset specifically designed for this use case, named "COVERED".
We provide a benchmark of current state-of-the-art (SOTA) algorithms on the dataset and demonstrate real-time semantic segmentation of a collaborative robot workspace using a multi-LiDAR system.
Our perception pipeline achieves a prediction point accuracy of >96% and a mean intersection over union (mIoU) of >92% at a reported throughput of 8-20 Hz; a minimal mIoU computation is sketched after this list.
arXiv Detail & Related papers (2023-02-24T14:24:58Z)
- Towards Multi-User Activity Recognition through Facilitated Training Data and Deep Learning for Human-Robot Collaboration Applications [2.3274633659223545]
This study proposes an alternative way of gathering multi-user activity data: recording single users separately and merging the recordings in post-processing.
Data collected in this way can be used for pairwise HRC settings and yields performance similar to training data recorded from groups of users under the same settings.
arXiv Detail & Related papers (2023-02-11T19:27:07Z)
- Video-based Pose-Estimation Data as Source for Transfer Learning in Human Activity Recognition [71.91734471596433]
Human Activity Recognition (HAR) using on-body devices identifies specific human actions in unconstrained environments.
Previous works demonstrated that transfer learning is a good strategy for addressing scenarios with scarce data.
This paper proposes using datasets intended for human-pose estimation as a source for transfer learning.
arXiv Detail & Related papers (2022-12-02T18:19:36Z)
- TRoVE: Transforming Road Scene Datasets into Photorealistic Virtual Environments [84.6017003787244]
This work proposes a synthetic data generation pipeline to address the difficulties and domain gaps present in simulated datasets.
We show that using annotations and visual cues from existing datasets, we can facilitate automated multi-modal data generation.
arXiv Detail & Related papers (2022-08-16T20:46:08Z)
- PeopleSansPeople: A Synthetic Data Generator for Human-Centric Computer Vision [3.5694949627557846]
We release a human-centric synthetic data generator PeopleSansPeople.
It contains simulation-ready 3D human assets, a parameterized lighting and camera system, and generates 2D and 3D bounding box, instance and semantic segmentation, and COCO pose labels.
arXiv Detail & Related papers (2021-12-17T02:33:31Z)
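Several of the works above, including the COVERED benchmark and the segmentation baselines of the main paper, report mean intersection over union. Below is a minimal, self-contained sketch of computing per-class IoU and mIoU from integer label maps; the 3-class labeling (background, bare hand, gloved hand) is an illustrative assumption, not any dataset's actual label scheme.

```python
import numpy as np

# Illustrative mIoU computation (not the authors' evaluation code).
def mean_iou(pred: np.ndarray, target: np.ndarray, num_classes: int) -> float:
    """pred and target are integer label maps of identical shape."""
    ious = []
    for c in range(num_classes):
        pred_c = pred == c
        target_c = target == c
        union = np.logical_or(pred_c, target_c).sum()
        if union == 0:            # class absent in both maps; skip it
            continue
        intersection = np.logical_and(pred_c, target_c).sum()
        ious.append(intersection / union)
    return float(np.mean(ious)) if ious else 0.0

# Example with assumed class ids: 0 = background, 1 = bare hand, 2 = gloved hand.
pred = np.random.randint(0, 3, size=(256, 256))
gt = np.random.randint(0, 3, size=(256, 256))
print(f"mIoU: {mean_iou(pred, gt, num_classes=3):.3f}")
```

Classes absent from both prediction and ground truth are skipped so they do not skew the mean; the evaluation scripts shipped with the individual datasets may handle such cases differently.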