Exploiting Multimodal Synthetic Data for Egocentric Human-Object
Interaction Detection in an Industrial Scenario
- URL: http://arxiv.org/abs/2306.12152v2
- Date: Mon, 11 Mar 2024 10:37:00 GMT
- Title: Exploiting Multimodal Synthetic Data for Egocentric Human-Object
Interaction Detection in an Industrial Scenario
- Authors: Rosario Leonardi, Francesco Ragusa, Antonino Furnari, Giovanni Maria
Farinella
- Abstract summary: EgoISM-HOI is a new multimodal dataset composed of synthetic EHOI images in an industrial environment with rich annotations of hands and objects.
Our study shows that exploiting synthetic data to pre-train the proposed method significantly improves performance when tested on real-world data.
To support research in this field, we publicly release the datasets, source code, and pre-trained models at https://iplab.dmi.unict.it/egoism-hoi.
- Score: 14.188006024550257
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: In this paper, we tackle the problem of Egocentric Human-Object Interaction
(EHOI) detection in an industrial setting. To overcome the lack of public
datasets in this context, we propose a pipeline and a tool for generating
synthetic images of EHOIs paired with several annotations and data signals
(e.g., depth maps or segmentation masks). Using the proposed pipeline, we
present EgoISM-HOI, a new multimodal dataset composed of synthetic EHOI images
in an industrial environment with rich annotations of hands and objects. To
demonstrate the utility and effectiveness of synthetic EHOI data produced by
the proposed tool, we designed a new method that predicts and combines
different multimodal signals to detect EHOIs in RGB images. Our study shows
that exploiting synthetic data to pre-train the proposed method significantly
improves performance when tested on real-world data. Moreover, to fully
understand the usefulness of our method, we conducted an in-depth analysis in
which we compared and highlighted the superiority of the proposed approach over
different state-of-the-art class-agnostic methods. To support research in this
field, we publicly release the datasets, source code, and pre-trained models at
https://iplab.dmi.unict.it/egoism-hoi.
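The pre-train-on-synthetic, fine-tune-on-real strategy described in the abstract can be sketched as a two-stage training schedule. This is a minimal, purely illustrative sketch: the `train_stage` helper, the placeholder datasets, and the recorded state are all hypothetical stand-ins for the detector, losses, and EgoISM-HOI data in the authors' released code.

```python
# Two-stage schedule: pre-train on synthetic EHOI data, then fine-tune
# on scarcer real-world data (typically at a lower learning rate).

def train_stage(model_state, dataset, epochs, lr):
    """Hypothetical stage runner: records which data the model has seen."""
    for _ in range(epochs):
        for batch in dataset:
            model_state["seen"].append(batch["domain"])
    model_state["stages"].append(
        {"data": dataset[0]["domain"], "epochs": epochs, "lr": lr}
    )
    return model_state

synthetic = [{"domain": "synthetic"}] * 4   # stands in for EgoISM-HOI images
real      = [{"domain": "real"}] * 2        # stands in for real labeled images

state = {"seen": [], "stages": []}
state = train_stage(state, synthetic, epochs=2, lr=1e-3)  # pre-training
state = train_stage(state, real,      epochs=1, lr=1e-4)  # fine-tuning
```

The point of the schedule is that the cheap, richly annotated synthetic data does the bulk of the training, while a small real set closes the domain gap.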
Related papers
- SynthSet: Generative Diffusion Model for Semantic Segmentation in Precision Agriculture [0.09999629695552192]
We propose a dual diffusion model architecture for synthesizing realistic annotated agricultural data, without any human intervention.
We employ super-resolution to enhance the phenotypic characteristics of the synthesized images and their coherence with the corresponding generated masks.
The results show the efficacy of the proposed methodology for addressing data scarcity for semantic segmentation tasks.
arXiv Detail & Related papers (2024-11-05T20:42:23Z)
- MDM: Advancing Multi-Domain Distribution Matching for Automatic Modulation Recognition Dataset Synthesis [35.07663680944459]
Deep learning technology has been successfully introduced into Automatic Modulation Recognition (AMR) tasks.
The success of deep learning is largely attributed to training on large-scale datasets.
To reduce this data requirement, some researchers have proposed dataset distillation methods.
arXiv Detail & Related papers (2024-08-05T14:16:54Z)
- Deep Domain Adaptation: A Sim2Real Neural Approach for Improving Eye-Tracking Systems [80.62854148838359]
Eye image segmentation is a critical step in eye tracking that has great influence over the final gaze estimate.
We use dimensionality-reduction techniques to measure the overlap between the target eye images and synthetic training data.
Our methods result in robust, improved performance when tackling the discrepancy between simulation and real-world data samples.
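The dimensionality-reduction overlap check mentioned above can be sketched with a generic PCA-based proxy. This is an assumption-laden illustration, not the paper's actual metric: the feature arrays, the centroid-gap score, and the projection dimensionality are all hypothetical choices.

```python
import numpy as np

def pca_project(data, k=2):
    """Project rows of `data` onto the top-k principal components."""
    centered = data - data.mean(axis=0)
    # SVD of the centered matrix gives principal directions in `vt`
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    return centered @ vt[:k].T

rng = np.random.default_rng(0)
real_feats = rng.normal(0.0, 1.0, size=(200, 16))   # stand-in for real eye-image features
synth_feats = rng.normal(0.5, 1.0, size=(200, 16))  # stand-in for synthetic features

# Fit PCA on the joint set so both domains share one low-dimensional space
joint = np.vstack([real_feats, synth_feats])
proj = pca_project(joint, k=2)
real_p, synth_p = proj[:200], proj[200:]

# Crude overlap proxy: centroid distance relative to within-set spread;
# values closer to 1 indicate more overlap between the two domains.
centroid_gap = np.linalg.norm(real_p.mean(axis=0) - synth_p.mean(axis=0))
spread = 0.5 * (real_p.std() + synth_p.std())
overlap_score = 1.0 / (1.0 + centroid_gap / spread)
```

A low overlap score would signal a large sim-to-real gap before any adaptation is applied.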
arXiv Detail & Related papers (2024-03-23T22:32:06Z)
- Are Synthetic Data Useful for Egocentric Hand-Object Interaction Detection? [12.987587227876565]
We investigate the effectiveness of synthetic data in enhancing egocentric hand-object interaction detection.
By leveraging only 10% of real labeled data, we achieve improvements in Overall AP compared to baselines trained exclusively on real data.
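The low-real-data regime described above can be sketched as a simple data-mixing step: keep the full synthetic set and sample only a small fraction of the real labeled data. The `build_training_set` helper and the 10% ratio are illustrative assumptions, not the paper's exact protocol.

```python
import random

def build_training_set(real, synthetic, real_fraction=0.10, seed=0):
    """Combine a small labeled-real subset with the full synthetic set."""
    rng = random.Random(seed)
    n_real = max(1, int(len(real) * real_fraction))
    real_subset = rng.sample(real, n_real)
    return real_subset + list(synthetic)

real = [f"real_{i}" for i in range(100)]     # stand-in for real labeled images
synth = [f"synth_{i}" for i in range(300)]   # stand-in for synthetic images
train = build_training_set(real, synth)      # 10 real samples + 300 synthetic
```

Sweeping `real_fraction` is the usual way to measure how much real annotation effort the synthetic data actually saves.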
arXiv Detail & Related papers (2023-12-05T11:29:00Z)
- StableLLaVA: Enhanced Visual Instruction Tuning with Synthesized Image-Dialogue Data [129.92449761766025]
We propose a novel data collection methodology that synchronously synthesizes images and dialogues for visual instruction tuning.
This approach harnesses the power of generative models, marrying the abilities of ChatGPT and text-to-image generative models.
Our research includes comprehensive experiments conducted on various datasets.
arXiv Detail & Related papers (2023-08-20T12:43:52Z)
- VALERIE22 -- A photorealistic, richly metadata annotated dataset of urban environments [5.439020425819001]
The VALERIE tool pipeline is a synthetic data generator developed to contribute to the understanding of domain-specific factors.
The VALERIE22 dataset was generated with the VALERIE procedural tools pipeline providing a photorealistic sensor simulation.
The dataset provides a uniquely rich set of metadata, allowing extraction of specific scene and semantic features.
arXiv Detail & Related papers (2023-08-18T15:44:45Z)
- Semantic Segmentation of Vegetation in Remote Sensing Imagery Using Deep Learning [77.34726150561087]
We propose an approach for creating a multi-modal and large-temporal dataset comprised of publicly available Remote Sensing data.
We use Convolutional Neural Networks (CNN) models that are capable of separating different classes of vegetation.
arXiv Detail & Related papers (2022-09-28T18:51:59Z)
- TRoVE: Transforming Road Scene Datasets into Photorealistic Virtual Environments [84.6017003787244]
This work proposes a synthetic data generation pipeline to address the difficulties and domain-gaps present in simulated datasets.
We show that using annotations and visual cues from existing datasets, we can facilitate automated multi-modal data generation.
arXiv Detail & Related papers (2022-08-16T20:46:08Z)
- Egocentric Human-Object Interaction Detection Exploiting Synthetic Data [19.220651860718892]
We consider the problem of detecting Egocentric Human-Object Interactions (EHOIs) in industrial contexts.
We propose a pipeline and a tool to generate photo-realistic synthetic First Person Vision (FPV) images automatically labeled for EHOI detection.
arXiv Detail & Related papers (2022-04-14T15:59:15Z)
- Unsupervised Domain Adaptive Learning via Synthetic Data for Person Re-identification [101.1886788396803]
Person re-identification (re-ID) has gained increasing attention due to its widespread applications in video surveillance.
Unfortunately, the mainstream deep learning methods still need a large quantity of labeled data to train models.
In this paper, we develop a data collector to automatically generate synthetic re-ID samples in a computer game, and construct a data labeler to simultaneously annotate them.
arXiv Detail & Related papers (2021-09-12T15:51:41Z)
- Learning to Segment Human Body Parts with Synthetically Trained Deep Convolutional Networks [58.0240970093372]
This paper presents a new framework for human body part segmentation based on Deep Convolutional Neural Networks trained using only synthetic data.
The proposed approach achieves cutting-edge results without the need to train the models on real annotated data of human body parts.
arXiv Detail & Related papers (2021-02-02T12:26:50Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences arising from its use.