A Synthetic Dataset for Manometry Recognition in Robotic Applications
- URL: http://arxiv.org/abs/2508.17468v2
- Date: Sat, 11 Oct 2025 16:18:23 GMT
- Title: A Synthetic Dataset for Manometry Recognition in Robotic Applications
- Authors: Pedro Antonio Rabelo Saraiva, Enzo Ferreira de Souza, Joao Manoel Herrera Pinheiro, Thiago H. Segreto, Ricardo V. Godoy, Marcelo Becker
- Abstract summary: We propose a hybrid data synthesis pipeline that integrates procedural rendering and AI-driven video generation. A YOLO-based detector trained on a composite dataset, combining real and synthetic data, outperformed models trained solely on real images.
- Score: 0.686108371431346
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: This paper addresses the challenges of data scarcity and high acquisition costs in training robust object detection models for complex industrial environments, such as offshore oil platforms. Data collection in these hazardous settings often limits the development of autonomous inspection systems. To mitigate this issue, we propose a hybrid data synthesis pipeline that integrates procedural rendering and AI-driven video generation. The approach uses BlenderProc to produce photorealistic images with domain randomization and NVIDIA's Cosmos-Predict2 to generate physically consistent video sequences with temporal variation. A YOLO-based detector trained on a composite dataset, combining real and synthetic data, outperformed models trained solely on real images. A 1:1 ratio between real and synthetic samples achieved the highest accuracy. The results demonstrate that synthetic data generation is a viable, cost-effective, and safe strategy for developing reliable perception systems in safety-critical and resource-constrained industrial applications.
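The abstract reports that a 1:1 ratio between real and synthetic samples gave the highest accuracy, but it does not say how the composite set was assembled. The following is only a minimal sketch of one way such a mix could be built, assuming the training images are available as two lists of file paths. The function name `build_composite`, the `synth_per_real` parameter, and the example paths are all hypothetical and do not come from the paper.

```python
import random
from typing import List, Sequence, Tuple


def build_composite(
    real: Sequence[str],
    synthetic: Sequence[str],
    synth_per_real: float = 1.0,  # 1.0 corresponds to a 1:1 real:synthetic mix
    seed: int = 0,
) -> List[Tuple[str, str]]:
    """Return a shuffled list of (path, source) pairs.

    All real samples are kept; synthetic samples are drawn without
    replacement until the requested ratio is reached (or the synthetic
    pool is exhausted). This sampling scheme is an assumption.
    """
    rng = random.Random(seed)
    n_synth = min(len(synthetic), round(len(real) * synth_per_real))
    chosen = rng.sample(list(synthetic), n_synth)
    combined = [(p, "real") for p in real] + [(p, "synthetic") for p in chosen]
    rng.shuffle(combined)
    return combined


# Hypothetical file names, used only to exercise the function.
real_paths = [f"real/img_{i:03d}.jpg" for i in range(4)]
synth_paths = [f"synthetic/img_{i:03d}.jpg" for i in range(10)]

train_set = build_composite(real_paths, synth_paths)
n_real = sum(1 for _, source in train_set if source == "real")
n_synth = sum(1 for _, source in train_set if source == "synthetic")
print(f"real={n_real} synthetic={n_synth} total={len(train_set)}")
```

Fixing the seed keeps the mix reproducible across runs, which makes it easier to compare detectors trained at different ratios.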
Related papers
- Physics Informed Generative AI Enabling Labour Free Segmentation For Microscopy Analysis [3.3176565054468714]
This paper introduces a novel framework for labour-free segmentation that successfully bridges the simulation-to-reality gap. We employ a Cycle-Consistent Generative Adversarial Network (CycleGAN) for unpaired image-to-image translation. A U-Net model, trained exclusively on this synthetic data, demonstrated remarkable generalisation when deployed on unseen experimental images.
arXiv Detail & Related papers (2026-02-02T06:36:06Z)
- Scaling Transformer-Based Novel View Synthesis Models with Token Disentanglement and Synthetic Data [53.040873127309766]
We propose a token disentanglement process within the transformer architecture, enhancing feature separation and ensuring more effective learning. Our method outperforms existing models on both in-dataset and cross-dataset evaluations.
arXiv Detail & Related papers (2025-09-08T17:58:06Z)
- SynSpill: Improved Industrial Spill Detection With Synthetic Data [3.297182592932918]
Large-scale Vision-Language Models (VLMs) have transformed general-purpose visual recognition through strong zero-shot capabilities. Their performance degrades significantly in niche, safety-critical domains such as industrial spill detection. We introduce a scalable framework centered on a high-quality synthetic data generation pipeline.
arXiv Detail & Related papers (2025-08-13T20:09:58Z)
- A workflow for generating synthetic LiDAR datasets in simulation environments [0.0]
This paper presents a simulation workflow for generating synthetic LiDAR datasets to support autonomous vehicle perception, robotics research, and sensor security analysis. We integrate time-of-flight LiDAR, image sensors, and two-dimensional scanners onto a simulated vehicle platform operating within an urban scenario. The study examines potential security vulnerabilities in LiDAR data, such as adversarial point injection and spoofing attacks, and demonstrates how synthetic datasets can facilitate the evaluation of defense strategies.
arXiv Detail & Related papers (2025-06-20T17:56:15Z)
- Synthetic Dataset Generation for Autonomous Mobile Robots Using 3D Gaussian Splatting for Vision Training [0.708987965338602]
We propose a novel method for automatically generating annotated synthetic data in Unreal Engine. We demonstrate that synthetic datasets can achieve performance comparable to that of real-world datasets. This is the first application of synthetic data for training object detection algorithms in robot soccer.
arXiv Detail & Related papers (2025-06-05T14:37:40Z)
- Bounding Box-Guided Diffusion for Synthesizing Industrial Images and Segmentation Map [50.21082069320818]
We propose a novel diffusion-based pipeline for generating high-fidelity industrial datasets with minimal supervision. Our approach conditions the diffusion model on enriched bounding box representations to produce precise segmentation masks. Results demonstrate that diffusion-based synthesis can bridge the gap between artificial and real-world industrial data.
arXiv Detail & Related papers (2025-05-06T15:21:36Z)
- Evaluating the Impact of Synthetic Data on Object Detection Tasks in Autonomous Driving [0.0]
We compare 2D and 3D object detection tasks trained on real, synthetic, and mixed datasets. Our findings demonstrate that the use of a combination of real and synthetic data improves the robustness and generalization of object detection models.
arXiv Detail & Related papers (2025-03-12T20:13:33Z)
- Self-Supervised Data Generation for Precision Agriculture: Blending Simulated Environments with Real Imagery [3.9845810840390734]
In precision agriculture, the scarcity of labeled data poses unique challenges for training machine learning models. We propose a novel system for generating realistic synthetic data to address these challenges. We demonstrate considerable performance improvements in training a state-of-the-art detector by applying our method to table grapes cultivation.
arXiv Detail & Related papers (2025-02-25T16:13:49Z)
- Synthetica: Large Scale Synthetic Data for Robot Perception [21.415878105900187]
We present Synthetica, a method for large-scale synthetic data generation for training robust state estimators.
This paper focuses on the task of object detection, an important problem which can serve as the front-end for most state estimation problems.
We leverage data from a ray-tracing renderer, generating 2.7 million images, to train highly accurate real-time detection transformers.
We demonstrate state-of-the-art performance on the task of object detection while having detectors that run at 50-100 Hz, which is 9 times faster than the prior SOTA.
arXiv Detail & Related papers (2024-10-28T15:50:56Z)
- Towards Realistic Data Generation for Real-World Super-Resolution [58.99206459754721]
RealDGen is an unsupervised learning data generation framework designed for real-world super-resolution. We develop content and degradation extraction strategies, which are integrated into a novel content-degradation decoupled diffusion model. Experiments demonstrate that RealDGen excels in generating large-scale, high-quality paired data that mirrors real-world degradations.
arXiv Detail & Related papers (2024-06-11T13:34:57Z)
- Best Practices and Lessons Learned on Synthetic Data [83.63271573197026]
The success of AI models relies on the availability of large, diverse, and high-quality datasets.
Synthetic data has emerged as a promising solution by generating artificial data that mimics real-world patterns.
arXiv Detail & Related papers (2024-04-11T06:34:17Z)
- Instance-Level Safety-Aware Fidelity of Synthetic Data and Its Calibration [5.089356301032639]
We focus on its role in safety-critical applications, introducing four types of instance-level fidelity.
The aim is to ensure that applying testing on synthetic data can reveal real-world safety issues.
arXiv Detail & Related papers (2024-02-10T19:45:40Z)
- Reimagining Synthetic Tabular Data Generation through Data-Centric AI: A Comprehensive Benchmark [56.8042116967334]
Synthetic data serves as an alternative in training machine learning models.
Ensuring that synthetic data mirrors the complex nuances of real-world data is a challenging task.
This paper explores the potential of integrating data-centric AI techniques to guide the synthetic data generation process.
arXiv Detail & Related papers (2023-10-25T20:32:02Z)
- UAV-Sim: NeRF-based Synthetic Data Generation for UAV-based Perception [62.71374902455154]
We leverage recent advancements in neural rendering to improve static and dynamic novel-view UAV-based image rendering.
We demonstrate a considerable performance boost when a state-of-the-art detection model is optimized primarily on hybrid sets of real and synthetic data.
arXiv Detail & Related papers (2023-10-25T00:20:37Z)
- Is synthetic data from generative models ready for image recognition? [69.42645602062024]
We study whether and how synthetic images generated from state-of-the-art text-to-image generation models can be used for image recognition tasks.
We showcase the powerfulness and shortcomings of synthetic data from existing generative models, and propose strategies for better applying synthetic data for recognition tasks.
arXiv Detail & Related papers (2022-10-14T06:54:24Z)
- TRoVE: Transforming Road Scene Datasets into Photorealistic Virtual Environments [84.6017003787244]
This work proposes a synthetic data generation pipeline to address the difficulties and domain-gaps present in simulated datasets.
We show that using annotations and visual cues from existing datasets, we can facilitate automated multi-modal data generation.
arXiv Detail & Related papers (2022-08-16T20:46:08Z)
This list is automatically generated from the titles and abstracts of the papers in this site.