SynthRender and IRIS: Open-Source Framework and Dataset for Bidirectional Sim-Real Transfer in Industrial Object Perception
- URL: http://arxiv.org/abs/2602.21141v1
- Date: Tue, 24 Feb 2026 17:42:34 GMT
- Title: SynthRender and IRIS: Open-Source Framework and Dataset for Bidirectional Sim-Real Transfer in Industrial Object Perception
- Authors: Jose Moises Araya-Martinez, Thushar Tom, Adrián Sanchis Reig, Pablo Rey Valiente, Jens Lambrecht, Jörg Krüger
- Abstract summary: We release SynthRender, an open-source framework for synthetic image generation with Guided Domain Randomization capabilities. We also benchmark recent Reality-to-Simulation techniques for 3D asset creation from 2D images of real parts. These synthetic assets provide low-overhead, transferable data even for parts lacking 3D files.
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Object perception is fundamental for tasks such as robotic material handling and quality inspection. However, modern supervised deep-learning perception models require large datasets for robust automation under semi-uncontrolled conditions. The cost of acquiring and annotating such data for proprietary parts is a major barrier to widespread deployment. In this context, we release SynthRender, an open-source framework for synthetic image generation with Guided Domain Randomization capabilities. Furthermore, we benchmark recent Reality-to-Simulation techniques for 3D asset creation from 2D images of real parts. Combined with Domain Randomization, these synthetic assets provide low-overhead, transferable data even for parts lacking 3D files. We also introduce IRIS, the Industrial Real-Sim Imagery Set, containing 32 categories with diverse textures, intra-class variation, strong inter-class similarities, and about 20,000 labels. Ablations on multiple benchmarks outline guidelines for efficient data generation with SynthRender. Our method surpasses existing approaches, achieving 99.1% mAP@50 on a public robotics dataset, 98.3% mAP@50 on an automotive benchmark, and 95.3% mAP@50 on IRIS.
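The abstract's core idea, guided domain randomization, can be sketched as follows: nuisance factors (lighting, background, camera pose) are sampled uniformly so the detector learns to ignore them, while task-relevant factors are biased toward priors estimated from real images. All parameter names and ranges below are illustrative assumptions, not SynthRender's actual API:

```python
import random

def sample_randomized_scene(rng, guided_pose_prior=None):
    """Sample one synthetic scene configuration.

    Hypothetical parameters for illustration only; they do not
    correspond to SynthRender's real configuration schema.
    """
    # Plain domain randomization: uniform sampling of nuisance factors.
    scene = {
        "light_intensity": rng.uniform(200.0, 1500.0),    # lumens
        "light_color_temp": rng.uniform(3000.0, 7500.0),  # kelvin
        "camera_distance_m": rng.uniform(0.3, 1.5),
        "background_id": rng.randrange(0, 50),
        "texture_jitter": rng.uniform(0.0, 0.2),
    }
    # "Guided" randomization: bias a task-relevant factor (here, object
    # yaw) toward a prior estimated from real images, instead of
    # sampling it uniformly over its full range.
    if guided_pose_prior is not None:
        mean_deg, std_deg = guided_pose_prior
        scene["object_yaw_deg"] = rng.gauss(mean_deg, std_deg) % 360.0
    else:
        scene["object_yaw_deg"] = rng.uniform(0.0, 360.0)
    return scene

# Generate a batch of scene configurations; each one would then be
# handed to a renderer to produce an annotated synthetic image.
rng = random.Random(0)
configs = [sample_randomized_scene(rng, guided_pose_prior=(45.0, 10.0))
           for _ in range(1000)]
```

The design point is that the guidance narrows only the factors the real-data prior covers; everything else stays fully randomized to preserve sim-to-real robustness.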
Related papers
- URDF-Anything: Constructing Articulated Objects with 3D Multimodal Language Model [76.08429266631823]
We propose an end-to-end automatic reconstruction framework based on a 3D multimodal large language model (MLLM). URDF-Anything utilizes an autoregressive prediction framework based on point-cloud and text multimodal input to jointly optimize geometric segmentation and kinematic parameter prediction. Experiments on both simulated and real-world datasets demonstrate that our method significantly outperforms existing approaches.
arXiv Detail & Related papers (2025-11-02T13:45:51Z) - BlendCLIP: Bridging Synthetic and Real Domains for Zero-Shot 3D Object Classification with Multimodal Pretraining [2.400704807305413]
Zero-shot 3D object classification is crucial for real-world applications like autonomous driving. It is often hindered by a significant domain gap between the synthetic data used for training and the sparse, noisy LiDAR scans encountered in the real world. We introduce BlendCLIP, a multimodal pretraining framework that bridges this synthetic-to-real gap by strategically combining the strengths of both domains.
arXiv Detail & Related papers (2025-10-21T03:08:27Z) - Scaling Transformer-Based Novel View Synthesis Models with Token Disentanglement and Synthetic Data [53.040873127309766]
We propose a token disentanglement process within the transformer architecture, enhancing feature separation and ensuring more effective learning. Our method outperforms existing models on both in-dataset and cross-dataset evaluations.
arXiv Detail & Related papers (2025-09-08T17:58:06Z) - RoboTwin 2.0: A Scalable Data Generator and Benchmark with Strong Domain Randomization for Robust Bimanual Robotic Manipulation [52.2244588424002]
We present RoboTwin 2.0, a scalable framework for automated, large-scale generation of diverse and realistic data. At its core is RoboTwin-OD, an object library of 731 instances across 147 categories with semantic and manipulation-relevant annotations. To improve sim-to-real transfer, RoboTwin 2.0 applies structured domain randomization along five axes.
arXiv Detail & Related papers (2025-06-22T16:26:53Z) - Step1X-3D: Towards High-Fidelity and Controllable Generation of Textured 3D Assets [90.99212668875971]
Step1X-3D is an open framework addressing challenges such as data scarcity, algorithmic limitations, and ecosystem fragmentation. We present a two-stage 3D-native architecture combining a hybrid VAE-DiT geometry generator with a diffusion-based texture synthesis module. Benchmark results demonstrate state-of-the-art performance that exceeds existing open-source methods.
arXiv Detail & Related papers (2025-05-12T16:56:30Z) - MetaScenes: Towards Automated Replica Creation for Real-world 3D Scans [76.39726619818896]
Embodied AI (EAI) research requires high-quality, diverse 3D scenes to support skill acquisition, sim-to-real transfer, and generalization. Existing datasets demonstrate that this process heavily relies on artist-driven designs. We present MetaScenes, a large-scale, simulatable 3D scene dataset constructed from real-world scans.
arXiv Detail & Related papers (2025-05-05T06:13:25Z) - Synth It Like KITTI: Synthetic Data Generation for Object Detection in Driving Scenarios [3.30184292168618]
We propose a dataset generation pipeline based on the CARLA simulator for 3D object detection on LiDAR point clouds. We are able to train an object detector on the synthetic data and demonstrate strong generalization capabilities to the KITTI dataset.
arXiv Detail & Related papers (2025-02-20T22:27:42Z) - MegaSynth: Scaling Up 3D Scene Reconstruction with Synthesized Data [59.88075377088134]
We propose scaling up 3D scene reconstruction by training with synthesized data. At the core of our work is MegaSynth, a procedurally generated 3D dataset comprising 700K scenes. Experiment results show that joint training or pre-training with MegaSynth improves reconstruction quality by 1.2 to 1.8 dB PSNR across diverse image domains.
arXiv Detail & Related papers (2024-12-18T18:59:38Z) - Augmented Reality based Simulated Data (ARSim) with multi-view consistency for AV perception networks [47.07188762367792]
We present ARSim, a framework designed to enhance real multi-view image data with 3D synthetic objects of interest.
We construct a simplified virtual scene using real data and strategically place 3D synthetic assets within it.
The resulting augmented multi-view consistent dataset is used to train a multi-camera perception network for autonomous vehicles.
arXiv Detail & Related papers (2024-03-22T17:49:11Z) - Investigation of the Impact of Synthetic Training Data in the Industrial Application of Terminal Strip Object Detection [4.327763441385371]
In this paper, we investigate the sim-to-real generalization performance of standard object detectors on the complex industrial application of terminal strip object detection.
We manually annotated 300 real images of terminal strips for the evaluation. The results show that it is crucial for the objects of interest to have the same scale in both domains.
arXiv Detail & Related papers (2024-03-06T18:33:27Z) - SynTable: A Synthetic Data Generation Pipeline for Unseen Object Amodal Instance Segmentation of Cluttered Tabletop Scenes [2.8661021832561757]
We present SynTable, a Python-based dataset generator built using NVIDIA's Isaac Sim Replicator Composer. Our dataset generation tool can render complex 3D scenes containing object meshes, materials, textures, lighting, and backgrounds. We demonstrate the use of a sample dataset generated using SynTable for training a state-of-the-art model, UOAIS-Net.
arXiv Detail & Related papers (2023-07-14T13:24:42Z) - TRoVE: Transforming Road Scene Datasets into Photorealistic Virtual Environments [84.6017003787244]
This work proposes a synthetic data generation pipeline to address the difficulties and domain gaps present in simulated datasets.
We show that using annotations and visual cues from existing datasets, we can facilitate automated multi-modal data generation.
arXiv Detail & Related papers (2022-08-16T20:46:08Z)