HybridGen: VLM-Guided Hybrid Planning for Scalable Data Generation of Imitation Learning
- URL: http://arxiv.org/abs/2503.13171v1
- Date: Mon, 17 Mar 2025 13:49:43 GMT
- Title: HybridGen: VLM-Guided Hybrid Planning for Scalable Data Generation of Imitation Learning
- Authors: Wensheng Wang, Ning Tan
- Abstract summary: HybridGen is an automated framework that integrates a Vision-Language Model (VLM) with hybrid planning. It generates a large volume of training data without requiring specific data formats. In the most challenging task variants, HybridGen achieves significant improvement, reaching a 59.7% average success rate.
- Score: 2.677995462843075
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: The acquisition of large-scale and diverse demonstration data is essential for improving the generalization of robotic imitation learning. However, generating such data for complex manipulations is challenging in real-world settings. We introduce HybridGen, an automated framework that integrates a Vision-Language Model (VLM) with hybrid planning. HybridGen uses a two-stage pipeline: first, a VLM parses expert demonstrations, decomposing tasks into expert-dependent segments (handled with object-centric pose transformations for precise control) and plannable segments (for which diverse trajectories are synthesized via path planning); second, pose transformations substantially expand the first-stage data. Crucially, HybridGen generates a large volume of training data without requiring specific data formats, making it broadly applicable to a wide range of imitation learning algorithms, a property we also demonstrate empirically across multiple algorithms. Evaluations across seven tasks and their variants show that agents trained with HybridGen achieve substantial performance and generalization gains, averaging a 5% improvement over state-of-the-art methods. Notably, in the most challenging task variants, HybridGen reaches a 59.7% average success rate, significantly outperforming MimicGen's 49.5%. These results demonstrate its effectiveness and practicality.
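The object-centric pose transformation mentioned in the abstract can be illustrated with a minimal sketch. It assumes the formulation commonly used in MimicGen-style data generation, where a demonstrated end-effector segment is kept fixed relative to the object and re-expressed for a new object pose; the abstract does not spell out HybridGen's exact formulation, and the helper names `make_pose` and `transform_segment` are hypothetical.

```python
import numpy as np

def make_pose(yaw, translation):
    """Build a 4x4 homogeneous transform from a yaw angle (rad) and a 3D translation."""
    c, s = np.cos(yaw), np.sin(yaw)
    T = np.eye(4)
    T[:3, :3] = [[c, -s, 0.0], [s, c, 0.0], [0.0, 0.0, 1.0]]
    T[:3, 3] = translation
    return T

def transform_segment(eef_poses, src_obj_pose, new_obj_pose):
    """Map end-effector poses so they keep the same pose relative to the object.

    Assumed formulation (not confirmed by the abstract):
        T_new_eef = T_new_obj @ inv(T_src_obj) @ T_src_eef
    """
    delta = new_obj_pose @ np.linalg.inv(src_obj_pose)
    return [delta @ T for T in eef_poses]

# Source demonstration: the gripper descends vertically onto the object.
src_obj = make_pose(0.0, [0.5, 0.0, 0.1])
demo = [make_pose(0.0, [0.5, 0.0, 0.1 + h]) for h in (0.2, 0.1, 0.0)]

# New scene: the object has been moved and rotated by 90 degrees.
new_obj = make_pose(np.pi / 2, [0.3, 0.2, 0.1])
new_demo = transform_segment(demo, src_obj, new_obj)
```

Because every transformed pose has the same pose relative to the object as in the source demonstration, one expert segment can be reused across many randomized object placements.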
Related papers
- Systems and Algorithms for Convolutional Multi-Hybrid Language Models at Scale [68.6602625868888]
We introduce convolutional multi-hybrid architectures, with a design grounded on two simple observations.
Operators in hybrid models can be tailored to token manipulation tasks such as in-context recall, multi-token recall, and compression.
We train end-to-end 1.2 to 2.9 times faster than optimized Transformers, and 1.1 to 1.4 times faster than previous generation hybrids.
arXiv Detail & Related papers (2025-02-25T19:47:20Z)
- MM-GEN: Enhancing Task Performance Through Targeted Multimodal Data Curation [31.21163360113923]
MM-Gen is a scalable method that generates task-specific, high-quality synthetic text for candidate images. Fine-tuning VLMs with data generated by MM-Gen leads to significant performance gains. Compared to human-curated caption data, MM-Gen achieves up to 1.6x better improvements.
arXiv Detail & Related papers (2025-01-07T21:55:56Z)
- Automatically Learning Hybrid Digital Twins of Dynamical Systems [56.69628749813084]
Digital Twins (DTs) simulate the states and temporal dynamics of real-world systems.
DTs often struggle to generalize to unseen conditions in data-scarce settings.
In this paper, we propose an evolutionary algorithm (HDTwinGen) to autonomously propose, evaluate, and optimize HDTwins.
arXiv Detail & Related papers (2024-10-31T07:28:22Z)
- Weighted Diversified Sampling for Efficient Data-Driven Single-Cell Gene-Gene Interaction Discovery [56.622854875204645]
We present an innovative approach utilizing data-driven computational tools, leveraging an advanced Transformer model, to unearth gene-gene interactions.
A novel weighted diversified sampling algorithm computes the diversity score of each data sample in just two passes of the dataset.
arXiv Detail & Related papers (2024-10-21T03:35:23Z)
- UniGen: A Unified Framework for Textual Dataset Generation Using Large Language Models [88.16197692794707]
UniGen is a comprehensive framework designed to produce diverse, accurate, and highly controllable datasets.
To augment data diversity, UniGen incorporates an attribute-guided generation module and a group checking feature.
Extensive experiments demonstrate the superior quality of data generated by UniGen.
arXiv Detail & Related papers (2024-06-27T07:56:44Z)
- Speeding up 6-DoF Grasp Sampling with Quality-Diversity [1.533848041901807]
Quality-Diversity (QD) algorithms optimize a set of solutions to get diverse, high-performing solutions to a given problem.
Experiments conducted on 4 grippers with 2-to-5 fingers on standard objects show that QD outperforms commonly used methods by a large margin.
arXiv Detail & Related papers (2024-03-10T10:58:54Z)
- DIG-MILP: a Deep Instance Generator for Mixed-Integer Linear Programming with Feasibility Guarantee [47.11455377400096]
Mixed-integer linear programming (MILP) stands as a notable NP-hard problem pivotal to numerous crucial industrial applications.
We present DIG-MILP, a deep generative framework based on variational auto-encoder (VAE), adept at extracting deep-level structural features from highly limited MILP data.
arXiv Detail & Related papers (2023-10-20T03:45:29Z)
- RGM: A Robust Generalizable Matching Model [49.60975442871967]
We propose a deep model for sparse and dense matching, termed RGM (Robust Generalist Matching).
To narrow the gap between synthetic training samples and real-world scenarios, we build a new, large-scale dataset with sparse correspondence ground truth.
We are able to mix up various dense and sparse matching datasets, significantly improving the training diversity.
arXiv Detail & Related papers (2023-10-18T07:30:08Z)
- Robust Millimeter Beamforming via Self-Supervised Hybrid Deep Learning [47.0425902438356]
We propose a robust self-supervised beamforming network and verify it on two different kinds of datasets covering various scenarios.
Simulation results show that the proposed self-supervised network with hybrid learning performs well on both the classic DeepMIMO dataset and the new WAIR-D dataset.
We also present the principle that explains why this kind of hybrid learning is reasonable, which is instructive for applying it to more kinds of datasets.
arXiv Detail & Related papers (2023-03-09T05:30:53Z)
- GraphLearner: Graph Node Clustering with Fully Learnable Augmentation [76.63963385662426]
Contrastive deep graph clustering (CDGC) leverages the power of contrastive learning to group nodes into different clusters.
We propose Graph Node Clustering with Fully Learnable Augmentation, termed GraphLearner.
It introduces learnable augmentors to generate high-quality and task-specific augmented samples for CDGC.
arXiv Detail & Related papers (2022-12-07T10:19:39Z)