Grasp-Anything: Large-scale Grasp Dataset from Foundation Models
- URL: http://arxiv.org/abs/2309.09818v1
- Date: Mon, 18 Sep 2023 14:39:26 GMT
- Title: Grasp-Anything: Large-scale Grasp Dataset from Foundation Models
- Authors: An Dinh Vuong, Minh Nhat Vu, Hieu Le, Baoru Huang, Binh Huynh, Thieu
Vo, Andreas Kugi, Anh Nguyen
- Abstract summary: Foundation models possess an extensive repository of real-world knowledge, including objects we encounter in our daily lives.
We present Grasp-Anything, a new large-scale grasp dataset synthesized from foundation models to implement this solution.
We show that Grasp-Anything successfully facilitates zero-shot grasp detection on vision-based tasks and real-world robotic experiments.
- Score: 15.17542697393971
- License: http://creativecommons.org/licenses/by-nc-sa/4.0/
- Abstract: Foundation models such as ChatGPT have made significant strides in robotic
tasks due to their universal representation of real-world domains. In this
paper, we leverage foundation models to tackle grasp detection, a persistent
challenge in robotics with broad industrial applications. Despite numerous
grasp datasets, their object diversity remains limited compared to real-world
figures. Fortunately, foundation models possess an extensive repository of
real-world knowledge, including objects we encounter in our daily lives. As a
consequence, a promising solution to the limited representation in previous
grasp datasets is to harness the universal knowledge embedded in these
foundation models. We present Grasp-Anything, a new large-scale grasp dataset
synthesized from foundation models to implement this solution. Grasp-Anything
excels in diversity and magnitude, boasting 1M samples with text descriptions
and more than 3M objects, surpassing prior datasets. Empirically, we show that
Grasp-Anything successfully facilitates zero-shot grasp detection on
vision-based tasks and real-world robotic experiments. Our dataset and code are
available at https://grasp-anything-2023.github.io.
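To make the synthesis idea above concrete, here is a minimal sketch of how a foundation model's scene descriptions could seed grasp-data generation. This is not the authors' actual pipeline: the helper names, the stubbed LLM call, and the rectangle-grasp fields (a common 2D grasp parameterization) are illustrative assumptions.

```python
# A minimal sketch of the "synthesize grasp data from a foundation model" idea.
# Helper names and the rectangle-grasp fields are illustrative assumptions,
# not the authors' actual pipeline.
from dataclasses import dataclass

@dataclass
class RectGrasp:
    # A common 2D grasp parameterization: center, size, in-plane rotation.
    x: float
    y: float
    width: float
    height: float
    theta: float  # gripper rotation in radians

@dataclass
class GraspSample:
    description: str         # scene text produced by the foundation model
    objects: list[str]       # object names mentioned in the description
    grasps: list[RectGrasp]  # one or more grasp labels per object

def generate_scene_descriptions(n: int) -> list[str]:
    """Placeholder for a foundation-model call (hypothetical): prompt an LLM
    for diverse everyday-scene descriptions, e.g. 'a mug and scissors on a
    wooden desk'. Stubbed here so the sketch stays self-contained."""
    return [f"scene {i}: everyday objects on a table" for i in range(n)]

def synthesize_dataset(n_scenes: int) -> list[GraspSample]:
    samples = []
    for text in generate_scene_descriptions(n_scenes):
        # In a full pipeline, an image-generation model would render the
        # scene and a grasp proposer would label each object; stubbed here.
        objects = ["mug", "scissors"]  # assumed parse of the description
        grasps = [RectGrasp(0.5, 0.5, 0.1, 0.05, 0.0) for _ in objects]
        samples.append(GraspSample(text, objects, grasps))
    return samples

if __name__ == "__main__":
    data = synthesize_dataset(3)
    print(f"{len(data)} samples; first description: {data[0].description}")
```

In the released dataset, each of the 1M samples pairs a text description with grasp annotations over roughly 3M objects; the sketch above only mirrors that sample structure, not the generation machinery.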
Related papers
- Plain-Det: A Plain Multi-Dataset Object Detector [22.848784430833835]
Plain-Det offers flexibility to accommodate new datasets, robustness in performance across diverse datasets, and training efficiency.
We conduct extensive experiments on 13 downstream datasets and Plain-Det demonstrates strong generalization capability.
arXiv Detail & Related papers (2024-07-14T05:18:06Z)
- Learning Defect Prediction from Unrealistic Data [57.53586547895278]
Pretrained models of code have become popular choices for code understanding and generation tasks.
Such models tend to be large and require commensurate volumes of training data.
It has become popular to train models with far larger but less realistic datasets, such as functions with artificially injected bugs.
Models trained on such data tend to perform well only on similar data, while underperforming on real-world programs.
arXiv Detail & Related papers (2023-11-02T01:51:43Z)
- LMSYS-Chat-1M: A Large-Scale Real-World LLM Conversation Dataset [75.9621305227523]
We introduce LMSYS-Chat-1M, a large-scale dataset containing one million real-world conversations with 25 state-of-the-art large language models (LLMs).
This dataset is collected from 210K IP addresses in the wild on our Vicuna demo and Arena website.
We demonstrate its versatility through four use cases: developing content moderation models that perform similarly to GPT-4, building a safety benchmark, training instruction-following models that perform similarly to Vicuna, and creating challenging benchmark questions.
arXiv Detail & Related papers (2023-09-21T12:13:55Z)
- RT-1: Robotics Transformer for Real-World Control at Scale [98.09428483862165]
We present a model class, dubbed Robotics Transformer, that exhibits promising scalable model properties.
We verify our conclusions in a study of how different model classes generalize as a function of data size, model size, and data diversity, based on a large-scale data collection from real robots performing real-world tasks.
arXiv Detail & Related papers (2022-12-13T18:55:15Z)
- MetaGraspNet: A Large-Scale Benchmark Dataset for Scene-Aware Ambidextrous Bin Picking via Physics-based Metaverse Synthesis [72.85526892440251]
We introduce MetaGraspNet, a large-scale photo-realistic bin picking dataset constructed via physics-based metaverse synthesis.
The proposed dataset contains 217k RGBD images across 82 different article types, with full annotations for object detection, amodal perception, keypoint detection, manipulation order and ambidextrous grasp labels for a parallel-jaw and vacuum gripper.
We also provide a real dataset consisting of over 2.3k fully annotated high-quality RGBD images, divided into 5 difficulty levels and an unseen object set to evaluate different object and layout properties.
arXiv Detail & Related papers (2022-08-08T08:15:34Z)
- A Review of Deep Learning Techniques for Markerless Human Motion on Synthetic Datasets [0.0]
Estimating human posture has recently gained increasing attention in the computer vision community.
We present a model that can predict the skeleton of an animation based solely on 2D images.
The implementation uses DeepLabCut on its own dataset to perform many of the necessary steps.
arXiv Detail & Related papers (2022-01-07T15:42:50Z)
- MetaGraspNet: A Large-Scale Benchmark Dataset for Vision-driven Robotic Grasping via Physics-based Metaverse Synthesis [78.26022688167133]
We present a large-scale benchmark dataset for vision-driven robotic grasping via physics-based metaverse synthesis.
The proposed dataset contains 100,000 images and 25 different object types.
We also propose a new layout-weighted performance metric alongside the dataset for evaluating object detection and segmentation performance.
arXiv Detail & Related papers (2021-12-29T17:23:24Z)
- REGRAD: A Large-Scale Relational Grasp Dataset for Safe and Object-Specific Robotic Grasping in Clutter [52.117388513480435]
We present a new dataset named REGRAD to sustain the modeling of relationships among objects and grasps.
Our dataset is collected in the form of both 2D images and 3D point clouds.
Users are free to import their own object models to generate as much data as they want.
arXiv Detail & Related papers (2021-04-29T05:31:21Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of its content (including all information) and is not responsible for any consequences.