Humanoid Everyday: A Comprehensive Robotic Dataset for Open-World Humanoid Manipulation
- URL: http://arxiv.org/abs/2510.08807v1
- Date: Thu, 09 Oct 2025 20:43:27 GMT
- Title: Humanoid Everyday: A Comprehensive Robotic Dataset for Open-World Humanoid Manipulation
- Authors: Zhenyu Zhao, Hongyi Jing, Xiawei Liu, Jiageng Mao, Abha Jha, Hanwen Yang, Rong Xue, Sergey Zakharov, Vitor Guizilini, Yue Wang
- Abstract summary: Humanoid Everyday is a large-scale and diverse humanoid manipulation dataset. It aggregates high-quality multimodal sensory data, including RGB, depth, LiDAR, and tactile inputs, together with natural language annotations. We conduct an analysis of representative policy learning methods on our dataset, providing insights into their strengths and limitations.
- Score: 16.701354625940308
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: From locomotion to dexterous manipulation, humanoid robots have made remarkable strides in demonstrating complex full-body capabilities. However, most current robot learning datasets and benchmarks focus on stationary robot arms, and the few existing humanoid datasets are either confined to fixed environments or limited in task diversity, often lacking human-humanoid interaction and lower-body locomotion. Moreover, few standardized evaluation platforms exist for benchmarking learning-based policies on humanoid data. In this work, we present Humanoid Everyday, a large-scale and diverse humanoid manipulation dataset characterized by extensive task variety involving dexterous object manipulation, human-humanoid interaction, locomotion-integrated actions, and more. Leveraging a highly efficient human-supervised teleoperation pipeline, Humanoid Everyday aggregates high-quality multimodal sensory data, including RGB, depth, LiDAR, and tactile inputs, together with natural language annotations, comprising 10.3k trajectories and over 3 million frames of data across 260 tasks in 7 broad categories. In addition, we analyze representative policy learning methods on our dataset, providing insights into their strengths and limitations across different task categories. For standardized evaluation, we introduce a cloud-based evaluation platform that allows researchers to seamlessly deploy their policies in our controlled setting and receive performance feedback. By releasing Humanoid Everyday along with our policy learning analysis and a standardized cloud-based evaluation platform, we aim to advance research in general-purpose humanoid manipulation and lay the groundwork for more capable embodied robotic agents in real-world scenarios. Our dataset, data collection code, and cloud evaluation website are publicly available on our project website.
Related papers
- Emergence of Human to Robot Transfer in Vision-Language-Action Models [88.76648919814771]
Vision-language-action (VLA) models can enable broad open-world generalization, but require large and diverse datasets. We show that human-to-robot transfer emerges once the VLA is pre-trained on sufficient scenes, tasks, and embodiments.
arXiv Detail & Related papers (2025-12-27T00:13:11Z)
- Dexterity from Smart Lenses: Multi-Fingered Robot Manipulation with In-the-Wild Human Demonstrations [52.29884993824894]
Learning multi-fingered robot policies from humans performing daily tasks in natural environments has long been a grand goal in the robotics community. AINA enables learning multi-fingered policies from data collected by anyone, anywhere, and in any environment using Aria Gen 2 glasses.
arXiv Detail & Related papers (2025-11-20T18:59:02Z)
- EgoBridge: Domain Adaptation for Generalizable Imitation from Egocentric Human Data [18.635934496561365]
EgoBridge aims to align the policy latent spaces of human and robot data using domain adaptation. It achieves an absolute improvement of 44% in policy success rate over human-augmented cross-embodiment baselines.
arXiv Detail & Related papers (2025-09-23T22:34:47Z)
- Humanoid Policy ~ Human Policy [41.34186233320398]
We train a human-humanoid behavior policy, which we term Human Action Transformer (HAT). The state-action space of HAT is unified for both humans and humanoid robots and can be differentiably retargeted to robot actions. We show that human data improves both the generalization and the robustness of HAT, with significantly better data collection efficiency.
arXiv Detail & Related papers (2025-03-17T17:59:09Z)
- Learning from Massive Human Videos for Universal Humanoid Pose Control [46.417054298537195]
This paper introduces Humanoid-X, a large-scale dataset of over 20 million humanoid robot poses with corresponding text-based motion descriptions. We train a large humanoid model, UH-1, which takes text instructions as input and outputs corresponding actions to control a humanoid robot. Our scalable training approach leads to superior generalization in text-based humanoid control, marking a significant step toward adaptable, real-world-ready humanoid robots.
arXiv Detail & Related papers (2024-12-18T18:59:56Z)
- RoboMIND: Benchmark on Multi-embodiment Intelligence Normative Data for Robot Manipulation [47.41571121843972]
We introduce RoboMIND, a dataset containing 107k demonstration trajectories across 479 diverse tasks involving 96 object classes. RoboMIND is collected through human teleoperation and encompasses comprehensive robot-related information. Our dataset also includes 5k real-world failure demonstrations, each accompanied by detailed causes, enabling failure reflection and correction.
arXiv Detail & Related papers (2024-12-18T14:17:16Z)
- HumanPlus: Humanoid Shadowing and Imitation from Humans [82.47551890765202]
We introduce a full-stack system for humanoids to learn motion and autonomous skills from human data.
We first train a low-level policy in simulation via reinforcement learning using existing 40-hour human motion datasets.
We then perform supervised behavior cloning to train skill policies using egocentric vision, allowing humanoids to complete different tasks autonomously.
arXiv Detail & Related papers (2024-06-15T00:41:34Z)
- HumanoidBench: Simulated Humanoid Benchmark for Whole-Body Locomotion and Manipulation [50.616995671367704]
We present a high-dimensional, simulated robot learning benchmark, HumanoidBench, featuring a humanoid robot equipped with dexterous hands.
Our findings reveal that state-of-the-art reinforcement learning algorithms struggle with most tasks, whereas a hierarchical learning approach achieves superior performance when supported by robust low-level policies.
arXiv Detail & Related papers (2024-03-15T17:45:44Z)
- RH20T: A Comprehensive Robotic Dataset for Learning Diverse Skills in One-Shot [56.130215236125224]
A key challenge in robotic manipulation in open domains is how to acquire diverse and generalizable skills for robots.
Recent research in one-shot imitation learning has shown promise in transferring trained policies to new tasks based on demonstrations.
This paper aims to unlock the potential for an agent to generalize to hundreds of real-world skills with multi-modal perception.
arXiv Detail & Related papers (2023-07-02T15:33:31Z)
- DexTransfer: Real World Multi-fingered Dexterous Grasping with Minimal Human Demonstrations [51.87067543670535]
We propose a robot-learning system that can take a small number of human demonstrations and learn to grasp unseen object poses.
We train a dexterous grasping policy that takes the point clouds of the object as input and predicts continuous actions to grasp objects from different initial robot states.
The policy learned from our dataset can generalize well on unseen object poses in both simulation and the real world.
arXiv Detail & Related papers (2022-09-28T17:51:49Z)
- What Matters in Learning from Offline Human Demonstrations for Robot Manipulation [64.43440450794495]
We conduct an extensive study of six offline learning algorithms for robot manipulation.
Our study analyzes the most critical challenges when learning from offline human data.
We highlight opportunities for learning from human datasets.
arXiv Detail & Related papers (2021-08-06T20:48:30Z)
- Few-Shot Visual Grounding for Natural Human-Robot Interaction [0.0]
We propose a software architecture that segments a target object from a crowded scene, indicated verbally by a human user.
At the core of our system, we employ a multi-modal deep neural network for visual grounding.
We evaluate the performance of the proposed model on real RGB-D data collected from public scene datasets.
arXiv Detail & Related papers (2021-03-17T15:24:02Z)