Realistic Synthetic Household Data Generation at Scale
- URL: http://arxiv.org/abs/2602.07243v1
- Date: Fri, 06 Feb 2026 22:49:37 GMT
- Title: Realistic Synthetic Household Data Generation at Scale
- Authors: Siddharth Singh, Ifrah Idrees, Abraham Dauhajre,
- Abstract summary: Embodied AI can be used to develop interactive agents capable of environmental reasoning and interaction. Our proposed generative framework creates household datasets at scale through loosely coupled generation of long-term human-robot interactions. These contributions enable development and testing of household smart devices at scale.
- Score: 2.809651739704387
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Advancements in foundation models have catalyzed research in Embodied AI to develop interactive agents capable of environmental reasoning and interaction. Developing such agents requires diverse, large-scale datasets. Prior frameworks generate synthetic data for long-term human-robot interactions but fail to model the bidirectional influence between human behavior and household environments. Our proposed generative framework creates household datasets at scale through loosely coupled generation of long-term human-robot interactions and environments. Human personas influence environment generation, while environment schematics and semantics shape human-robot interactions. The generated 3D data includes rich static context such as object and environment semantics, and temporal context capturing human and agent behaviors over extended periods. Our flexible tool allows users to define dataset characteristics via natural language prompts, enabling configuration of environment and human activity data through natural language specifications. The tool creates variations of user-defined configurations, enabling scalable data generation. We validate our framework through statistical evaluation using multi-modal embeddings and key metrics: cosine similarity, mutual information gain, intervention analysis, and iterative improvement validation. Statistical comparisons show good alignment with real-world datasets (HOMER) with cosine similarity (0.60), while synthetic datasets (Wang et al.) show moderate alignment (0.27). Intervention analysis across age, organization, and sleep pattern changes shows statistically significant effects (p < 0.001) with large effect sizes (Cohen's d = 0.51-1.12), confirming bidirectional coupling translates persona traits into measurable environmental and behavioral differences. These contributions enable development and testing of household smart devices at scale.
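The abstract reports two standard statistics: cosine similarity between embeddings and Cohen's d as an effect size. As a rough illustration of how these quantities are typically computed, here is a minimal sketch in plain Python. It is not the authors' evaluation code: the function names and all input values are hypothetical, and the pooled-standard-deviation form of Cohen's d is an assumption, since the abstract does not say which variant was used.

```python
import math
from statistics import mean, variance


def cosine_similarity(u, v):
    """Cosine of the angle between two equal-length vectors."""
    if len(u) != len(v):
        raise ValueError("vectors must have the same length")
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm_u * norm_v)


def cohens_d(x, y):
    """Cohen's d with a pooled sample standard deviation (assumed variant)."""
    nx, ny = len(x), len(y)
    # statistics.variance is the sample variance (n - 1 in the denominator).
    pooled_var = ((nx - 1) * variance(x) + (ny - 1) * variance(y)) / (nx + ny - 2)
    return (mean(x) - mean(y)) / math.sqrt(pooled_var)


# Hypothetical embedding vectors standing in for two datasets.
synthetic_embedding = [0.2, 0.7, 0.1]
reference_embedding = [0.3, 0.6, 0.2]
print(cosine_similarity(synthetic_embedding, reference_embedding))

# Hypothetical per-household measurements before and after a persona change.
baseline = [2.0, 4.0, 6.0]
intervened = [1.0, 3.0, 5.0]
print(cohens_d(baseline, intervened))
```

Under the common rule of thumb, a d of about 0.5 is read as a medium effect and 0.8 or above as a large one, which gives some context for the 0.51-1.12 range quoted in the abstract.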
Related papers
- FLOW: A Feedback-Driven Synthetic Longitudinal Dataset of Work and Wellbeing [0.0]
FLOW is a synthetic longitudinal dataset designed to model daily interactions between workload, lifestyle behaviors, and wellbeing. FLOW simulates 1,000 individuals over a two-year period with daily resolution and is released as a publicly available resource.
arXiv Detail & Related papers (2025-12-28T14:54:04Z) - Learning Human-Object Interaction as Groups [52.28258599873394]
GroupHOI is a framework that propagates contextual information in terms of geometric proximity and semantic similarity. It exhibits leading performance on the more challenging Nonverbal Interaction Detection task.
arXiv Detail & Related papers (2025-10-21T07:25:10Z) - InterAct: Advancing Large-Scale Versatile 3D Human-Object Interaction Generation [54.09384502044162]
We introduce InterAct, a large-scale 3D HOI benchmark featuring dataset and methodological advancements. First, we consolidate and standardize 21.81 hours of HOI data from diverse sources, enriching it with detailed textual annotations. Second, we propose a unified optimization framework to enhance data quality by reducing artifacts and correcting hand motions. Third, we define six benchmarking tasks and develop a unified HOI generative modeling perspective, achieving state-of-the-art performance.
arXiv Detail & Related papers (2025-09-11T15:43:54Z) - Personalized Counterfactual Framework: Generating Potential Outcomes from Wearable Data [1.7396556690675233]
This paper introduces a framework to learn personalized counterfactual models from wearable data. We first augment individual datasets with data from similar patients via multi-modal similarity analysis. We then use a temporal PC (Peter-Clark) algorithm adaptation to discover predictive relationships. Gradient Boosting Machines are trained on these relationships to quantify individual-specific effects.
arXiv Detail & Related papers (2025-08-20T05:04:17Z) - A Framework for Realistic Simulation of Daily Human Activity [1.8877825068318652]
This paper presents a framework for simulating daily human activity patterns in home environments at scale.
We introduce a method for specifying day-to-day variation in schedules and present a bidirectional constraint propagation algorithm for generating schedules from templates.
arXiv Detail & Related papers (2023-11-26T19:50:23Z) - Towards a Unified Transformer-based Framework for Scene Graph Generation and Human-object Interaction Detection [116.21529970404653]
We introduce SG2HOI+, a unified one-step model based on the Transformer architecture.
Our approach employs two interactive hierarchical Transformers to seamlessly unify the tasks of SGG and HOI detection.
Our approach achieves competitive performance when compared to state-of-the-art HOI methods.
arXiv Detail & Related papers (2023-11-03T07:25:57Z) - Learn to Predict How Humans Manipulate Large-sized Objects from Interactive Motions [82.90906153293585]
We propose a graph neural network, HO-GCN, to fuse motion data and dynamic descriptors for the prediction task.
We show that the proposed network, which consumes dynamic descriptors, can achieve state-of-the-art prediction results and generalize better to unseen objects.
arXiv Detail & Related papers (2022-06-25T09:55:39Z) - Data-driven emotional body language generation for social robotics [58.88028813371423]
In social robotics, endowing humanoid robots with the ability to generate bodily expressions of affect can improve human-robot interaction and collaboration.
We implement a deep learning data-driven framework that learns from a few hand-designed robotic bodily expressions.
The evaluation study found that the anthropomorphism and animacy of the generated expressions are not perceived differently from the hand-designed ones.
arXiv Detail & Related papers (2022-05-02T09:21:39Z) - HARPS: An Online POMDP Framework for Human-Assisted Robotic Planning and Sensing [1.3678064890824186]
The Human Assisted Robotic Planning and Sensing (HARPS) framework is presented for active semantic sensing and planning in human-robot teams.
This approach lets humans opportunistically impose model structure and extend the range of semantic soft data in uncertain environments.
Simulations of a UAV-enabled target search application in a large-scale partially structured environment show significant improvements in time and belief state estimates.
arXiv Detail & Related papers (2021-10-20T00:41:57Z) - Human Trajectory Forecasting in Crowds: A Deep Learning Perspective [89.4600982169]
We present an in-depth analysis of existing deep learning-based methods for modelling social interactions.
We propose two knowledge-based data-driven methods to effectively capture these social interactions.
We develop a large scale interaction-centric benchmark TrajNet++, a significant yet missing component in the field of human trajectory forecasting.
arXiv Detail & Related papers (2020-07-07T17:19:56Z) - Jointly Predicting Job Performance, Personality, Cognitive Ability, Affect, and Well-Being [42.67003631848889]
We create a benchmark for predictive analysis of individuals from a perspective that integrates physical and physiological behavior, psychological states and traits, and job performance.
We design data mining techniques as a benchmark and use real, noisy, and incomplete data derived from wearable sensors to predict 19 constructs based on 12 standardized, well-validated tests.
arXiv Detail & Related papers (2020-06-10T14:30:29Z)
This list is automatically generated from the titles and abstracts of the papers in this site.