Synthetic User Behavior Sequence Generation with Large Language Models for Smart Homes
- URL: http://arxiv.org/abs/2501.19298v1
- Date: Fri, 31 Jan 2025 16:55:43 GMT
- Title: Synthetic User Behavior Sequence Generation with Large Language Models for Smart Homes
- Authors: Zhiyao Xu, Dan Zhao, Qingsong Zou, Jingyu Xiao, Yong Jiang, Zhenhui Yuan, Qing Li,
- Abstract summary: In recent years, as smart home systems have become more widespread, security concerns within these environments have become a growing threat.<n>Currently, most smart home security solutions, such as anomaly detection and behavior prediction models, are trained using fixed datasets that are precollected.<n>We propose an LLM-based synthetic dataset generation IoTGen framework to enhance the generalization of downstream smart home intelligent models.
- Score: 28.992302869611105
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: In recent years, as smart home systems have become more widespread, security concerns within these environments have become a growing threat. Currently, most smart home security solutions, such as anomaly detection and behavior prediction models, are trained using fixed datasets that are precollected. However, the process of dataset collection is time-consuming and lacks the flexibility needed to adapt to the constantly evolving smart home environment. Additionally, the collection of personal data raises significant privacy concerns for users. Lately, large language models (LLMs) have emerged as a powerful tool for a wide range of tasks across diverse application domains, thanks to their strong capabilities in natural language processing, reasoning, and problem-solving. In this paper, we propose an LLM-based synthetic dataset generation IoTGen framework to enhance the generalization of downstream smart home intelligent models. By generating new synthetic datasets that reflect changes in the environment, smart home intelligent models can be retrained to overcome the limitations of fixed and outdated data, allowing them to better align with the dynamic nature of real-world home environments. Specifically, we first propose a Structure Pattern Perception Compression (SPPC) method tailored for IoT behavior data, which preserves the most informative content in the data while significantly reducing token consumption. Then, we propose a systematic approach to create prompts and implement data generation to automatically generate IoT synthetic data with normative and reasonable properties, assisting task models in adaptive training to improve generalization and real-world performance.
Related papers
- Large language model as user daily behavior data generator: balancing population diversity and individual personality [12.464365435176099]
We introduce BehaviorGen, a framework that uses large language models to generate high-quality synthetic behavior data.<n>By simulating user behavior based on profiles and real events, BehaviorGen supports data augmentation and replacement in behavior prediction models.<n>We evaluate its performance in scenarios such as augmentation, fine-tuning replacement, and fine-tuning augmentation, achieving significant improvements in human mobility and smartphone usage predictions.
arXiv Detail & Related papers (2025-05-23T08:22:09Z) - ToolACE-DEV: Self-Improving Tool Learning via Decomposition and EVolution [77.86222359025011]
We propose ToolACE-DEV, a self-improving framework for tool learning.<n>First, we decompose the tool-learning objective into sub-tasks that enhance basic tool-making and tool-using abilities.<n>We then introduce a self-evolving paradigm that allows lightweight models to self-improve, reducing reliance on advanced LLMs.
arXiv Detail & Related papers (2025-05-12T12:48:30Z) - Open-Source LLM-Driven Federated Transformer for Predictive IoV Management [1.8024397171920885]
Federated Prompt-d Traffic Transformer (FPoTT) is a novel framework that leverages open-source Large Language Models for predictive IoV management.
FPoTT introduces a dynamic prompt optimization mechanism that iteratively refines textual prompts to enhance trajectory prediction.
The architecture employs a dual-layer federated learning paradigm, combining lightweight edge models for real-time inference with cloud-based LLMs to retain global intelligence.
arXiv Detail & Related papers (2025-05-01T16:54:21Z) - FedAWA: Adaptive Optimization of Aggregation Weights in Federated Learning Using Client Vectors [50.131271229165165]
Federated Learning (FL) has emerged as a promising framework for distributed machine learning.
Data heterogeneity resulting from differences across user behaviors, preferences, and device characteristics poses a significant challenge for federated learning.
We propose Adaptive Weight Aggregation (FedAWA), a novel method that adaptively adjusts aggregation weights based on client vectors during the learning process.
arXiv Detail & Related papers (2025-03-20T04:49:40Z) - AugGen: Synthetic Augmentation Can Improve Discriminative Models [14.680260279598045]
We introduce a novel self-contained synthetic augmentation technique.
It strategically samples from a conditional generative model trained exclusively on the target dataset.
It achieves 1--12% performance improvements on the IJB-C and IJB-B benchmarks.
arXiv Detail & Related papers (2025-03-14T16:10:21Z) - Evaluating Language Models as Synthetic Data Generators [74.80905172696366]
AgoraBench is a benchmark that provides standardized settings and metrics to evaluate LMs' data generation abilities.<n>Through synthesizing 1.26 million training instances using 6 LMs and training 99 student models, we uncover key insights about LMs' data generation capabilities.
arXiv Detail & Related papers (2024-12-04T19:20:32Z) - SmartPretrain: Model-Agnostic and Dataset-Agnostic Representation Learning for Motion Prediction [37.461695201579914]
We propose SmartPretrain, a general and scalable framework for motion prediction.
Our approach integrates contrastive and reconstructive SSL, leveraging the strengths of both generative and discriminative paradigms.
SmartPretrain consistently improves the performance of state-of-the-art prediction models across datasets, data splits and main metrics.
arXiv Detail & Related papers (2024-10-11T09:52:26Z) - Few-Shot Class-Incremental Learning with Non-IID Decentralized Data [12.472285188772544]
Few-shot class-incremental learning is crucial for developing scalable and adaptive intelligent systems.
This paper introduces federated few-shot class-incremental learning, a decentralized machine learning paradigm.
We present a synthetic data-driven framework that leverages replay buffer data to maintain existing knowledge and facilitate the acquisition of new knowledge.
arXiv Detail & Related papers (2024-09-18T02:48:36Z) - Learn while Unlearn: An Iterative Unlearning Framework for Generative Language Models [52.03511469562013]
We introduce the Iterative Contrastive Unlearning (ICU) framework, which consists of three core components.
A Knowledge Unlearning Induction module targets specific knowledge for removal using an unlearning loss.
A Contrastive Learning Enhancement module preserves the model's expressive capabilities against the pure unlearning goal.
An Iterative Unlearning Refinement module dynamically adjusts the unlearning process through ongoing evaluation and updates.
arXiv Detail & Related papers (2024-07-25T07:09:35Z) - MALLM-GAN: Multi-Agent Large Language Model as Generative Adversarial Network for Synthesizing Tabular Data [10.217822818544475]
We propose a framework to generate synthetic (tabular) data powered by large language models (LLMs)
Our approach significantly enhance the quality of synthetic data generation in common scenarios with small sample sizes.
Our results demonstrate that our model outperforms several state-of-art models regarding generating higher quality synthetic data for downstream tasks while keeping privacy of the real data.
arXiv Detail & Related papers (2024-06-15T06:26:17Z) - The Frontier of Data Erasure: Machine Unlearning for Large Language Models [56.26002631481726]
Large Language Models (LLMs) are foundational to AI advancements.
LLMs pose risks by potentially memorizing and disseminating sensitive, biased, or copyrighted information.
Machine unlearning emerges as a cutting-edge solution to mitigate these concerns.
arXiv Detail & Related papers (2024-03-23T09:26:15Z) - Synthetic data, real errors: how (not) to publish and use synthetic data [86.65594304109567]
We show how the generative process affects the downstream ML task.
We introduce Deep Generative Ensemble (DGE) to approximate the posterior distribution over the generative process model parameters.
arXiv Detail & Related papers (2023-05-16T07:30:29Z) - Learning to Continuously Optimize Wireless Resource in a Dynamic
Environment: A Bilevel Optimization Perspective [52.497514255040514]
This work develops a new approach that enables data-driven methods to continuously learn and optimize resource allocation strategies in a dynamic environment.
We propose to build the notion of continual learning into wireless system design, so that the learning model can incrementally adapt to the new episodes.
Our design is based on a novel bilevel optimization formulation which ensures certain fairness" across different data samples.
arXiv Detail & Related papers (2021-05-03T07:23:39Z) - Learning to Continuously Optimize Wireless Resource In Episodically
Dynamic Environment [55.91291559442884]
This work develops a methodology that enables data-driven methods to continuously learn and optimize in a dynamic environment.
We propose to build the notion of continual learning into the modeling process of learning wireless systems.
Our design is based on a novel min-max formulation which ensures certain fairness" across different data samples.
arXiv Detail & Related papers (2020-11-16T08:24:34Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.