EdgeWisePersona: A Dataset for On-Device User Profiling from Natural Language Interactions
- URL: http://arxiv.org/abs/2505.11417v1
- Date: Fri, 16 May 2025 16:29:21 GMT
- Title: EdgeWisePersona: A Dataset for On-Device User Profiling from Natural Language Interactions
- Authors: Patryk Bartkowiak, Michal Podstawski
- Abstract summary: This paper introduces a novel dataset designed to assess and improve small language models deployable on edge devices. At the core of the dataset are structured user profiles, each defined by a set of routines. A large language model (LLM) generates corresponding interaction sessions that simulate realistic, diverse, and context-aware dialogues.
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: This paper introduces a novel dataset and evaluation benchmark designed to assess and improve small language models deployable on edge devices, with a focus on user profiling from multi-session natural language interactions in smart home environments. At the core of the dataset are structured user profiles, each defined by a set of routines - context-triggered, repeatable patterns of behavior that govern how users interact with their home systems. Using these profiles as input, a large language model (LLM) generates corresponding interaction sessions that simulate realistic, diverse, and context-aware dialogues between users and their devices. The primary task supported by this dataset is profile reconstruction: inferring user routines and preferences solely from interaction history. To assess how well current models can perform this task under realistic conditions, we benchmarked several state-of-the-art compact language models and compared their performance against large foundation models. Our results show that while small models demonstrate some capability in reconstructing profiles, they still fall significantly short of large models in accurately capturing user behavior. This performance gap poses a major challenge - particularly because on-device processing offers critical advantages, such as preserving user privacy, minimizing latency, and enabling personalized experiences without reliance on the cloud. By providing a realistic, structured testbed for developing and evaluating behavioral modeling under these constraints, our dataset represents a key step toward enabling intelligent, privacy-respecting AI systems that learn and adapt directly on user-owned devices.
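The paper does not reproduce its schema here, but the described structure - a profile as a set of context-triggered routines, reconstructed from session transcripts - can be illustrated with a hypothetical sketch. All field names and the overlap metric below are assumptions, not the dataset's actual format:

```python
from dataclasses import dataclass, field

@dataclass(frozen=True)
class Routine:
    """A context-triggered, repeatable behavior pattern (field names are hypothetical)."""
    trigger: str    # e.g. "weekday 07:00" or "user arrives home"
    actions: tuple  # e.g. ("set thermostat to 21C", "dim living room lights")

@dataclass
class UserProfile:
    user_id: str
    routines: set = field(default_factory=set)

def routine_overlap(predicted: set, reference: set) -> float:
    """Jaccard overlap between predicted and ground-truth routines,
    one plausible metric for scoring profile reconstruction."""
    if not predicted and not reference:
        return 1.0
    return len(predicted & reference) / len(predicted | reference)

# Toy example: a model recovered one of two ground-truth routines.
truth = {
    Routine("weekday 07:00", ("start coffee machine",)),
    Routine("sunset", ("close blinds",)),
}
pred = {Routine("weekday 07:00", ("start coffee machine",))}
print(routine_overlap(pred, truth))  # 0.5
```

The frozen dataclass makes routines hashable, so exact-match set operations work; a real evaluation would presumably need fuzzier matching between inferred and reference routines.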
Related papers
- ASMR: Augmenting Life Scenario using Large Generative Models for Robotic Action Reflection [21.75681306780917]
This paper introduces a novel framework focusing on data augmentation in robotic assistance scenarios. It involves leveraging a sophisticated large language model to simulate potential conversations and environmental contexts. The additionally generated data serves to refine the latest multimodal models, enabling them to more accurately determine appropriate actions.
arXiv Detail & Related papers (2025-06-16T19:58:54Z) - PersonaAgent: When Large Language Model Agents Meet Personalization at Test Time [87.99027488664282]
PersonaAgent is a framework designed to address versatile personalization tasks. It integrates a personalized memory module and a personalized action module. A test-time user-preference alignment strategy ensures real-time alignment with user preferences.
arXiv Detail & Related papers (2025-06-06T17:29:49Z) - Know You First and Be You Better: Modeling Human-Like User Simulators via Implicit Profiles [37.43150003866563]
User simulators are crucial for replicating human interactions with dialogue systems. We propose User Simulator with implicit Profiles (USP), a framework that infers implicit user profiles from human-machine conversations. USP outperforms strong baselines in terms of authenticity and diversity while achieving comparable performance in consistency.
arXiv Detail & Related papers (2025-02-26T09:26:54Z) - Can foundation models actively gather information in interactive environments to test hypotheses? [56.651636971591536]
We introduce a framework in which a model must determine the factors influencing a hidden reward function. We investigate whether approaches such as self-correction and increased inference time improve information-gathering efficiency.
arXiv Detail & Related papers (2024-12-09T12:27:21Z) - How Hard is this Test Set? NLI Characterization by Exploiting Training Dynamics [49.9329723199239]
We propose a method for the automated creation of a challenging test set without relying on the manual construction of artificial and unrealistic examples.
We categorize the test set of popular NLI datasets into three difficulty levels by leveraging methods that exploit training dynamics.
When our characterization method is applied to the training set, models trained with only a fraction of the data achieve comparable performance to those trained on the full dataset.
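The idea of bucketing examples by training dynamics can be sketched generically: track the model's confidence in the gold label across epochs and split examples into difficulty levels. The thresholds and field names below are illustrative assumptions, not the paper's method:

```python
import statistics

def categorize(example_probs, easy=0.75, hard=0.4):
    """Bucket training examples by mean gold-label confidence across
    epochs (thresholds are illustrative, not taken from the paper)."""
    buckets = {"easy": [], "medium": [], "hard": []}
    for ex_id, probs in example_probs.items():
        conf = statistics.fmean(probs)
        if conf >= easy:
            buckets["easy"].append(ex_id)
        elif conf <= hard:
            buckets["hard"].append(ex_id)
        else:
            buckets["medium"].append(ex_id)
    return buckets

# Per-epoch gold-label probabilities for three toy examples.
probs = {
    "ex1": [0.9, 0.95, 0.97],  # learned early  -> easy
    "ex2": [0.2, 0.3, 0.35],   # never learned  -> hard
    "ex3": [0.4, 0.6, 0.7],    # in between     -> medium
}
print(categorize(probs))
```

The "hard" bucket is the natural candidate for a challenging test set, since those examples resisted learning throughout training.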
arXiv Detail & Related papers (2024-10-04T13:39:21Z) - Corpus Considerations for Annotator Modeling and Scaling [9.263562546969695]
We show that the commonly used user token model consistently outperforms more complex models.
Our findings shed light on the relationship between corpus statistics and annotator modeling performance.
arXiv Detail & Related papers (2024-04-02T22:27:24Z) - CoGenesis: A Framework Collaborating Large and Small Language Models for Secure Context-Aware Instruction Following [27.22804560751958]
We propose a collaborative generation framework that integrates large models (hosted on cloud infrastructure) and small models (deployed on local devices) to address privacy concerns.
Our experimental findings, based on our synthesized dataset and two additional open-source datasets, indicate that: 1) Large-scale models perform well when provided with user context but struggle in the absence of such context.
Our framework, utilizing mixed-scale models, showcases competitive performance, providing a feasible solution to privacy issues.
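The mixed-scale idea - a small on-device model handles private context while the large cloud model only sees sanitized text - can be sketched as below. The redaction rule and both model stubs are hypothetical stand-ins, not CoGenesis's implementation:

```python
import re

def redact(text: str) -> str:
    """Strip obviously private spans before text leaves the device
    (a naive stand-in for a real sanitization step)."""
    return re.sub(r"\b[\w.]+@[\w.]+\b", "[EMAIL]", text)

def small_local_model(prompt: str, context: str) -> str:
    # On-device model: sees raw private context, drafts a personalized reply.
    return f"draft using context '{context}' for: {prompt}"

def large_cloud_model(prompt: str) -> str:
    # Cloud model: sees only the redacted draft, refines fluency and knowledge.
    return f"polished({prompt})"

def collaborative_generate(prompt: str, private_context: str) -> str:
    draft = small_local_model(prompt, private_context)
    return large_cloud_model(redact(draft))

print(collaborative_generate("email bob@example.com my ETA",
                             "user drives home at 17:30"))
```

The key property is that the raw email address never crosses the device boundary; only the redacted draft reaches the cloud model.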
arXiv Detail & Related papers (2024-03-05T17:15:28Z) - Bring Your Own Data! Self-Supervised Evaluation for Large Language Models [52.15056231665816]
We propose a framework for self-supervised evaluation of Large Language Models (LLMs)
We demonstrate self-supervised evaluation strategies for measuring closed-book knowledge, toxicity, and long-range context dependence.
We find strong correlations between self-supervised and human-supervised evaluations.
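The self-supervised recipe - score a model by how its outputs change under controlled input transformations, with no human labels - can be sketched generically. The toy "model" and perturbation here are hypothetical stand-ins, not the paper's metrics:

```python
def sensitivity(model, texts, perturb):
    """Self-supervised score: average absolute change in the model's
    output when each input is perturbed (no human labels required)."""
    deltas = [abs(model(t) - model(perturb(t))) for t in texts]
    return sum(deltas) / len(deltas)

# Toy stand-ins: the "model" scores text length; the perturbation
# uppercases the text, which should leave length unchanged.
toy_model = lambda t: float(len(t))
perturb = str.upper

print(sensitivity(toy_model, ["hello world", "edge devices"], perturb))  # 0.0
```

A real instantiation would swap in an LLM scoring function and meaning-preserving perturbations; low sensitivity then serves as a proxy for robustness.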
arXiv Detail & Related papers (2023-06-23T17:59:09Z) - Federated Privacy-preserving Collaborative Filtering for On-Device Next App Prediction [52.16923290335873]
We propose a novel SeqMF model to solve the problem of predicting the next app launch during mobile device usage.
We modify the structure of the classical matrix factorization model and update the training procedure to sequential learning.
The approach also includes a new privacy mechanism that protects the data sent from users to the remote server.
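Sequential learning for matrix factorization can be sketched as streaming SGD updates that replay each observed (user, next-app) launch; this is a generic implicit-feedback sketch, not the SeqMF model itself, and all dimensions are arbitrary:

```python
import numpy as np

rng = np.random.default_rng(0)

def sgd_mf_step(U, V, user, app, lr=0.05, reg=0.01):
    """One SGD update pushing the (user, app) score toward 1,
    as in implicit-feedback matrix factorization (a sketch)."""
    err = 1.0 - float(U[user] @ V[app])
    U[user] += lr * (err * V[app] - reg * U[user])
    V[app] += lr * (err * U[user] - reg * V[app])
    return err

# Tiny factor matrices: 4 users, 6 apps, rank 3.
U = 0.1 * rng.standard_normal((4, 3))
V = 0.1 * rng.standard_normal((6, 3))

# Replaying the same observed launch (user 0 opened app 2) as a
# stream of sequential updates steadily shrinks the prediction error.
errs = [sgd_mf_step(U, V, 0, 2) for _ in range(5)]
print(errs[0] > errs[-1])  # True
```

In a federated setting, only the updated factors (not the raw launch events) would leave the device, which is where the paper's privacy mechanism comes in.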
arXiv Detail & Related papers (2023-02-05T10:29:57Z) - On-device modeling of user's social context and familiar places from smartphone-embedded sensor data [7.310043452300736]
We propose a novel, unsupervised and lightweight approach to model the user's social context and familiar locations.
We exploit data related to both physical and cyber social interactions among users and their devices.
We evaluate the performance of three machine learning algorithms in recognizing daily-life situations.
arXiv Detail & Related papers (2022-05-18T08:32:26Z) - RADDLE: An Evaluation Benchmark and Analysis Platform for Robust Task-oriented Dialog Systems [75.87418236410296]
We introduce the RADDLE benchmark, a collection of corpora and tools for evaluating the performance of models across a diverse set of domains.
RADDLE is designed to favor and encourage models with a strong generalization ability.
We evaluate recent state-of-the-art systems based on pre-training and fine-tuning, and find that grounded pre-training on heterogeneous dialog corpora performs better than training a separate model per domain.
arXiv Detail & Related papers (2020-12-29T08:58:49Z)
This list is automatically generated from the titles and abstracts of the papers in this site.