Probabilistic Digital Twins of Users: Latent Representation Learning with Statistically Validated Semantics
- URL: http://arxiv.org/abs/2512.18056v1
- Date: Fri, 19 Dec 2025 20:49:51 GMT
- Title: Probabilistic Digital Twins of Users: Latent Representation Learning with Statistically Validated Semantics
- Authors: Daniel David,
- Abstract summary: Understanding user identity and behavior is central to applications such as personalization, recommendation, and decision support.<n>We propose a probabilistic digital twin framework in which each user is modeled as a latent state that generates observed behavioral data.<n>We instantiate this framework using a variational autoencoder (VAE) applied to a user-response dataset designed to capture stable aspects of user identity.
- Score: 0.0
- License: http://creativecommons.org/publicdomain/zero/1.0/
- Abstract: Understanding user identity and behavior is central to applications such as personalization, recommendation, and decision support. Most existing approaches rely on deterministic embeddings or black-box predictive models, offering limited uncertainty quantification and little insight into what latent representations encode. We propose a probabilistic digital twin framework in which each user is modeled as a latent stochastic state that generates observed behavioral data. The digital twin is learned via amortized variational inference, enabling scalable posterior estimation while retaining a fully probabilistic interpretation. We instantiate this framework using a variational autoencoder (VAE) applied to a user-response dataset designed to capture stable aspects of user identity. Beyond standard reconstruction-based evaluation, we introduce a statistically grounded interpretation pipeline that links latent dimensions to observable behavioral patterns. By analyzing users at the extremes of each latent dimension and validating differences using nonparametric hypothesis tests and effect sizes, we demonstrate that specific dimensions correspond to interpretable traits such as opinion strength and decisiveness. Empirically, we find that user structure is predominantly continuous rather than discretely clustered, with weak but meaningful structure emerging along a small number of dominant latent axes. These results suggest that probabilistic digital twins can provide interpretable, uncertainty-aware representations that go beyond deterministic user embeddings.
Related papers
- Explaining Machine Learning Predictive Models through Conditional Expectation Methods [0.0]
MUCE is a model-agnostic method for local explainability designed to capture prediction changes from feature interactions.<n>Two quantitative indices, stability and uncertainty, summarize local behavior and assess model reliability.<n>Results show that MUCE effectively captures complex local model behavior, while the stability and uncertainty indices provide meaningful insight into prediction confidence.
arXiv Detail & Related papers (2026-01-12T08:34:36Z) - Individualised Counterfactual Examples Using Conformal Prediction Intervals [12.895240620484572]
High-dimensional feature spaces that are typical of machine learning classification models admit many possible counterfactual examples to a decision.<n>We explicitly model the knowledge of the individual, and assess the uncertainty of predictions which the individual makes by the width of a conformal prediction interval.<n>We present a synthetic data set on a hypercube which allows us to fully visualise the decision boundary.<n>Second, in this synthetic data set we explore the impact of a single CPICF on the knowledge of an individual locally around the original query.
arXiv Detail & Related papers (2025-05-28T13:13:52Z) - The diameter of a stochastic matrix: A new measure for sensitivity analysis in Bayesian networks [1.2699007098398807]
We argue that robustness methods based on the familiar total variation distance provide simple and more valuable bounds on robustness to misspecification.
We introduce a novel measure of dependence in conditional probability tables called the diameter to derive such bounds.
arXiv Detail & Related papers (2024-07-05T17:22:12Z) - I Bet You Did Not Mean That: Testing Semantic Importance via Betting [8.909843275476264]
We formalize the global (i.e., over a population) and local (i.e. for a sample) statistical importance of semantic concepts for the predictions of opaque models by means of conditional independence.
We use recent ideas of sequential kernelized independence testing to induce a rank of importance across concepts, and showcase the effectiveness and flexibility of our framework.
arXiv Detail & Related papers (2024-05-29T14:51:41Z) - Bayesian Hierarchical Models for Counterfactual Estimation [12.159830463756341]
We propose a probabilistic paradigm to estimate a diverse set of counterfactuals.
We treat the perturbations as random variables endowed with prior distribution functions.
A gradient based sampler with superior convergence characteristics efficiently computes the posterior samples.
arXiv Detail & Related papers (2023-01-21T00:21:11Z) - Modeling Uncertain Feature Representation for Domain Generalization [49.129544670700525]
We show that our method consistently improves the network generalization ability on multiple vision tasks.
Our methods are simple yet effective and can be readily integrated into networks without additional trainable parameters or loss constraints.
arXiv Detail & Related papers (2023-01-16T14:25:02Z) - Bayesian Networks for the robust and unbiased prediction of depression
and its symptoms utilizing speech and multimodal data [65.28160163774274]
We apply a Bayesian framework to capture the relationships between depression, depression symptoms, and features derived from speech, facial expression and cognitive game data collected at thymia.
arXiv Detail & Related papers (2022-11-09T14:48:13Z) - Exploring the Trade-off between Plausibility, Change Intensity and
Adversarial Power in Counterfactual Explanations using Multi-objective
Optimization [73.89239820192894]
We argue that automated counterfactual generation should regard several aspects of the produced adversarial instances.
We present a novel framework for the generation of counterfactual examples.
arXiv Detail & Related papers (2022-05-20T15:02:53Z) - Uncertainty Modeling for Out-of-Distribution Generalization [56.957731893992495]
We argue that the feature statistics can be properly manipulated to improve the generalization ability of deep learning models.
Common methods often consider the feature statistics as deterministic values measured from the learned features.
We improve the network generalization ability by modeling the uncertainty of domain shifts with synthesized feature statistics during training.
arXiv Detail & Related papers (2022-02-08T16:09:12Z) - NUQ: Nonparametric Uncertainty Quantification for Deterministic Neural
Networks [151.03112356092575]
We show the principled way to measure the uncertainty of predictions for a classifier based on Nadaraya-Watson's nonparametric estimate of the conditional label distribution.
We demonstrate the strong performance of the method in uncertainty estimation tasks on a variety of real-world image datasets.
arXiv Detail & Related papers (2022-02-07T12:30:45Z) - Where and What? Examining Interpretable Disentangled Representations [96.32813624341833]
Capturing interpretable variations has long been one of the goals in disentanglement learning.
Unlike the independence assumption, interpretability has rarely been exploited to encourage disentanglement in the unsupervised setting.
In this paper, we examine the interpretability of disentangled representations by investigating two questions: where to be interpreted and what to be interpreted.
arXiv Detail & Related papers (2021-04-07T11:22:02Z) - Learning Disentangled Representations with Latent Variation
Predictability [102.4163768995288]
This paper defines the variation predictability of latent disentangled representations.
Within an adversarial generation process, we encourage variation predictability by maximizing the mutual information between latent variations and corresponding image pairs.
We develop an evaluation metric that does not rely on the ground-truth generative factors to measure the disentanglement of latent representations.
arXiv Detail & Related papers (2020-07-25T08:54:26Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.