Related papers: Statistical modeling of categorical trajectories with multivariate functional principal components

Statistical modeling of categorical trajectories with multivariate functional principal components

URL: http://arxiv.org/abs/2502.09986v2
Date: Thu, 20 Mar 2025 07:41:18 GMT
Title: Statistical modeling of categorical trajectories with multivariate functional principal components
Authors: Hervé Cardot, Caroline Peltier,
Abstract summary: We turn the problem of statistical modeling of a categorical process into a functional data analysis issue.<n>We show how it can be easily extended to analyze experiments, such as temporal check-all-that-apply.
Score: 0.0
License: http://creativecommons.org/licenses/by/4.0/
Abstract: There are many examples in which the statistical units of interest are samples of a continuous time categorical random process, that is to say a continuous time stochastic process taking values in a finite state space. Without loosing any information, we associate to each state a binary random function, taking values in $\{0,1\}$, and turn the problem of statistical modeling of a categorical process into a multivariate functional data analysis issue. The (multivariate) covariance operator has nice interpretations in terms of departure from independence of the joint probabilities and the multivariate functional principal components are simple to interpret. Under the weak hypothesis assuming only continuity in probability of the $0-1$ trajectories, it is simple to build consistent estimators of the covariance kernel and perform multivariate functional principal components analysis. The sample paths being piecewise constant, with a finite number of jumps, this a rare case in functional data analysis in which the trajectories are not supposed to be continuous and can be observed exhaustively. The approach is illustrated on a data set of sensory perceptions, considering different gustometer-controlled stimuli experiments. We also show how it can be easily extended to analyze experiments, such as temporal check-all-that-apply, in which two states or more can be observed at the same time.

Related papers

Efficient Covariance Estimation for Sparsified Functional Data [51.69796254617083]
proposed Random-knots (Random-knots-Spatial) and B-spline (Bspline-Spatial) estimators of the covariance function are computationally efficient.<n>Asymptotic pointwise of the covariance are obtained for sparsified individual trajectories under some regularity conditions.
arXiv Detail & Related papers (2025-11-23T00:50:33Z)
Cross-Balancing for Data-Informed Design and Efficient Analysis of Observational Studies [7.899668963928307]
Cross-balancing is a method that uses sample splitting to separate error in feature construction from the error in weight estimation.<n>We show that careful use of outcome information can substantially improve both estimation and inference while maintaining interpretability.
arXiv Detail & Related papers (2025-11-19T21:56:22Z)
Free Probability approach to spectral and operator statistics in Rosenzweig-Porter random matrix ensembles [0.0]
We analyze the spectral and operator statistics of the Rosenzweig-Porter random matrix ensembles.<n>We develop a perturbative scheme that yields semi-analytic expressions for the density of states up to second order in system size.
arXiv Detail & Related papers (2025-06-04T23:56:23Z)
Unsupervised Representation Learning from Sparse Transformation Analysis [79.94858534887801]
We propose to learn representations from sequence data by factorizing the transformations of the latent variables into sparse components. Input data are first encoded as distributions of latent activations and subsequently transformed using a probability flow model.
arXiv Detail & Related papers (2024-10-07T23:53:25Z)
Convergence of Score-Based Discrete Diffusion Models: A Discrete-Time Analysis [56.442307356162864]
We study the theoretical aspects of score-based discrete diffusion models under the Continuous Time Markov Chain (CTMC) framework. We introduce a discrete-time sampling algorithm in the general state space $[S]d$ that utilizes score estimators at predefined time points. Our convergence analysis employs a Girsanov-based method and establishes key properties of the discrete score function.
arXiv Detail & Related papers (2024-10-03T09:07:13Z)
High-dimensional logistic regression with missing data: Imputation, regularization, and universality [7.167672851569787]
We study high-dimensional, ridge-regularized logistic regression. We provide exact characterizations of both the prediction error and the estimation error.
arXiv Detail & Related papers (2024-10-01T21:41:21Z)
Probabilistic Contrastive Learning for Long-Tailed Visual Recognition [78.70453964041718]
Longtailed distributions frequently emerge in real-world data, where a large number of minority categories contain a limited number of samples. Recent investigations have revealed that supervised contrastive learning exhibits promising potential in alleviating the data imbalance. We propose a novel probabilistic contrastive (ProCo) learning algorithm that estimates the data distribution of the samples from each class in the feature space.
arXiv Detail & Related papers (2024-03-11T13:44:49Z)
Logistic-beta processes for dependent random probabilities with beta marginals [58.91121576998588]
We propose a novel process called the logistic-beta process, whose logistic transformation yields a process with common beta marginals. It can model dependence on both discrete and continuous domains, such as space or time, and has a flexible dependence structure through correlation kernels. We illustrate the benefits through nonparametric binary regression and conditional density estimation examples, both in simulation studies and in a pregnancy outcome application.
arXiv Detail & Related papers (2024-02-10T21:41:32Z)
Trade-off Between Dependence and Complexity for Nonparametric Learning -- an Empirical Process Approach [10.27974860479791]
In many applications where the data exhibit temporal dependencies, the corresponding empirical processes are much less understood. We present a general bound on the expected supremum of empirical processes under standard $beta/rho$-mixing assumptions. We show that even under long-range dependence, it is possible to attain the same rates as in the i.i.d. setting.
arXiv Detail & Related papers (2024-01-17T05:08:37Z)
Regular Variation in Hilbert Spaces and Principal Component Analysis for Functional Extremes [1.6734018640023431]
We place ourselves in a Peaks-Over-Threshold framework where a functional extreme is defined as an observation $X$ whose $L2$-norm $|X|$ is comparatively large. Our goal is to propose a dimension reduction framework resulting into finite dimensional projections for such extreme observations.
arXiv Detail & Related papers (2023-08-02T09:12:03Z)
Approximating Counterfactual Bounds while Fusing Observational, Biased and Randomised Data Sources [64.96984404868411]
We address the problem of integrating data from multiple, possibly biased, observational and interventional studies. We show that the likelihood of the available data has no local maxima. We then show how the same approach can address the general case of multiple datasets.
arXiv Detail & Related papers (2023-07-31T11:28:24Z)
Learning to Bound Counterfactual Inference in Structural Causal Models from Observational and Randomised Data [64.96984404868411]
We derive a likelihood characterisation for the overall data that leads us to extend a previous EM-based algorithm. The new algorithm learns to approximate the (unidentifiability) region of model parameters from such mixed data sources. It delivers interval approximations to counterfactual results, which collapse to points in the identifiable case.
arXiv Detail & Related papers (2022-12-06T12:42:11Z)
Statistical Efficiency of Score Matching: The View from Isoperimetry [96.65637602827942]
We show a tight connection between statistical efficiency of score matching and the isoperimetric properties of the distribution being estimated. We formalize these results both in the sample regime and in the finite regime.
arXiv Detail & Related papers (2022-10-03T06:09:01Z)
Equivariance Allows Handling Multiple Nuisance Variables When Analyzing Pooled Neuroimaging Datasets [53.34152466646884]
In this paper, we show how bringing recent results on equivariant representation learning instantiated on structured spaces together with simple use of classical results on causal inference provides an effective practical solution. We demonstrate how our model allows dealing with more than one nuisance variable under some assumptions and can enable analysis of pooled scientific datasets in scenarios that would otherwise entail removing a large portion of the samples.
arXiv Detail & Related papers (2022-03-29T04:54:06Z)
Nonparametric Conditional Local Independence Testing [69.31200003384122]
Conditional local independence is an independence relation among continuous time processes. No nonparametric test of conditional local independence has been available. We propose such a nonparametric test based on double machine learning.
arXiv Detail & Related papers (2022-03-25T10:31:02Z)
Binary Independent Component Analysis via Non-stationarity [7.283533791778359]
We consider independent component analysis of binary data. We start by assuming a linear mixing model in a continuous-valued latent space, followed by a binary observation model. In stark contrast to the continuous-valued case, we prove non-identifiability of the model with few observed variables.
arXiv Detail & Related papers (2021-11-30T14:23:53Z)
Finite-Time Convergence Rates of Decentralized Stochastic Approximation with Applications in Multi-Agent and Multi-Task Learning [16.09467599829253]
We study a data-driven approach for finding the root of an operator under noisy measurements. A network of agents, each with its own operator and data observations, cooperatively find the fixed point of the aggregate operator over a decentralized communication graph. Our main contribution is to provide a finite-time analysis of this decentralized approximation method when the data observed at each agent are sampled from a Markov process.
arXiv Detail & Related papers (2020-10-28T17:01:54Z)
Stable Prediction via Leveraging Seed Variable [73.9770220107874]
Previous machine learning methods might exploit subtly spurious correlations in training data induced by non-causal variables for prediction. We propose a conditional independence test based algorithm to separate causal variables with a seed variable as priori, and adopt them for stable prediction. Our algorithm outperforms state-of-the-art methods for stable prediction.
arXiv Detail & Related papers (2020-06-09T06:56:31Z)

This list is automatically generated from the titles and abstracts of the papers in this site.