Density-Matrix Spectral Embeddings for Categorical Data: Operator Structure and Stability
- URL: http://arxiv.org/abs/2603.01975v1
- Date: Mon, 02 Mar 2026 15:29:54 GMT
- Title: Density-Matrix Spectral Embeddings for Categorical Data: Operator Structure and Stability
- Authors: Raquel Bosch-Romeu, Antonio Falcó, José-Antonio Rodríguez-Gallego
- Abstract summary: We introduce a supervised dimensionality reduction methodology for categorical (and discretized mixed-type) data based on a density-matrix construction induced by class-conditional frequencies.
- Score: 0.0
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: We introduce a supervised dimensionality reduction methodology for categorical (and discretized mixed-type) data based on a density-matrix construction induced by class-conditional frequencies. Given a labeled dataset encoded in a one-hot survey space, we assemble a frequency matrix whose columns aggregate feature occurrences within each class, and define a normalized Gram-type operator that satisfies the axioms of a density matrix. The resulting representation admits an intrinsic rank bound controlled by the number of classes, enabling low-dimensional spectral embeddings via dominant eigenmodes. Classification is performed in the reduced space through class-conditional kernel density estimation and a maximum-likelihood decision rule. We establish structural invariances, provide complexity estimates, and validate the approach on synthetic benchmarks probing high cardinality, sparsity, noise, and class imbalance.
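The abstract describes a concrete pipeline: one-hot encode the categorical data, aggregate class-conditional frequencies into a matrix F, normalize the Gram operator F Fᵀ by its trace so it satisfies the density-matrix axioms, embed via the dominant eigenmodes (rank bounded by the number of classes), and classify with class-conditional KDE under a maximum-likelihood rule. The sketch below is a hypothetical minimal implementation of that pipeline on synthetic data; the specific normalization, the projection of samples onto the eigenvectors, and all variable names are assumptions, not taken from the paper.

```python
import numpy as np
from scipy.stats import gaussian_kde

rng = np.random.default_rng(0)

# Synthetic categorical data: n samples, d features with k levels each, 2 classes.
n, d, k = 300, 5, 4
X = rng.integers(0, k, size=(n, d))
y = rng.integers(0, 2, size=n)

def one_hot(X, k):
    """Encode each sample as a 0/1 vector in the d*k-dimensional 'survey space'."""
    n, d = X.shape
    H = np.zeros((n, d * k))
    H[np.arange(n)[:, None], np.arange(d) * k + X] = 1.0
    return H

H = one_hot(X, k)

# Frequency matrix F: each column aggregates feature occurrences within one class.
classes = np.unique(y)
F = np.stack([H[y == c].sum(axis=0) for c in classes], axis=1)  # (d*k, n_classes)

# Normalized Gram-type operator: symmetric, PSD, unit trace -> a density matrix.
# rank(rho) <= n_classes, which is the intrinsic rank bound from the abstract.
G = F @ F.T
rho = G / np.trace(G)

# Spectral embedding via the dominant eigenmodes (assumed: project one-hot rows).
evals, evecs = np.linalg.eigh(rho)
order = np.argsort(evals)[::-1]
r = len(classes)                # rank bound controls the embedding dimension
U = evecs[:, order[:r]]
Z = H @ U                       # (n, r) low-dimensional embedding

# Class-conditional KDE and a maximum-likelihood decision rule in the reduced space.
kdes = [gaussian_kde(Z[y == c].T) for c in classes]
scores = np.stack([kde(Z.T) for kde in kdes], axis=1)
pred = classes[np.argmax(scores, axis=1)]
print("training accuracy:", (pred == y).mean())
```

With random labels the accuracy is near chance; the point of the sketch is the operator structure, i.e. that `rho` has unit trace and rank at most the number of classes, so the embedding dimension is fixed a priori rather than tuned.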
Related papers
- Consistent spectral clustering in sparse tensor block models [0.0]
High-order clustering aims to classify objects in multiway datasets that are prevalent in various fields. This paper introduces a tensor block model specifically designed for sparse integer-valued data tensors. We propose a simple spectral clustering algorithm augmented with a trimming step to mitigate noise fluctuations.
arXiv Detail & Related papers (2025-01-23T16:41:19Z) - Robust spectral clustering with rank statistics [0.3823356975862007]
We consider eigenvector-based clustering applied to a matrix of nonparametric rank statistics that is derived entrywise from the raw, original data matrix. Our main theoretical contributions are threefold and hold under flexible data-generating conditions. For a dataset of human connectomes, our approach yields parsimonious dimensionality reduction and improved recovery of ground-truth neuroanatomical cluster structure.
arXiv Detail & Related papers (2024-08-19T16:33:44Z) - Quantization of Large Language Models with an Overdetermined Basis [73.79368761182998]
We introduce an algorithm for data quantization based on the principles of Kashin representation.
Our findings demonstrate that Kashin Quantization achieves competitive or superior quality in model performance.
arXiv Detail & Related papers (2024-04-15T12:38:46Z) - Synergistic eigenanalysis of covariance and Hessian matrices for enhanced binary classification [72.77513633290056]
We present a novel approach that combines the eigenanalysis of a covariance matrix evaluated on a training set with a Hessian matrix evaluated on a deep learning model.
Our method captures intricate patterns and relationships, enhancing classification performance.
arXiv Detail & Related papers (2024-02-14T16:10:42Z) - Semi-Supervised Subspace Clustering via Tensor Low-Rank Representation [64.49871502193477]
We propose a novel semi-supervised subspace clustering method, which is able to simultaneously augment the initial supervisory information and construct a discriminative affinity matrix.
Comprehensive experimental results on six commonly-used benchmark datasets demonstrate the superiority of our method over state-of-the-art methods.
arXiv Detail & Related papers (2022-05-21T01:47:17Z) - A simpler spectral approach for clustering in directed networks [1.52292571922932]
We show that clustering based on the eigenvalue/eigenvector decomposition of the adjacency matrix is simpler than commonly used methods.
We provide numerical evidence for the superiority of the Gaussian Mixture clustering over the widely used k-means algorithm.
arXiv Detail & Related papers (2021-02-05T14:16:45Z) - Learning Noise Transition Matrix from Only Noisy Labels via Total
Variation Regularization [88.91872713134342]
We propose a theoretically grounded method that can estimate the noise transition matrix and learn a classifier simultaneously.
We show the effectiveness of the proposed method through experiments on benchmark and real-world datasets.
arXiv Detail & Related papers (2021-02-04T05:09:18Z) - Sparse PCA via $l_{2,p}$-Norm Regularization for Unsupervised Feature
Selection [138.97647716793333]
We propose a simple and efficient unsupervised feature selection method, by combining reconstruction error with $l_{2,p}$-norm regularization.
We present an efficient optimization algorithm to solve the proposed unsupervised model, and analyse the convergence and computational complexity of the algorithm theoretically.
arXiv Detail & Related papers (2020-12-29T04:08:38Z) - Asymptotic Analysis of an Ensemble of Randomly Projected Linear
Discriminants [94.46276668068327]
In [1], an ensemble of randomly projected linear discriminants is used to classify datasets.
We develop a consistent estimator of the misclassification probability as an alternative to the computationally-costly cross-validation estimator.
We also demonstrate the use of our estimator for tuning the projection dimension on both real and synthetic data.
arXiv Detail & Related papers (2020-04-17T12:47:04Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information it provides and is not responsible for any consequences of its use.