Simplex Deep Linear Discriminant Analysis
- URL: http://arxiv.org/abs/2601.01679v1
- Date: Sun, 04 Jan 2026 22:22:59 GMT
- Title: Simplex Deep Linear Discriminant Analysis
- Authors: Maxat Tezekbayev, Arman Bolatov, Zhenisbek Assylbekov
- Abstract summary: We revisit Deep Linear Discriminant Analysis (Deep LDA) from a likelihood-based perspective. While classical LDA is a simple Gaussian model with linear decision boundaries, attaching an LDA head to a neural encoder raises the question of how to train the resulting deep classifier by maximum likelihood estimation (MLE). We first show that end-to-end MLE training of an unconstrained Deep LDA model ignores discrimination: when both the LDA parameters and the encoder parameters are learned jointly, the likelihood admits a degenerate solution in which some of the class clusters may heavily overlap or even collapse, and classification performance deteriorates.
- Score: 2.6381163133447836
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: We revisit Deep Linear Discriminant Analysis (Deep LDA) from a likelihood-based perspective. While classical LDA is a simple Gaussian model with linear decision boundaries, attaching an LDA head to a neural encoder raises the question of how to train the resulting deep classifier by maximum likelihood estimation (MLE). We first show that end-to-end MLE training of an unconstrained Deep LDA model ignores discrimination: when both the LDA parameters and the encoder parameters are learned jointly, the likelihood admits a degenerate solution in which some of the class clusters may heavily overlap or even collapse, and classification performance deteriorates. Batchwise moment re-estimation of the LDA parameters does not remove this failure mode. We then propose a constrained Deep LDA formulation that fixes the class means to the vertices of a regular simplex in the latent space and restricts the shared covariance to be spherical, leaving only the priors and a single variance parameter to be learned along with the encoder. Under these geometric constraints, MLE becomes stable and yields well-separated class clusters in the latent space. On images (Fashion-MNIST, CIFAR-10, CIFAR-100), the resulting Deep LDA models achieve accuracy competitive with softmax baselines while offering a simple, interpretable latent geometry that is clearly visible in two-dimensional projections.
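The constrained formulation in the abstract fixes the class means to the vertices of a regular simplex and restricts the shared covariance to a single spherical variance. A minimal sketch of that geometry and the resulting classification rule is shown below; the construction of the simplex (centered one-hot vectors) and the function names are illustrative assumptions, not the paper's actual code.

```python
import numpy as np

def simplex_vertices(C):
    """Vertices of a regular simplex: the C centered one-hot vectors in R^C.
    All pairwise distances are equal (sqrt(2)), so no class is privileged."""
    return np.eye(C) - 1.0 / C

def lda_head_predict(z, means, log_priors, sigma2):
    """Constrained Deep LDA head: class means fixed, shared covariance sigma2*I.
    With these constraints, only the priors and sigma2 are learned alongside
    the encoder. Predicts argmax_k [log pi_k - ||z - mu_k||^2 / (2 sigma2)].
    z: (batch, dim) latent codes produced by the encoder."""
    sq_dists = ((z[:, None, :] - means[None, :, :]) ** 2).sum(axis=-1)
    scores = log_priors[None, :] - sq_dists / (2.0 * sigma2)
    return scores.argmax(axis=1)

C = 10                              # e.g. CIFAR-10 / Fashion-MNIST classes
M = simplex_vertices(C)
d01 = np.linalg.norm(M[0] - M[1])   # pairwise distance, identical for all pairs
preds = lda_head_predict(5.0 * M[:3], M, np.zeros(C), 1.0)
```

Because the means are fixed and the covariance is spherical, the decision rule reduces to a prior-shifted nearest-centroid classifier, which is what makes the latent geometry easy to visualize in low-dimensional projections.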
Related papers
- Deep Linear Discriminant Analysis Revisited [3.569867801312133]
We show that for unconstrained Deep Linear Discriminant Analysis (LDA) classifiers, maximum-likelihood training admits pathological solutions. We introduce the Discriminative Negative Log-Likelihood (DNLL) loss, which augments the LDA log-likelihood with a simple penalty on the mixture density.
arXiv Detail & Related papers (2026-01-04T17:59:11Z) - Don't Be Greedy, Just Relax! Pruning LLMs via Frank-Wolfe [61.68406997155879]
State-of-the-art Large Language Model (LLM) pruning methods operate layer-wise, minimizing the per-layer pruning error on a small dataset to avoid full retraining. Existing methods hence rely on greedy approaches that ignore the weight interactions in the pruning objective. Our method drastically reduces the per-layer pruning error, outperforms strong baselines on state-of-the-art GPT architectures, and remains memory-efficient.
arXiv Detail & Related papers (2025-10-15T16:13:44Z) - Linear Discriminant Analysis with Gradient Optimization on Covariance Inverse [4.872570541276082]
Linear discriminant analysis (LDA) is a fundamental method in statistical pattern recognition and classification. In this work, we propose LDA with gradient optimization (LDA-GO), a new approach that directly optimizes the inverse covariance matrix via gradient descent. The algorithm parametrizes the inverse covariance matrix through Cholesky factorization, incorporates a low-rank extension to reduce computational complexity, and considers a multiple-initialization strategy.
arXiv Detail & Related papers (2025-06-07T15:50:43Z) - LESA: Learnable LLM Layer Scaling-Up [57.0510934286449]
Training Large Language Models (LLMs) from scratch requires immense computational resources, making it prohibitively expensive. Model scaling-up offers a promising solution by leveraging the parameters of smaller models to create larger ones. We propose LESA, a novel learnable method for depth scaling-up.
arXiv Detail & Related papers (2025-02-19T14:58:48Z) - Training Deep Learning Models with Norm-Constrained LMOs [56.00317694850397]
We propose a new family of algorithms that uses the linear minimization oracle (LMO) to adapt to the geometry of the problem. We demonstrate significant speedups on nanoGPT training using our algorithm, Scion, without any reliance on Adam.
arXiv Detail & Related papers (2025-02-11T13:10:34Z) - Synergistic eigenanalysis of covariance and Hessian matrices for enhanced binary classification [72.77513633290056]
We present a novel approach that combines the eigenanalysis of a covariance matrix evaluated on a training set with a Hessian matrix evaluated on a deep learning model.
Our method captures intricate patterns and relationships, enhancing classification performance.
arXiv Detail & Related papers (2023-10-17T09:50:31Z) - Minimally Informed Linear Discriminant Analysis: training an LDA model with unlabelled data [51.673443581397954]
We show that it is possible to compute the exact projection vector from LDA models based on unlabelled data.
We show that the MILDA projection vector can be computed in a closed form with a computational cost comparable to LDA.
arXiv Detail & Related papers (2022-05-02T06:12:42Z) - Revisiting Classical Multiclass Linear Discriminant Analysis with a Novel Prototype-based Interpretable Solution [0.0]
We introduce a novel solution to classical LDA, called LDA++, that yields $C$ features, each one interpretable as measuring similarity to one cluster.
This novel solution bridges between dimensionality reduction and multiclass classification.
arXiv Detail & Related papers (2022-05-02T06:12:42Z) - Regularized Deep Linear Discriminant Analysis [26.08062442399418]
As a non-linear extension of the classic Linear Discriminant Analysis (LDA), Deep Linear Discriminant Analysis (DLDA) replaces the original Categorical Cross Entropy (CCE) loss function.
Regularization method on within-class scatter matrix is proposed to strengthen the discriminative ability of each dimension.
arXiv Detail & Related papers (2021-05-15T03:54:32Z) - Self-Weighted Robust LDA for Multiclass Classification with Edge Classes [111.5515086563592]
A novel self-weighted robust LDA with l21-norm based between-class distance criterion, called SWRLDA, is proposed for multi-class classification.
The proposed SWRLDA is easy to implement, and converges fast in practice.
arXiv Detail & Related papers (2020-09-24T12:32:55Z) - Simple and Effective Prevention of Mode Collapse in Deep One-Class Classification [93.2334223970488]
We propose two regularizers to prevent hypersphere collapse in deep SVDD.
The first regularizer is based on injecting random noise via the standard cross-entropy loss.
The second regularizer penalizes the minibatch variance when it becomes too small.
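The second regularizer described above can be sketched as a variance floor on the minibatch of latent codes. The exact form of the paper's penalty is not given here, so the threshold, hinge shape, and function name below are illustrative assumptions; the sketch only conveys the idea of penalizing a batch whose latent variance shrinks toward collapse.

```python
import numpy as np

def variance_floor_penalty(z, eps=1e-2):
    """Hedged sketch of a minibatch-variance regularizer (not the paper's
    exact loss): penalize per-dimension variance only when it falls below
    a floor eps, discouraging the encoder from collapsing all inputs to
    a single point (hypersphere collapse).
    z: (batch, dim) latent codes."""
    var = z.var(axis=0)                        # per-dimension minibatch variance
    return np.maximum(eps - var, 0.0).mean()   # hinge: zero once variance >= eps

collapsed = np.zeros((8, 4))                   # fully collapsed batch
spread = np.arange(8.0)[:, None] * np.ones((1, 4))  # well-spread batch
p_collapsed = variance_floor_penalty(collapsed)
p_spread = variance_floor_penalty(spread)
```

The hinge form means the penalty is inactive for healthy batches and only activates near collapse, so it does not interfere with the primary training objective.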
arXiv Detail & Related papers (2020-01-24T03:44:47Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences.