Theoretical Foundations of the Deep Copula Classifier: A Generative Approach to Modeling Dependent Features
- URL: http://arxiv.org/abs/2505.22997v2
- Date: Fri, 06 Jun 2025 21:44:27 GMT
- Title: Theoretical Foundations of the Deep Copula Classifier: A Generative Approach to Modeling Dependent Features
- Authors: Agnideep Aich, Ashit Baran Aich, Bruce Wade,
- Abstract summary: Deep Copula (DCC) is a generative model that separates the learning of each feature's marginal distribution from modeling their joint dependence structure.<n> lightweight neural networks are used to flexibly and adaptively capture feature interactions.<n>DCC offers a mathematically grounded and interpretable framework for dependency-aware classification.
- Score: 0.0
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Traditional classifiers often assume feature independence or rely on overly simplistic relationships, leading to poor performance in settings where real-world dependencies matter. We introduce the Deep Copula Classifier (DCC), a generative model that separates the learning of each feature's marginal distribution from the modeling of their joint dependence structure via neural network-parameterized copulas. For each class, lightweight neural networks are used to flexibly and adaptively capture feature interactions, making DCC particularly effective when classification is driven by complex dependencies. We establish that DCC converges to the Bayes-optimal classifier under standard conditions and provide explicit convergence rates of O(n^{-r/(2r + d)}) for r-smooth copula densities. Beyond theoretical guarantees, we outline several practical extensions, including high-dimensional scalability through vine and factor copula architectures, semi-supervised learning via entropy regularization, and online adaptation using streaming gradient methods. By unifying statistical rigor with the representational power of neural networks, DCC offers a mathematically grounded and interpretable framework for dependency-aware classification.
Related papers
- High-Dimensional Robust Mean Estimation with Untrusted Batches [38.14592862692954]
We study high-dimensional mean estimation in a collaborative setting where data is contributed by $N$ users in batches of size $n$.<n>We formalize this challenge through a double corruption landscape: an $varepsilon$-fraction of users are entirely adversarial, while the remaining good'' users provide data from distributions that are related to $P$, but deviate by a proximity parameter $$.<n>Our algorithms achieve the minimax-optimal error rate $O(sqrtvarepsilon/n + sqrtd/nN + s
arXiv Detail & Related papers (2026-02-24T08:59:37Z) - Optimal Unconstrained Self-Distillation in Ridge Regression: Strict Improvements, Precise Asymptotics, and One-Shot Tuning [61.07540493350384]
Self-distillation (SD) is the process of retraining a student on a mixture of ground-truth and the teacher's own predictions.<n>We show that for any prediction risk, the optimally mixed student improves upon the ridge teacher for every regularization level.<n>We propose a consistent one-shot tuning method to estimate $star$ without grid search, sample splitting, or refitting.
arXiv Detail & Related papers (2026-02-19T17:21:15Z) - Lattice-Based Pruning in Recurrent Neural Networks via Poset Modeling [0.0]
Recurrent neural networks (RNNs) are central to sequence modeling tasks, yet their high computational complexity poses challenges for scalability and real-time deployment.<n>We introduce a novel framework that models RNNs as partially ordered sets (posets) and constructs corresponding dependency lattices.<n>By identifying meet irreducible neurons, our lattice-based pruning algorithm selectively retains critical connections while eliminating redundant ones.
arXiv Detail & Related papers (2025-02-23T10:11:38Z) - Computational-Statistical Tradeoffs at the Next-Token Prediction Barrier: Autoregressive and Imitation Learning under Misspecification [50.717692060500696]
Next-token prediction with the logarithmic loss is a cornerstone of autoregressive sequence modeling.<n>Next-token prediction can be made robust so as to achieve $C=tilde O(H)$, representing moderate error amplification.<n>No computationally efficient algorithm can achieve sub-polynomial approximation factor $C=e(log H)1-Omega(1)$.
arXiv Detail & Related papers (2025-02-18T02:52:00Z) - Deep-Relative-Trust-Based Diffusion for Decentralized Deep Learning [12.883347524020724]
Decentralized learning strategies allow a collection of agents to learn efficiently from local data sets without the need for central aggregation or orchestration.<n>We propose a new decentralized learning algorithm, termed DRT diffusion, based on deep relative trust (DRT), a recently introduced similarity measure for neural networks.
arXiv Detail & Related papers (2025-01-06T17:31:36Z) - Beyond likelihood ratio bias: Nested multi-time-scale stochastic approximation for likelihood-free parameter estimation [49.78792404811239]
We study inference in simulation-based models where the analytical form of the likelihood is unknown.<n>We use a ratio-free nested multi-time-scale approximation (SA) method that simultaneously tracks the score and drives the parameter update.<n>We show that our algorithm can eliminate the original bias $Obig(sqrtfrac1Nbig)$ and accelerate the convergence rate from $Obig(beta_k+sqrtfracalpha_kNbig)$.
arXiv Detail & Related papers (2024-11-20T02:46:15Z) - Certifiably Robust Model Evaluation in Federated Learning under Meta-Distributional Shifts [8.700087812420687]
We provide guarantees for the model's performance on a different, unseen network "B"<n>We show how the principled vanilla DKW bound enables certification of the model's true performance on unseen clients within the same (source) network.
arXiv Detail & Related papers (2024-10-26T18:45:15Z) - Geometric sparsification in recurrent neural networks [0.8851237804522972]
A common technique for ameliorating the computational costs of running large neural models is sparsification.<n>We propose a new technique for sparsification of recurrent neural nets (RNNs) called moduli regularization.<n>We show that moduli regularization induces more stable recurrent neural nets, and achieves high fidelity models above 90% sparsity.
arXiv Detail & Related papers (2024-06-10T14:12:33Z) - Convergence Analysis of Probability Flow ODE for Score-based Generative Models [5.939858158928473]
We study the convergence properties of deterministic samplers based on probability flow ODEs from both theoretical and numerical perspectives.<n>We prove the total variation between the target and the generated data distributions can be bounded above by $mathcalO(d3/4delta1/2)$ in the continuous time level.
arXiv Detail & Related papers (2024-04-15T12:29:28Z) - Orchid: Flexible and Data-Dependent Convolution for Sequence Modeling [4.190836962132713]
This paper introduces Orchid, a novel architecture designed to address the quadratic complexity of traditional attention mechanisms.
At the core of this architecture lies a new data-dependent global convolution layer, which contextually adapts its conditioned kernel on input sequence.
We evaluate the proposed model across multiple domains, including language modeling and image classification, to highlight its performance and generality.
arXiv Detail & Related papers (2024-02-28T17:36:45Z) - Towards Robust Model-Based Reinforcement Learning Against Adversarial Corruption [60.958746600254884]
This study tackles the challenges of adversarial corruption in model-based reinforcement learning (RL)
We introduce an algorithm called corruption-robust optimistic MLE (CR-OMLE), which leverages total-variation (TV)-based information ratios as uncertainty weights for MLE.
We extend our weighting technique to the offline setting, and propose an algorithm named corruption-robust pessimistic MLE (CR-PMLE)
arXiv Detail & Related papers (2024-02-14T07:27:30Z) - Contextual Combinatorial Bandits with Probabilistically Triggered Arms [55.9237004478033]
We study contextual bandits with probabilistically triggered arms (C$2$MAB-T) under a variety of smoothness conditions.
Under the triggering modulated (TPM) condition, we devise the C$2$-UC-T algorithm and derive a regret bound $tildeO(dsqrtT)$.
arXiv Detail & Related papers (2023-03-30T02:51:00Z) - ConCerNet: A Contrastive Learning Based Framework for Automated
Conservation Law Discovery and Trustworthy Dynamical System Prediction [82.81767856234956]
This paper proposes a new learning framework named ConCerNet to improve the trustworthiness of the DNN based dynamics modeling.
We show that our method consistently outperforms the baseline neural networks in both coordinate error and conservation metrics.
arXiv Detail & Related papers (2023-02-11T21:07:30Z) - DIFFormer: Scalable (Graph) Transformers Induced by Energy Constrained
Diffusion [66.21290235237808]
We introduce an energy constrained diffusion model which encodes a batch of instances from a dataset into evolutionary states.
We provide rigorous theory that implies closed-form optimal estimates for the pairwise diffusion strength among arbitrary instance pairs.
Experiments highlight the wide applicability of our model as a general-purpose encoder backbone with superior performance in various tasks.
arXiv Detail & Related papers (2023-01-23T15:18:54Z) - Deep Archimedean Copulas [98.96141706464425]
ACNet is a novel differentiable neural network architecture that enforces structural properties.
We show that ACNet is able to both approximate common Archimedean Copulas and generate new copulas which may provide better fits to data.
arXiv Detail & Related papers (2020-12-05T22:58:37Z) - Developing and Improving Risk Models using Machine-learning Based
Algorithms [6.245537312562826]
The objective of this study is to develop a good risk model for classifying business delinquency.
The rationale under the analyses is firstly to obtain good base binary classifiers via regularization.
Two model ensembling algorithms including bagging and boosting are performed on the good base classifiers for further model improvement.
arXiv Detail & Related papers (2020-09-09T20:38:00Z) - ACDC: Weight Sharing in Atom-Coefficient Decomposed Convolution [57.635467829558664]
We introduce a structural regularization across convolutional kernels in a CNN.
We show that CNNs now maintain performance with dramatic reduction in parameters and computations.
arXiv Detail & Related papers (2020-09-04T20:41:47Z) - Out-of-distribution Generalization via Partial Feature Decorrelation [72.96261704851683]
We present a novel Partial Feature Decorrelation Learning (PFDL) algorithm, which jointly optimize a feature decomposition network and the target image classification model.
The experiments on real-world datasets demonstrate that our method can improve the backbone model's accuracy on OOD image classification datasets.
arXiv Detail & Related papers (2020-07-30T05:48:48Z) - Optimal Robust Linear Regression in Nearly Linear Time [97.11565882347772]
We study the problem of high-dimensional robust linear regression where a learner is given access to $n$ samples from the generative model $Y = langle X,w* rangle + epsilon$
We propose estimators for this problem under two settings: (i) $X$ is L4-L2 hypercontractive, $mathbbE [XXtop]$ has bounded condition number and $epsilon$ has bounded variance and (ii) $X$ is sub-Gaussian with identity second moment and $epsilon$ is
arXiv Detail & Related papers (2020-07-16T06:44:44Z) - Online Robust Regression via SGD on the l1 loss [19.087335681007477]
We consider the robust linear regression problem in the online setting where we have access to the data in a streaming manner.
We show in this work that the descent on the $ell_O( 1 / (1 - eta)2 n )$ loss converges to the true parameter vector at a $tildeO( 1 / (1 - eta)2 n )$ rate which is independent of the values of the contaminated measurements.
arXiv Detail & Related papers (2020-07-01T11:38:21Z) - Belief Propagation Reloaded: Learning BP-Layers for Labeling Problems [83.98774574197613]
We take one of the simplest inference methods, a truncated max-product Belief propagation, and add what is necessary to make it a proper component of a deep learning model.
This BP-Layer can be used as the final or an intermediate block in convolutional neural networks (CNNs)
The model is applicable to a range of dense prediction problems, is well-trainable and provides parameter-efficient and robust solutions in stereo, optical flow and semantic segmentation.
arXiv Detail & Related papers (2020-03-13T13:11:35Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.