Tree-Preconditioned Differentiable Optimization and Axioms as Layers
- URL: http://arxiv.org/abs/2601.06036v1
- Date: Wed, 03 Dec 2025 04:47:37 GMT
- Title: Tree-Preconditioned Differentiable Optimization and Axioms as Layers
- Authors: Yuexin Liao
- Abstract summary: The "Axioms-as-Layers" paradigm embeds the axiomatic structure of Random Utility Models directly into deep neural networks, eliminating the structural overfitting inherent in penalty-based methods.
- Score: 0.0
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: This paper introduces a differentiable framework that embeds the axiomatic structure of Random Utility Models (RUM) directly into deep neural networks. Although projecting empirical choice data onto the RUM polytope is NP-hard in general, we uncover an isomorphism between RUM consistency and flow conservation on the Boolean lattice. Leveraging this combinatorial structure, we derive a novel Tree-Preconditioned Conjugate Gradient solver. By exploiting the spanning tree of the constraint graph, our preconditioner effectively "whitens" the ill-conditioned Hessian spectrum induced by the Interior Point Method barrier, achieving superlinear convergence and scaling to problem sizes previously deemed unsolvable. We further formulate the projection as a differentiable layer via the Implicit Function Theorem, where the exact Jacobian propagates geometric constraints during backpropagation. Empirical results demonstrate that this "Axioms-as-Layers" paradigm eliminates the structural overfitting inherent in penalty-based methods, enabling models that are jointly trainable, provably rational, and capable of generalizing from sparse data regimes where standard approximations fail.
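The solver described above, conjugate gradient preconditioned by a spanning tree of the constraint graph, can be sketched compactly. The following is a minimal illustration assuming a generic weighted graph Laplacian in place of the paper's interior-point barrier Hessian on the Boolean lattice; the function `tree_preconditioned_cg` and the demo graph are hypothetical stand-ins, not the authors' implementation.

```python
import numpy as np
import scipy.sparse as sp
import scipy.sparse.linalg as spla
from scipy.sparse.csgraph import minimum_spanning_tree

def laplacian(adj):
    """Graph Laplacian L = D - A of a symmetric sparse adjacency matrix."""
    deg = sp.diags(np.asarray(adj.sum(axis=1)).ravel())
    return (deg - adj).tocsc()

def tree_preconditioned_cg(adj, b, reg=1e-8):
    """Solve (L + reg*I) x = b by conjugate gradient, preconditioned with
    the Laplacian of a spanning tree of the same graph. The tree system is
    factorized once and reused at every CG iteration."""
    n = adj.shape[0]
    A = laplacian(adj) + reg * sp.identity(n, format="csc")
    tree = minimum_spanning_tree(adj)        # any spanning tree works
    tree = tree + tree.T                     # symmetrize the MST output
    M = laplacian(tree) + reg * sp.identity(n, format="csc")
    tree_solve = spla.factorized(M)          # sparse LU; cheap for a tree
    precond = spla.LinearOperator((n, n), matvec=tree_solve)
    x, info = spla.cg(A, b, M=precond)
    if info != 0:
        raise RuntimeError("CG did not converge")
    return x

# Demo on a random weighted graph with a connected path backbone
# (a hypothetical stand-in for the Boolean-lattice constraint graph).
rng = np.random.default_rng(0)
n = 300
upper = sp.triu(sp.random(n, n, density=0.03, random_state=0), k=1)
backbone = sp.diags([np.ones(n - 1)], [1])   # path graph keeps it connected
adj = (upper + upper.T + backbone + backbone.T).tocsr()
b = rng.standard_normal(n)
x = tree_preconditioned_cg(adj, b)
print(np.linalg.norm((laplacian(adj) + 1e-8 * sp.identity(n)) @ x - b))
```

The design point the sketch captures: the tree system is factorized once (linear time in principle for a tree) and reused at every CG step, so each preconditioned iteration stays cheap while the spectrum of the preconditioned operator is compressed.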
Related papers
- Structure-Aware Robust Counterfactual Explanations via Conditional Gaussian Network Classifiers [0.26999000177990923]
This work presents a structure-aware robust counterfactual search method based on conditional Gaussian network classifiers.
Results show that our method achieves strong consistency, with direct optimization of the original formulation providing especially stable dependencies.
The proposed framework lays the groundwork for future advances in counterfactual reasoning under acyclic constraints.
arXiv Detail & Related papers (2026-02-08T15:51:45Z) - Neural Optimal Transport Meets Multivariate Conformal Prediction [58.43397908730771]
We propose a framework for conditional vector quantile regression (CVQR).
CVQR combines neural optimal transport with quantile regression and applies it to multivariate conformal prediction.
arXiv Detail & Related papers (2025-09-29T19:50:19Z) - Inference of maximum parsimony phylogenetic trees with model-based classical and quantum methods [10.13082983732946]
We design three optimization models compatible with both classical and quantum solvers.
Our method directly searches the complete solution space of all possible tree topologies and ancestral states.
Our quantum simulations successfully find the exact optimal solutions for small-scale instances.
arXiv Detail & Related papers (2025-08-01T09:43:12Z) - Entropic Mirror Descent for Linear Systems: Polyak's Stepsize and Implicit Bias [55.72269695392027]
This paper focuses on applying entropic mirror descent to solve linear systems.
The main challenge for the convergence analysis stems from the unboundedness of the domain.
To overcome this without imposing restrictive assumptions, we introduce a variant of Polyak-type stepsizes.
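As a concrete illustration of the recipe this summary describes, here is a minimal sketch of entropic mirror descent (exponentiated gradient) on the positive orthant with a Polyak-style stepsize for f(x) = 0.5 * ||Ax - b||^2. The l-infinity normalization of the stepsize and the stopping rule are assumptions made for the sketch; the paper's exact stepsize variant and analysis are not reproduced.

```python
import numpy as np

def entropic_md_polyak(A, b, x0, iters=2000, f_star=0.0, tol=1e-10):
    """Entropic mirror descent (exponentiated gradient) for
    f(x) = 0.5 * ||Ax - b||^2 over the positive orthant, with a
    Polyak-style stepsize eta_k = (f(x_k) - f*) / ||grad f(x_k)||_inf^2."""
    x = np.asarray(x0, dtype=float).copy()
    for _ in range(iters):
        r = A @ x - b
        f = 0.5 * float(r @ r)
        if f - f_star <= tol:
            break
        g = A.T @ r
        gmax = np.max(np.abs(g))
        if gmax < 1e-15:
            break
        eta = (f - f_star) / gmax**2              # Polyak step, dual-norm scaled
        x *= np.exp(np.clip(-eta * g, -30, 30))   # mirror step, entropy potential
    return x

# Consistent system with a positive solution (assumed setting).
rng = np.random.default_rng(1)
A = rng.standard_normal((40, 60))
x_true = rng.uniform(0.5, 1.5, size=60)
b = A @ x_true
x = entropic_md_polyak(A, b, x0=np.ones(60))
print(0.5 * np.linalg.norm(A @ x - b) ** 2)  # residual value of f
```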
arXiv Detail & Related papers (2025-05-05T12:33:18Z) - The Convex Landscape of Neural Networks: Characterizing Global Optima and Stationary Points via Lasso Models [75.33431791218302]
Deep Neural Network (DNN) models are widely used in practice.
In this paper, we examine the use of convex recovery models for neural networks.
We show that all stationary points of the nonconvex objective can be characterized as the global optima of a subsampled convex program.
arXiv Detail & Related papers (2023-12-19T23:04:56Z) - Stable Nonconvex-Nonconcave Training via Linear Interpolation [51.668052890249726]
This paper presents a theoretical analysis of linear interpolation as a principled method for stabilizing (large-scale) neural network training.
We argue that instabilities in the optimization process are often caused by the nonmonotonicity of the loss landscape, and show how linear interpolation can help by leveraging the theory of nonexpansive operators.
arXiv Detail & Related papers (2023-10-20T12:45:12Z) - EM's Convergence in Gaussian Latent Tree Models [22.987933817370305]
We show that the unique non-trivial stationary point of the population log-likelihood is its global maximum.
We establish that the expectation-maximization algorithm is guaranteed to converge to it in the single-latent-variable case.
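For intuition about the single-latent-variable setting mentioned above, here is a textbook EM loop for x = a * z + noise with z ~ N(0, 1) and independent Gaussian noise, which is one-factor factor analysis, the simplest Gaussian latent tree. This is a generic sketch for illustration, not the paper's analysis; all names are illustrative.

```python
import numpy as np

def em_one_latent_gaussian(X, iters=200, seed=0):
    """EM for x = a * z + eps with z ~ N(0, 1), eps ~ N(0, diag(psi)):
    one latent Gaussian variable with independent observed children
    (equivalent to single-factor factor analysis)."""
    rng = np.random.default_rng(seed)
    N, d = X.shape
    a = rng.standard_normal(d)          # loading vector (edge weights)
    psi = np.ones(d)                    # observation noise variances
    for _ in range(iters):
        # E-step: posterior z | x is Gaussian with shared variance v.
        v = 1.0 / (1.0 + np.sum(a**2 / psi))
        m = v * (X @ (a / psi))         # posterior means, one per sample
        ez2 = m**2 + v                  # posterior second moments
        # M-step: closed-form updates for loadings and noise variances.
        a = (X.T @ m) / np.sum(ez2)
        psi = np.mean(X**2, axis=0) - a * (X.T @ m) / N
        psi = np.maximum(psi, 1e-8)     # numerical floor
    return a, psi

# Synthetic check: recover the loadings up to a global sign flip.
rng = np.random.default_rng(3)
a_true = np.array([1.0, -0.5, 2.0, 0.8])
z = rng.standard_normal(5000)
X = np.outer(z, a_true) + 0.1 * rng.standard_normal((5000, 4))
a_hat, psi_hat = em_one_latent_gaussian(X)
print(a_hat)  # close to a_true or its negation
```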
arXiv Detail & Related papers (2022-11-21T23:12:58Z) - A cautionary tale on fitting decision trees to data from additive models: generalization lower bounds [9.546094657606178]
We study the generalization performance of decision trees with respect to different generative regression models.
This allows us to elicit their inductive bias, that is, the assumptions the algorithms make (or do not make) to generalize to new data.
We prove a sharp squared error generalization lower bound for a large class of decision tree algorithms fitted to sparse additive models.
arXiv Detail & Related papers (2021-10-18T21:22:40Z) - Partial Counterfactual Identification from Observational and Experimental Data [83.798237968683]
We develop effective Monte Carlo algorithms to approximate the optimal bounds from an arbitrary combination of observational and experimental data.
Our algorithms are validated extensively on synthetic and real-world datasets.
arXiv Detail & Related papers (2021-10-12T02:21:30Z) - Efficient Semi-Implicit Variational Inference [65.07058307271329]
We propose an efficient and scalable method for semi-implicit variational inference (SIVI).
Our method optimizes a rigorous lower bound on SIVI's evidence using low-variance gradient estimates.
arXiv Detail & Related papers (2021-01-15T11:39:09Z)