Differentiable Entropy Regularization for Geometry and Neural Networks
- URL: http://arxiv.org/abs/2509.03733v1
- Date: Wed, 03 Sep 2025 21:38:22 GMT
- Title: Differentiable Entropy Regularization for Geometry and Neural Networks
- Authors: Ibne Farabi Shihab, Sanjeda Akter, Anuj Sharma,
- Abstract summary: We introduce a differentiable estimator of range-partition entropy, a recent concept from computational geometry.<n>We design EntropyNet, a neural module that restructures data into low-entropy forms to accelerate downstream instance-optimal algorithms.<n>Across tasks, we demonstrate that differentiable entropy improves efficiency without degrading correctness.
- Score: 6.908972852063454
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: We introduce a differentiable estimator of range-partition entropy, a recent concept from computational geometry that enables algorithms to adapt to the "sortedness" of their input. While range-partition entropy provides strong guarantees in algorithm design, it has not yet been made accessible to deep learning. In this work, we (i) propose the first differentiable approximation of range-partition entropy, enabling its use as a trainable loss or regularizer; (ii) design EntropyNet, a neural module that restructures data into low-entropy forms to accelerate downstream instance-optimal algorithms; and (iii) extend this principle beyond geometry by applying entropy regularization directly to Transformer attention. Across tasks, we demonstrate that differentiable entropy improves efficiency without degrading correctness: in geometry, our method achieves up to $4.1\times$ runtime speedups with negligible error ($<0.2%$); in deep learning, it induces structured attention patterns that yield 6% higher accuracy at 80% sparsity compared to L1 baselines. Our theoretical analysis provides approximation bounds for the estimator, and extensive ablations validate design choices. These results suggest that entropy-bounded computation is not only theoretically elegant but also a practical mechanism for adaptive learning, efficiency, and structured representation.
Related papers
- Learning to Approximate Uniform Facility Location via Graph Neural Networks [45.627700504265086]
We develop a fully differentiable MPNN model that embeds approximation-algorithmic principles.<n>We show that our approach outperforms standard non-learned approximation algorithms in terms of solution quality.
arXiv Detail & Related papers (2026-02-13T18:08:23Z) - Variational Entropic Optimal Transport [67.76725267984578]
We propose Variational Entropic Optimal Transport (VarEOT) for domain translation problems.<n>VarEOT is based on an exact variational reformulation of the log-partition $log mathbbE[exp(cdot)$ as a tractable generalization over an auxiliary positive normalizer.<n> Experiments on synthetic data and unpaired image-to-image translation demonstrate competitive or improved translation quality.
arXiv Detail & Related papers (2026-02-02T15:48:44Z) - Accelerated training of Gaussian processes using banded square exponential covariances [0.0]
We propose a novel approach to computationally efficient GP training based on the observation that square-exponential (SE) covariance matrices contain several off-diagonal entries extremely close to zero.<n>We construct a principled procedure to eliminate those entries to produce a emphbanded-matrix approximation to the original covariance, whose inverse and determinant can be computed at a reduced computational cost.
arXiv Detail & Related papers (2026-01-26T22:35:20Z) - Self-Supervised Coarsening of Unstructured Grid with Automatic Differentiation [55.88862563823878]
In this work, we present an original algorithm to coarsen an unstructured grid based on the concepts of differentiable physics.<n>We demonstrate performance of the algorithm on two PDEs: a linear equation which governs slightly compressible fluid flow in porous media and the wave equation.<n>Our results show that in the considered scenarios, we reduced the number of grid points up to 10 times while preserving the modeled variable dynamics in the points of interest.
arXiv Detail & Related papers (2025-07-24T11:02:13Z) - Understanding Inverse Reinforcement Learning under Overparameterization: Non-Asymptotic Analysis and Global Optimality [52.906438147288256]
We show that our algorithm can identify the globally optimal reward and policy under certain neural network structures.<n>This is the first IRL algorithm with a non-asymptotic convergence guarantee that provably achieves global optimality.
arXiv Detail & Related papers (2025-03-22T21:16:08Z) - Efficient Methods for Non-stationary Online Learning [61.63338724659592]
We present efficient methods for optimizing dynamic regret and adaptive regret, which reduce the number of projections per round from $mathcalO(log T)$ to $1$.<n>We also study an even strengthened measure, namely the interval dynamic regret'', and reduce the number of projections per round from $mathcalO(log2 T)$ to $1$.
arXiv Detail & Related papers (2023-09-16T07:30:12Z) - Learning Unnormalized Statistical Models via Compositional Optimization [73.30514599338407]
Noise-contrastive estimation(NCE) has been proposed by formulating the objective as the logistic loss of the real data and the artificial noise.
In this paper, we study it a direct approach for optimizing the negative log-likelihood of unnormalized models.
arXiv Detail & Related papers (2023-06-13T01:18:16Z) - Fast Computation of Optimal Transport via Entropy-Regularized Extragradient Methods [75.34939761152587]
Efficient computation of the optimal transport distance between two distributions serves as an algorithm that empowers various applications.
This paper develops a scalable first-order optimization-based method that computes optimal transport to within $varepsilon$ additive accuracy.
arXiv Detail & Related papers (2023-01-30T15:46:39Z) - Structured Optimal Variational Inference for Dynamic Latent Space Models [16.531262817315696]
We consider a latent space model for dynamic networks, where our objective is to estimate the pairwise inner products plus the intercept of the latent positions.
To balance posterior inference and computational scalability, we consider a structured mean-field variational inference framework.
arXiv Detail & Related papers (2022-09-29T22:10:42Z) - Fractal Structure and Generalization Properties of Stochastic
Optimization Algorithms [71.62575565990502]
We prove that the generalization error of an optimization algorithm can be bounded on the complexity' of the fractal structure that underlies its generalization measure.
We further specialize our results to specific problems (e.g., linear/logistic regression, one hidden/layered neural networks) and algorithms.
arXiv Detail & Related papers (2021-06-09T08:05:36Z) - Particle Dual Averaging: Optimization of Mean Field Neural Networks with
Global Convergence Rate Analysis [40.762447301225926]
We propose the particle dual averaging (PDA) method, which generalizes the dual averaging method in convex optimization.
An important application of the proposed method is the optimization of two-layer neural network in the mean field regime.
We show that neural networks in the mean field limit can be globally optimized by PDA.
arXiv Detail & Related papers (2020-12-31T07:07:32Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.