Advancing Constrained Monotonic Neural Networks: Achieving Universal Approximation Beyond Bounded Activations
- URL: http://arxiv.org/abs/2505.02537v2
- Date: Tue, 06 May 2025 11:45:55 GMT
- Title: Advancing Constrained Monotonic Neural Networks: Achieving Universal Approximation Beyond Bounded Activations
- Authors: Davide Sartor, Alberto Sinigaglia, Gian Antonio Susto
- Abstract summary: We show that convex monotone activations and non-positive constrained weights qualify as universal approximators. We propose an alternative formulation that allows the network to adjust its activations according to the sign of the weights.
- Score: 4.659033572014701
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Conventional techniques for imposing monotonicity in MLPs by construction involve the use of non-negative weight constraints and bounded activation functions, which pose well-known optimization challenges. In this work, we generalize previous theoretical results, showing that MLPs with non-negative weight constraints and activations that saturate on alternating sides are universal approximators for monotonic functions. Additionally, we show an equivalence between the saturation side of the activations and the sign of the weight constraint. This connection allows us to prove that MLPs with convex monotone activations and non-positive constrained weights also qualify as universal approximators, in contrast to their non-negative constrained counterparts. Our results provide theoretical grounding for the empirical effectiveness observed in previous works while suggesting possible architectural simplifications. Moreover, to further alleviate the optimization difficulties, we propose an alternative formulation that allows the network to adjust its activations according to the sign of the weights. This eliminates the requirement for weight reparameterization, easing initialization and improving training stability. Experimental evaluation reinforces the validity of the theoretical results, showing that our novel approach compares favourably to traditional monotonic architectures.
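To make the sign-adaptive idea concrete, here is a minimal sketch of one way such a layer could work; the class name, the per-weight activation rule, and the initialization are illustrative assumptions, not the authors' exact formulation. The point it demonstrates: each term w * rho(sign(w) * x) is non-decreasing in x whatever the sign of the unconstrained weight w, so monotonicity holds without weight reparameterization.

```python
import torch
import torch.nn as nn

class SignAdaptiveMonotoneLayer(nn.Module):
    """Monotone (non-decreasing) dense layer with unconstrained weights.

    Each contribution w * rho(sign(w) * x) is non-decreasing in x:
    for w > 0 it composes two non-decreasing maps, and for w < 0 the
    inner reflection makes rho(-x) non-increasing, which the negative
    weight flips back to non-decreasing. Sums of non-decreasing terms
    stay non-decreasing, so no reparameterization of w is required.
    """

    def __init__(self, d_in: int, d_out: int, rho=torch.relu):
        super().__init__()
        self.weight = nn.Parameter(torch.randn(d_out, d_in) / d_in ** 0.5)
        self.bias = nn.Parameter(torch.zeros(d_out))
        self.rho = rho

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, d_in) -> (batch, d_out)
        s = torch.sign(self.weight)                    # (d_out, d_in)
        a = self.rho(s.unsqueeze(0) * x.unsqueeze(1))  # (batch, d_out, d_in)
        return (self.weight * a).sum(dim=-1) + self.bias

# Sanity check: outputs must be non-decreasing in a scalar input.
layer = SignAdaptiveMonotoneLayer(1, 4)
xs = torch.linspace(-3, 3, 100).unsqueeze(1)
ys = layer(xs)
assert (ys[1:] >= ys[:-1] - 1e-6).all()
```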
Related papers
- Convergence Bound and Critical Batch Size of Muon Optimizer [1.2289361708127877]
We provide convergence proofs for Muon across four practical settings. We show that the addition of weight decay yields strictly tighter theoretical bounds. We derive the critical batch size for Muon that minimizes the computational cost of training.
arXiv Detail & Related papers (2025-07-02T11:03:13Z) - Symmetric Pruning of Large Language Models [61.309982086292756]
Popular post-training pruning methods such as Wanda and RIA are known for their simple, yet effective, designs. This paper introduces new theoretical insights that redefine the standard minimization objective for pruning. We propose complementary strategies that consider both input activations and weight significance.
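For context, the Wanda criterion this blurb builds on scores each weight by its magnitude times the norm of the corresponding input activation, pruning within each output row. A minimal sketch of that published baseline (not the symmetric refinement the paper proposes):

```python
import torch

def wanda_style_mask(weight: torch.Tensor, acts: torch.Tensor, sparsity: float) -> torch.Tensor:
    """Pruning mask for a linear layer's weight (d_out, d_in), given
    calibration activations acts (n_samples, d_in).
    Score_ij = |W_ij| * ||X_:,j||_2, thresholded within each output row."""
    score = weight.abs() * acts.norm(dim=0)      # broadcast to (d_out, d_in)
    k = int(weight.shape[1] * sparsity)          # weights to drop per row
    drop = score.argsort(dim=1)[:, :k]           # lowest-scoring inputs per row
    mask = torch.ones_like(weight, dtype=torch.bool)
    mask.scatter_(1, drop, False)
    return mask                                  # apply as weight * mask
```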
arXiv Detail & Related papers (2025-01-31T09:23:06Z) - BoA: Attention-aware Post-training Quantization without Backpropagation [11.096116957844014]
Post-training quantization (PTQ) is a promising solution for deploying large language models (LLMs) on resource-constrained devices. We introduce a novel backpropagation-free PTQ algorithm that optimizes integer weights by considering inter-layer dependencies.
arXiv Detail & Related papers (2024-06-19T11:53:21Z) - Stable Nonconvex-Nonconcave Training via Linear Interpolation [51.668052890249726]
This paper presents a theoretical analysis of linear interpolation as a principled method for stabilizing (large-scale) neural network training.
We argue that instabilities in the optimization process are often caused by the nonmonotonicity of the loss landscape and show how linear interpolation can help by leveraging the theory of nonexpansive operators.
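Read through the lens of nonexpansive operators, the stabilizer amounts to periodically averaging a slow copy of the weights with the fast iterates (a lookahead-style step). A minimal sketch under that interpretation; the interval k, step alpha, and toy model are illustrative:

```python
import torch

@torch.no_grad()
def interpolate(slow_params, model, alpha=0.5):
    """Pull the slow weights a fraction alpha toward the fast iterate,
    then restart the fast weights from the interpolated point."""
    for slow, fast in zip(slow_params, model.parameters()):
        slow.add_(alpha * (fast - slow))  # linear interpolation of iterates
        fast.copy_(slow)

model = torch.nn.Linear(10, 1)
opt = torch.optim.SGD(model.parameters(), lr=0.1)
slow_params = [p.detach().clone() for p in model.parameters()]
for step in range(100):
    x, y = torch.randn(32, 10), torch.randn(32, 1)
    opt.zero_grad()
    torch.nn.functional.mse_loss(model(x), y).backward()
    opt.step()
    if (step + 1) % 5 == 0:   # every k = 5 fast steps
        interpolate(slow_params, model)
```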
arXiv Detail & Related papers (2023-10-20T12:45:12Z) - Convex Latent-Optimized Adversarial Regularizers for Imaging Inverse Problems [8.33626757808923]
We introduce Convex Latent-Optimized Adversarial Regularizers (CLEAR), a novel and interpretable data-driven paradigm.
CLEAR represents a fusion of deep learning (DL) and variational regularization.
Our method consistently outperforms conventional data-driven techniques and traditional regularization approaches.
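As a rough illustration of the variational side, reconstruction with a learned penalty runs gradient descent on data fidelity plus the regularizer. The sketch below is the generic learned-regularizer recipe, omitting CLEAR's latent optimization and convexity structure; reg_net is an assumed pretrained network returning a scalar score:

```python
import torch

def reconstruct(A, y, reg_net, lam=0.1, lr=1e-2, iters=200):
    """Solve min_x ||A x - y||^2 + lam * R_theta(x) by gradient descent,
    where R_theta (reg_net) is a pretrained network that assigns higher
    scores to implausible reconstructions."""
    x = torch.zeros(A.shape[1], requires_grad=True)
    opt = torch.optim.Adam([x], lr=lr)
    for _ in range(iters):
        opt.zero_grad()
        loss = (A @ x - y).pow(2).sum() + lam * reg_net(x)
        loss.backward()
        opt.step()
    return x.detach()
```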
arXiv Detail & Related papers (2023-09-17T12:06:04Z) - Regularizing with Pseudo-Negatives for Continual Self-Supervised Learning [62.40718385934608]
We introduce a novel Pseudo-Negative Regularization (PNR) framework for effective continual self-supervised learning (CSSL).
Our PNR leverages pseudo-negatives obtained through model-based augmentation so that newly learned representations do not contradict what has been learned in the past.
arXiv Detail & Related papers (2023-06-08T10:59:35Z) - No-Regret Constrained Bayesian Optimization of Noisy and Expensive Hybrid Models using Differentiable Quantile Function Approximations [0.0]
Constrained Upper Quantile Bound (CUQB) is a conceptually simple, deterministic approach that avoids constraint approximations.
We show that CUQB significantly outperforms traditional Bayesian optimization in both constrained and unconstrained cases.
arXiv Detail & Related papers (2023-05-05T19:57:36Z) - Hard Negative Sampling via Regularized Optimal Transport for Contrastive Representation Learning [13.474603286270836]
We study the problem of designing hard negative sampling distributions for unsupervised contrastive representation learning.
We propose and analyze a novel min-max framework that seeks a representation which minimizes the maximum (worst-case) generalized contrastive learning loss.
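In this line of work the inner (worst-case) maximization typically admits a closed form: an exponentially tilted negative distribution that upweights negatives similar to the anchor. The sketch below uses that tilt as hardness weights in an InfoNCE-style loss; beta and tau are illustrative, and the paper's weights come from a regularized optimal-transport problem rather than this plain tilt:

```python
import torch
import torch.nn.functional as F

def hard_negative_infonce(anchor, positive, negatives, tau=0.5, beta=1.0):
    """anchor, positive: (batch, d); negatives: (batch, n_neg, d).
    Negatives are reweighted proportionally to exp(beta * similarity),
    the closed-form solution of a KL-regularized worst-case sampler."""
    sim_pos = F.cosine_similarity(anchor, positive, dim=-1) / tau
    sim_neg = F.cosine_similarity(anchor.unsqueeze(1), negatives, dim=-1) / tau
    w = torch.softmax(beta * sim_neg, dim=1)           # hardness weights
    neg = (w * sim_neg.exp()).sum(dim=1) * negatives.shape[1]
    return -(sim_pos.exp() / (sim_pos.exp() + neg)).log().mean()
```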
arXiv Detail & Related papers (2021-11-04T21:25:24Z) - Matrix Completion with Model-free Weighting [11.575213244422219]
We propose a novel method for matrix completion under general non-uniform missing structures.
We construct weights that can actively adjust for the non-uniformity in the empirical risk without explicitly modeling the observation probabilities.
The recovered matrix on the proposed weighted empirical risk enjoys appealing theoretical guarantees.
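The weighted empirical risk in question has the generic form: the sum over observed (i, j) of W_ij * (X_ij - M_ij)^2. A minimal NumPy sketch with an illustrative rank-constrained factorization, treating the weights W as given rather than constructed by the paper's procedure:

```python
import numpy as np

def weighted_completion(M_obs, mask, W, rank=5, lr=0.01, iters=2000):
    """Minimize sum_{(i,j): mask_ij=1} W_ij * ((U V^T)_ij - M_ij)^2
    over a rank-`rank` factorization by gradient descent; the weights W
    compensate for non-uniform observation without modeling observation
    probabilities explicitly."""
    n, m = M_obs.shape
    rng = np.random.default_rng(0)
    U = rng.normal(scale=0.1, size=(n, rank))
    V = rng.normal(scale=0.1, size=(m, rank))
    for _ in range(iters):
        R = mask * W * (U @ V.T - M_obs)   # weighted residual, observed entries only
        U, V = U - lr * (R @ V), V - lr * (R.T @ U)
    return U @ V.T
```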
arXiv Detail & Related papers (2021-06-09T06:28:20Z) - Jointly Modeling and Clustering Tensors in High Dimensions [6.072664839782975]
We consider the problem of joint modeling and clustering of tensors.
We propose an efficient high-dimensional expectation-maximization algorithm that converges geometrically to a neighborhood that is within statistical precision of the true parameter.
arXiv Detail & Related papers (2021-04-15T21:06:16Z) - Imitation with Neural Density Models [98.34503611309256]
We propose a new framework for Imitation Learning (IL) via density estimation of the expert's occupancy measure followed by Imitation Occupancy Entropy Reinforcement Learning (RL) using the density as a reward.
Our approach maximizes a non-adversarial model-free RL objective that provably lower bounds reverse Kullback-Leibler divergence between occupancy measures of the expert and imitator.
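The recipe reduces to two steps: fit a density model to expert states, then hand its log-density to any entropy-regularized RL algorithm as the reward. A minimal sketch with a kernel density estimator standing in for the paper's neural density models; the expert data file is an assumed placeholder:

```python
import numpy as np
from sklearn.neighbors import KernelDensity

# Stand-in density model fit to expert states (shape: (n, state_dim)).
expert_states = np.load("expert_states.npy")  # hypothetical path
density = KernelDensity(bandwidth=0.2).fit(expert_states)

def imitation_reward(state: np.ndarray) -> float:
    """Reward = estimated log-density of the state under the expert's
    occupancy; maximizing it plus policy entropy lower-bounds the
    negative reverse KL between expert and imitator occupancies."""
    return float(density.score_samples(state.reshape(1, -1))[0])
```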
arXiv Detail & Related papers (2020-10-19T19:38:36Z) - Reintroducing Straight-Through Estimators as Principled Methods for Stochastic Binary Networks [85.94999581306827]
Training neural networks with binary weights and activations is a challenging problem due to the lack of gradients and difficulty of optimization over discrete weights.
Many successful experimental results have been achieved with empirical straight-through (ST) approaches.
At the same time, ST methods can be truly derived as estimators in the stochastic binary network (SBN) model with Bernoulli weights.
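The deterministic ST rule itself fits in a few lines: binarize in the forward pass, pass the (clipped) gradient straight through in the backward pass. The paper's point is that such rules can be derived as principled estimators in the SBN model rather than postulated; below is the standard empirical version for reference:

```python
import torch

class SignSTE(torch.autograd.Function):
    """Straight-through estimator for sign(): forward binarizes; backward
    passes the gradient through where |x| <= 1 (hard-tanh clipping)."""

    @staticmethod
    def forward(ctx, x):
        ctx.save_for_backward(x)
        return torch.sign(x)

    @staticmethod
    def backward(ctx, grad_out):
        (x,) = ctx.saved_tensors
        return grad_out * (x.abs() <= 1).to(grad_out.dtype)

x = torch.randn(8, requires_grad=True)
SignSTE.apply(x).sum().backward()
# x.grad is 1 where |x| <= 1, although sign() has zero gradient a.e.
```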
arXiv Detail & Related papers (2020-06-11T23:58:18Z) - Multi-View Spectral Clustering Tailored Tensor Low-Rank Representation [105.33409035876691]
This paper explores the problem of multi-view spectral clustering (MVSC) based on tensor low-rank modeling.
We design a novel structured tensor low-rank norm tailored to MVSC.
We show that the proposed method outperforms state-of-the-art methods to a significant extent.
arXiv Detail & Related papers (2020-04-30T11:52:12Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of its content (including all information) and is not responsible for any consequences.