Rethinking Nonlinearity: Trainable Gaussian Mixture Modules for Modern Neural Architectures
- URL: http://arxiv.org/abs/2510.06660v1
- Date: Wed, 08 Oct 2025 05:20:34 GMT
- Title: Rethinking Nonlinearity: Trainable Gaussian Mixture Modules for Modern Neural Architectures
- Authors: Weiguo Lu, Gangnan Yuan, Hong-kun Zhang, Shangyang Li
- Abstract summary: We introduce Gaussian Mixture-Inspired Nonlinear Modules (GMNM), a new class of differentiable modules that draws on the universal density approximation property of Gaussian mixture models (GMMs). By relaxing probabilistic constraints, GMNM can be seamlessly integrated into diverse neural architectures and trained end-to-end with gradient-based methods. Our experiments demonstrate that GMNM is a powerful and flexible module for enhancing efficiency and accuracy across a wide range of machine learning applications.
- Score: 0.9778425765923312
- License: http://creativecommons.org/licenses/by-nc-nd/4.0/
- Abstract: Neural networks in general, from MLPs and CNNs to attention-based Transformers, are constructed from layers of linear combinations followed by nonlinear operations such as ReLU, Sigmoid, or Softmax. Despite their strength, these conventional designs are often limited in the nonlinearity they can express by the choice of activation function. In this work, we introduce Gaussian Mixture-Inspired Nonlinear Modules (GMNM), a new class of differentiable modules that draws on the universal density approximation property of Gaussian mixture models (GMMs) and the metric-space distance properties of the Gaussian kernel. By relaxing probabilistic constraints and adopting a flexible parameterization of Gaussian projections, GMNM can be seamlessly integrated into diverse neural architectures and trained end-to-end with gradient-based methods. Our experiments demonstrate that incorporating GMNM into architectures such as MLPs, CNNs, attention mechanisms, and LSTMs consistently improves performance over standard baselines. These results highlight GMNM's potential as a powerful and flexible module for enhancing efficiency and accuracy across a wide range of machine learning applications.
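The paper's code is not reproduced here, but the abstract suggests the core computation: project the input onto learned directions, score each projection with a Gaussian kernel, and mix the kernel responses with unconstrained weights (the probabilistic normalization of a true GMM is relaxed). A minimal PyTorch sketch under those assumptions; all names and the exact parameterization are hypothetical:

```python
import torch
import torch.nn as nn

class GMNM(nn.Module):
    """Sketch of a Gaussian Mixture-inspired Nonlinear Module.

    Each of the K components projects the input onto a learned direction,
    evaluates a Gaussian kernel around a learned center, and the kernel
    responses are combined with unconstrained mixing weights (no GMM
    normalization constraints)."""

    def __init__(self, in_dim: int, n_components: int, out_dim: int):
        super().__init__()
        self.proj = nn.Linear(in_dim, n_components)        # learned Gaussian projections
        self.mu = nn.Parameter(torch.randn(n_components))  # kernel centers
        self.log_sigma = nn.Parameter(torch.zeros(n_components))  # kernel widths
        self.mix = nn.Linear(n_components, out_dim)        # unconstrained mixing weights

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        z = self.proj(x)                                   # (batch, K)
        phi = torch.exp(-0.5 * ((z - self.mu) / self.log_sigma.exp()) ** 2)
        return self.mix(phi)

# usage: a drop-in nonlinear block inside an MLP
layer = GMNM(in_dim=64, n_components=16, out_dim=64)
y = layer(torch.randn(32, 64))
```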
Related papers
- On Linear Mode Connectivity of Mixture-of-Experts Architectures [1.6747713135100666]
We investigate the phenomenon of Linear Mode Connectivity (LMC) in neural networks. LMC is a notable phenomenon in the loss landscapes of neural networks, wherein independently trained models have been found to be connected by linear paths of low loss, up to the permutation symmetries of the architecture.
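LMC is straightforward to probe empirically: interpolate the parameters of two independently trained networks and trace the loss along the segment. A minimal sketch of that check (permutation alignment, which such analyses typically apply first, is omitted here):

```python
import copy
import torch

def interpolate_state(sd_a, sd_b, alpha):
    """Linearly interpolate two state dicts: (1 - alpha) * A + alpha * B."""
    return {k: (1 - alpha) * sd_a[k] + alpha * sd_b[k] for k in sd_a}

@torch.no_grad()
def loss_along_path(model, sd_a, sd_b, loss_fn, data, n_points=11):
    """Evaluate the loss on the straight line between two trained solutions.

    A (near-)flat curve indicates linear mode connectivity; a bump in the
    middle is the loss barrier."""
    losses = []
    x, y = data
    for alpha in torch.linspace(0.0, 1.0, n_points):
        probe = copy.deepcopy(model)
        probe.load_state_dict(interpolate_state(sd_a, sd_b, alpha.item()))
        losses.append(loss_fn(probe(x), y).item())
    return losses
```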
arXiv Detail & Related papers (2025-09-14T16:51:41Z) - uGMM-NN: Univariate Gaussian Mixture Model Neural Network [0.0]
uGMM-NN is a novel neural architecture that embeds probabilistic reasoning directly into the computational units of deep networks. We demonstrate that uGMM-NN can achieve competitive discriminative performance compared to conventional multilayer perceptrons.
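One plausible reading of "probabilistic reasoning in the computational units" is to replace each unit's fixed activation with the log-density of its own learned univariate Gaussian mixture. A hedged sketch of that idea; the details below are assumptions, not the paper's exact design:

```python
import math
import torch
import torch.nn as nn

class UGMMLayer(nn.Module):
    """Sketch: each output unit forms a linear pre-activation, then applies
    the log-density of a learned K-component univariate Gaussian mixture in
    place of a fixed nonlinearity (one mixture per unit)."""

    def __init__(self, in_dim, out_dim, k=4):
        super().__init__()
        self.linear = nn.Linear(in_dim, out_dim)
        self.logits = nn.Parameter(torch.zeros(out_dim, k))     # mixture weights
        self.mu = nn.Parameter(torch.randn(out_dim, k))         # component means
        self.log_sigma = nn.Parameter(torch.zeros(out_dim, k))  # component scales

    def forward(self, x):
        z = self.linear(x).unsqueeze(-1)                # (batch, out_dim, 1)
        log_w = torch.log_softmax(self.logits, dim=-1)
        log_norm = 0.5 * math.log(2 * math.pi) + self.log_sigma
        log_comp = -0.5 * ((z - self.mu) / self.log_sigma.exp()) ** 2 - log_norm
        return torch.logsumexp(log_w + log_comp, dim=-1)  # (batch, out_dim)

y = UGMMLayer(16, 8)(torch.randn(4, 16))
```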
arXiv Detail & Related papers (2025-09-09T10:13:37Z) - Reparameterized LLM Training via Orthogonal Equivalence Transformation [54.80172809738605]
We present POET, a novel training algorithm that optimizes neurons via Orthogonal Equivalence Transformations. POET can stably optimize the objective function with improved generalization. We develop efficient approximations that make POET flexible and scalable for training large-scale neural networks.
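An orthogonal equivalence transformation wraps a fixed random weight matrix with two learnable orthogonal factors, W = R @ W0 @ P, which preserves W0's singular-value spectrum throughout training. A sketch of that reparameterization; the Cayley-transform parameterization of the orthogonal factors is an illustrative choice, not necessarily the paper's:

```python
import torch
import torch.nn as nn

def cayley(a: torch.Tensor) -> torch.Tensor:
    """Map an unconstrained square matrix to an orthogonal one via the
    Cayley transform Q = (I + S)^{-1} (I - S), with S = a - a^T skew-symmetric."""
    s = a - a.transpose(-1, -2)
    eye = torch.eye(a.shape[-1], device=a.device, dtype=a.dtype)
    return torch.linalg.solve(eye + s, eye - s)

class OrthogonalReparamLinear(nn.Module):
    """Sketch of POET-style training: the weight matrix W0 stays fixed and
    random; only two orthogonal factors are learned, W = R @ W0 @ P."""

    def __init__(self, in_dim, out_dim):
        super().__init__()
        self.register_buffer("w0", torch.randn(out_dim, in_dim) / in_dim ** 0.5)
        self.r = nn.Parameter(torch.zeros(out_dim, out_dim))  # pre-Cayley params
        self.p = nn.Parameter(torch.zeros(in_dim, in_dim))    # at init: R = P = I

    def forward(self, x):
        w = cayley(self.r) @ self.w0 @ cayley(self.p)
        return x @ w.T
```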
arXiv Detail & Related papers (2025-06-09T17:59:34Z) - Efficient Transformed Gaussian Process State-Space Models for Non-Stationary High-Dimensional Dynamical Systems [49.819436680336786]
We propose an efficient transformed Gaussian process state-space model (ETGPSSM) for scalable and flexible modeling of high-dimensional, non-stationary dynamical systems. Specifically, our ETGPSSM integrates a single shared GP with input-dependent normalizing flows, yielding an expressive implicit process prior that captures complex, non-stationary transition dynamics. Our ETGPSSM outperforms existing GPSSMs and neural network-based SSMs in terms of computational efficiency and accuracy.
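A toy sketch of the shared-GP-plus-flow idea: one scalar GP (approximated here with random Fourier features) is warped by an input-dependent elementwise affine flow, so each state dimension gets its own implicit transition process without a separate GP per dimension. This is a deliberately simplified stand-in for the paper's construction:

```python
import torch
import torch.nn as nn

class TransformedGPTransition(nn.Module):
    """Toy ETGPSSM-style transition: a single shared GP draw is transformed
    by an input-dependent affine flow into a per-dimension transition."""

    def __init__(self, state_dim, n_features=128, lengthscale=1.0):
        super().__init__()
        self.register_buffer("omega", torch.randn(n_features, state_dim) / lengthscale)
        self.register_buffer("phase", 2 * torch.pi * torch.rand(n_features))
        self.gp_weights = nn.Parameter(torch.randn(n_features) / n_features ** 0.5)
        # input-dependent flow parameters: a scale and shift per state dimension
        self.flow = nn.Linear(state_dim, 2 * state_dim)

    def forward(self, x):                                  # x: (batch, state_dim)
        feats = torch.cos(x @ self.omega.T + self.phase)   # RFF approx of an RBF GP
        f = feats @ self.gp_weights                        # shared scalar GP value
        scale, shift = self.flow(x).chunk(2, dim=-1)
        return scale.exp() * f.unsqueeze(-1) + shift       # warped per-dim transition
```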
arXiv Detail & Related papers (2025-03-24T03:19:45Z) - Scalable Mechanistic Neural Networks for Differential Equations and Machine Learning [52.28945097811129]
We propose an enhanced neural network framework designed for scientific machine learning applications involving long temporal sequences. We reduce the computational time and space complexities from cubic and quadratic in the sequence length, respectively, to linear. Extensive experiments demonstrate that the scalable variant (S-MNN) matches the original mechanistic neural network (MNN) in precision while substantially reducing computational resources.
arXiv Detail & Related papers (2024-10-08T14:27:28Z) - Differentiable Neural-Integrated Meshfree Method for Forward and Inverse Modeling of Finite Strain Hyperelasticity [1.290382979353427]
The present study aims to extend the novel physics-informed machine learning approach, specifically the neural-integrated meshfree (NIM) method, to model finite-strain problems.
Thanks to its inherent differentiable programming capabilities, NIM circumvents the need to derive the Newton-Raphson linearization of the variational form.
NIM is applied to identify heterogeneous mechanical properties of hyperelastic materials from strain data, validating its effectiveness in the inverse modeling of nonlinear materials.
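The differentiable-programming point can be illustrated in miniature: once a material law is written in an autodiff framework, inverse identification reduces to gradient descent on a data residual, with no hand-derived linearization. A toy 1D example; the incompressible neo-Hookean uniaxial stress law is standard, and everything else is synthetic:

```python
import torch

def stress(stretch, mu):
    # 1D incompressible neo-Hookean uniaxial Cauchy stress
    return mu * (stretch ** 2 - 1.0 / stretch)

true_mu = 2.5
stretch = torch.linspace(1.0, 2.0, 50)
observed = stress(stretch, true_mu)            # synthetic "measured" data

mu = torch.tensor(1.0, requires_grad=True)     # unknown material parameter
opt = torch.optim.Adam([mu], lr=0.05)
for _ in range(500):
    opt.zero_grad()
    loss = ((stress(stretch, mu) - observed) ** 2).mean()
    loss.backward()                            # autodiff through the material law
    opt.step()
print(mu.item())                               # converges toward 2.5
```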
arXiv Detail & Related papers (2024-07-15T19:15:18Z) - Equivariant Matrix Function Neural Networks [1.8717045355288808]
We introduce Matrix Function Neural Networks (MFNs), a novel architecture that parameterizes non-local interactions through analytic equivariant matrix functions.
MFNs are able to capture intricate non-local interactions in quantum systems, paving the way to new state-of-the-art force fields.
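For a symmetric matrix, an analytic matrix function can be evaluated through the eigendecomposition, f(A) = U diag(f(lambda)) U^T; since f(Q A Q^T) = Q f(A) Q^T for orthogonal Q, the operation is equivariant, and every entry of f(A) can depend on every entry of A, making it non-local. A minimal sketch of that building block (the paper's full architecture and its efficient expansions are not reproduced):

```python
import torch

def matrix_function(a: torch.Tensor, fn) -> torch.Tensor:
    """Analytic matrix function of a symmetric matrix via eigendecomposition:
    f(A) = U diag(f(lambda)) U^T, an orthogonally equivariant, non-local map."""
    eigvals, eigvecs = torch.linalg.eigh(a)
    return eigvecs @ torch.diag_embed(fn(eigvals)) @ eigvecs.transpose(-1, -2)

# usage: a smooth "matrix softplus" applied to a symmetric interaction matrix
A = torch.randn(8, 8)
A = 0.5 * (A + A.T)
F = matrix_function(A, torch.nn.functional.softplus)
```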
arXiv Detail & Related papers (2023-10-16T14:17:00Z) - Sparse Modular Activation for Efficient Sequence Modeling [94.11125833685583]
Recent models combining Linear State Space Models with self-attention mechanisms have demonstrated impressive results across a range of sequence modeling tasks.
Current approaches apply attention modules statically and uniformly to all elements in the input sequences, leading to sub-optimal quality-efficiency trade-offs.
We introduce Sparse Modular Activation (SMA), a general mechanism enabling neural networks to sparsely activate sub-modules for sequence elements in a differentiable manner.
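A simplified sketch of the routing idea, in the spirit of mixture-of-experts gating (the paper's actual mechanism may differ): a gate scores each (token, module) pair, only the top-k modules run for each token, and outputs are mixed by the differentiable gate weights:

```python
import torch
import torch.nn as nn

class SparseModularActivation(nn.Module):
    """Sketch of sparsely activating sub-modules per sequence element."""

    def __init__(self, dim, sub_modules, k=1):
        super().__init__()
        self.mods = nn.ModuleList(sub_modules)
        self.gate = nn.Linear(dim, len(sub_modules))
        self.k = k

    def forward(self, x):                         # x: (tokens, dim)
        probs = self.gate(x).softmax(dim=-1)      # (tokens, n_modules)
        _, topi = probs.topk(self.k, dim=-1)      # active modules per token
        out = torch.zeros_like(x)
        for idx, module in enumerate(self.mods):
            mask = (topi == idx).any(dim=-1)      # tokens routed to module idx
            if mask.any():
                w = probs[mask, idx].unsqueeze(-1)
                out[mask] = out[mask] + w * module(x[mask])
        return out

# usage: two candidate sub-modules, one activated per token
sma = SparseModularActivation(64, [nn.Linear(64, 64), nn.Identity()], k=1)
y = sma(torch.randn(10, 64))
```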
arXiv Detail & Related papers (2023-06-19T23:10:02Z) - Realization of the Trajectory Propagation in the MM-SQC Dynamics by Using Machine Learning [4.629634111796585]
We apply a supervised machine learning (ML) approach to realize trajectory-based nonadiabatic dynamics.
The proposed idea proves reliable and accurate in simulations of the dynamics of several site-exciton electron-phonon coupling models.
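The supervised setup can be illustrated generically: fit a network to reference (state, next-state) pairs, then roll the learned one-step propagator forward. A sketch with synthetic stand-in data (the paper trains on MM-SQC trajectories, not the placeholders below):

```python
import torch
import torch.nn as nn

# learn a one-step propagator state_t -> state_{t+dt} from reference pairs
propagator = nn.Sequential(nn.Linear(4, 64), nn.Tanh(), nn.Linear(64, 4))
opt = torch.optim.Adam(propagator.parameters(), lr=1e-3)

states_t = torch.randn(1024, 4)          # stand-in for reference trajectory states
states_next = states_t.roll(1, dims=0)   # stand-in for the corresponding targets

for _ in range(200):
    opt.zero_grad()
    loss = nn.functional.mse_loss(propagator(states_t), states_next)
    loss.backward()
    opt.step()

# roll out a long trajectory from an initial condition
x = torch.randn(1, 4)
trajectory = [x]
with torch.no_grad():
    for _ in range(100):
        x = propagator(x)
        trajectory.append(x)
```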
arXiv Detail & Related papers (2022-07-11T01:23:36Z) - Accurate and efficient Simulation of very high-dimensional Neural Mass Models with distributed-delay Connectome Tensors [0.23453441553817037]
This paper introduces methods that efficiently integrate any high-dimensional Neural Mass Model (NMM) specified by two essential components.
The first is the set of nonlinear random differential equations governing the dynamics of each neural mass.
The second is the highly sparse three-dimensional Connectome Tensor (CT) that encodes the strength of the connections and the delays of information transfer along the axons of each connection.
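A toy sketch of how those two components interact numerically: a ring buffer of past states lets each sparse connection read its source mass at its own lag, while each mass evolves under a nonlinear update. All equations and numbers below are illustrative, not the paper's NMM:

```python
import torch

n_masses, max_delay, steps, dt = 50, 20, 200, 0.1
src, dst = torch.randint(0, n_masses, (2, 300))     # sparse edge list (connectome)
weight = torch.randn(300) * 0.1                     # per-connection strength
delay = torch.randint(1, max_delay, (300,))         # per-connection lag, in steps

history = torch.zeros(max_delay, n_masses)          # ring buffer of past states
x = torch.rand(n_masses)
for t in range(steps):
    delayed = history[(t - delay) % max_delay, src]  # delayed source states per edge
    drive = torch.zeros(n_masses).index_add_(0, dst, weight * delayed)
    x = x + dt * (-x + torch.sigmoid(drive))         # nonlinear per-mass dynamics
    history[t % max_delay] = x
```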
arXiv Detail & Related papers (2020-09-16T05:55:17Z) - Provably Efficient Neural Estimation of Structural Equation Model: An Adversarial Approach [144.21892195917758]
We study estimation in a class of generalized Structural Equation Models (SEMs).
We formulate the linear operator equation as a min-max game, where both players are parameterized by neural networks (NNs), and learn the parameters of these neural networks using gradient descent.
For the first time we provide a tractable estimation procedure for SEMs based on NNs with provable convergence and without the need for sample splitting.
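The min-max structure can be sketched with two small networks trained by gradient descent-ascent: a primal network models the structural function, and an adversarial "test function" network enforces the moment (linear operator) equation. The data, instrument setup, and regularizer below are placeholders for illustration:

```python
import torch
import torch.nn as nn

f = nn.Sequential(nn.Linear(1, 32), nn.Tanh(), nn.Linear(32, 1))  # structural fn
g = nn.Sequential(nn.Linear(1, 32), nn.Tanh(), nn.Linear(32, 1))  # adversary
opt_f = torch.optim.Adam(f.parameters(), lr=1e-3)
opt_g = torch.optim.Adam(g.parameters(), lr=1e-3)

x = torch.randn(256, 1)                       # endogenous variable (placeholder)
z = torch.randn(256, 1)                       # instrument (placeholder)
y = 2.0 * x + 0.1 * torch.randn(256, 1)       # synthetic outcome

for _ in range(1000):
    # ascent step on the adversarial test function g
    game = ((y - f(x)) * g(z)).mean() - 0.5 * (g(z) ** 2).mean()
    opt_g.zero_grad()
    (-game).backward()
    opt_g.step()
    # descent step on the structural network f
    game = ((y - f(x)) * g(z)).mean() - 0.5 * (g(z) ** 2).mean()
    opt_f.zero_grad()
    game.backward()
    opt_f.step()
```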
arXiv Detail & Related papers (2020-07-02T17:55:47Z) - Multipole Graph Neural Operator for Parametric Partial Differential Equations [57.90284928158383]
One of the main challenges in using deep learning-based methods for simulating physical systems is formulating physics-based data in the structure required by neural networks.
We propose a novel multi-level graph neural network framework that captures interaction at all ranges with only linear complexity.
Experiments confirm our multi-graph network learns discretization-invariant solution operators to PDEs and can be evaluated in linear time.
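A two-level caricature of the multipole idea: short-range interactions run on the sparse fine graph, while long-range interactions run densely on a much smaller pooled graph, keeping the cost roughly linear in the number of fine nodes. A full hierarchy would recurse over levels; the random pooling operator here is a hypothetical stand-in for the paper's transfer operators:

```python
import torch
import torch.nn as nn

class TwoLevelGraphKernel(nn.Module):
    """Sketch of a two-level (fine + coarse) graph kernel pass."""

    def __init__(self, dim, n_fine, n_coarse):
        super().__init__()
        self.local = nn.Linear(dim, dim)    # short-range kernel on the fine graph
        self.coarse = nn.Linear(dim, dim)   # long-range kernel on the small coarse graph
        # fixed random pooling assignment fine -> coarse (hypothetical transfer op)
        pool = torch.zeros(n_coarse, n_fine)
        pool[torch.randint(0, n_coarse, (n_fine,)), torch.arange(n_fine)] = 1.0
        self.register_buffer("pool", pool / pool.sum(dim=1, keepdim=True).clamp(min=1))

    def forward(self, x, adj):              # x: (n_fine, dim), adj: sparse adjacency
        short = torch.sparse.mm(adj, self.local(x))   # local neighborhood pass
        xc = self.pool @ x                            # restrict to coarse level
        long = self.pool.T @ self.coarse(xc)          # coarse interaction, prolongate back
        return torch.relu(x + short + long)

# usage on a random sparse graph
n = 100
x = torch.randn(n, 32)
adj = (torch.rand(n, n) < 0.05).float().to_sparse()
y = TwoLevelGraphKernel(32, n_fine=n, n_coarse=10)(x, adj)
```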
arXiv Detail & Related papers (2020-06-16T21:56:22Z)