Efficient and Interpretable Neural Networks Using Complex Lehmer Transform
- URL: http://arxiv.org/abs/2501.15223v1
- Date: Sat, 25 Jan 2025 14:08:30 GMT
- Title: Efficient and Interpretable Neural Networks Using Complex Lehmer Transform
- Authors: Masoud Ataei, Xiaogang Wang,
- Abstract summary: We propose an efficient and interpretable neural network with a novel activation function called the weighted Lehmer transform.
We analyze the mathematical properties of both real-valued and complex-valued Lehmer activation units.
Empirical evaluations demonstrate that our proposed neural network achieves competitive accuracy on benchmark datasets.
- Score: 11.095723123836965
- License:
- Abstract: We propose an efficient and interpretable neural network with a novel activation function called the weighted Lehmer transform. This new activation function enables adaptive feature selection and extends to the complex domain, capturing phase-sensitive and hierarchical relationships within data. Notably, it provides greater interpretability and transparency compared to existing machine learning models, facilitating a deeper understanding of its functionality and decision-making processes. We analyze the mathematical properties of both real-valued and complex-valued Lehmer activation units and demonstrate their applications in modeling nonlinear interactions. Empirical evaluations demonstrate that our proposed neural network achieves competitive accuracy on benchmark datasets with significantly improved computational efficiency. A single layer of real-valued or complex-valued Lehmer activation units is shown to deliver state-of-the-art performance, balancing efficiency with interpretability.
Related papers
- DSMoE: Matrix-Partitioned Experts with Dynamic Routing for Computation-Efficient Dense LLMs [70.91804882618243]
This paper proposes DSMoE, a novel approach that achieves sparsification by partitioning pre-trained FFN layers into computational blocks.
We implement adaptive expert routing using sigmoid activation and straight-through estimators, enabling tokens to flexibly access different aspects of model knowledge.
Experiments on LLaMA models demonstrate that under equivalent computational constraints, DSMoE achieves superior performance compared to existing pruning and MoE approaches.
arXiv Detail & Related papers (2025-02-18T02:37:26Z) - Extension of Symmetrized Neural Network Operators with Fractional and Mixed Activation Functions [0.0]
We propose a novel extension to symmetrized neural network operators by incorporating fractional and mixed activation functions.
Our framework introduces a fractional exponent in the activation functions, allowing adaptive non-linear approximations with improved accuracy.
arXiv Detail & Related papers (2025-01-17T14:24:25Z) - A Survey on Kolmogorov-Arnold Network [0.0]
Review explores the theoretical foundations, evolution, applications, and future potential of Kolmogorov-Arnold Networks (KAN)
KANs distinguish themselves from traditional neural networks by using learnable, spline- parameterized functions instead of fixed activation functions.
This paper highlights KAN's role in modern neural architectures and outlines future directions to improve its computational efficiency, interpretability, and scalability in data-intensive applications.
arXiv Detail & Related papers (2024-11-09T05:54:17Z) - Task-Oriented Real-time Visual Inference for IoVT Systems: A Co-design Framework of Neural Networks and Edge Deployment [61.20689382879937]
Task-oriented edge computing addresses this by shifting data analysis to the edge.
Existing methods struggle to balance high model performance with low resource consumption.
We propose a novel co-design framework to optimize neural network architecture.
arXiv Detail & Related papers (2024-10-29T19:02:54Z) - The Local Interaction Basis: Identifying Computationally-Relevant and Sparsely Interacting Features in Neural Networks [0.0]
Local Interaction Basis aims to identify computational features by removing irrelevant activations and interactions.
We evaluate the effectiveness of LIB on modular addition and CIFAR-10 models.
We conclude that LIB is a promising theory-driven approach for analyzing neural networks, but in its current form is not applicable to large language models.
arXiv Detail & Related papers (2024-05-17T17:27:19Z) - Peridynamic Neural Operators: A Data-Driven Nonlocal Constitutive Model
for Complex Material Responses [12.454290779121383]
We introduce a novel integral neural operator architecture called the Peridynamic Neural Operator (PNO) that learns a nonlocal law from data.
This neural operator provides a forward model in the form of state-based peridynamics, with objectivity and momentum balance laws automatically guaranteed.
We show that, owing to its ability to capture complex responses, our learned neural operator achieves improved accuracy and efficiency compared to baseline models.
arXiv Detail & Related papers (2024-01-11T17:37:20Z) - Efficient and Flexible Neural Network Training through Layer-wise Feedback Propagation [49.44309457870649]
We present Layer-wise Feedback Propagation (LFP), a novel training principle for neural network-like predictors.
LFP decomposes a reward to individual neurons based on their respective contributions to solving a given task.
Our method then implements a greedy approach reinforcing helpful parts of the network and weakening harmful ones.
arXiv Detail & Related papers (2023-08-23T10:48:28Z) - Understanding Self-attention Mechanism via Dynamical System Perspective [58.024376086269015]
Self-attention mechanism (SAM) is widely used in various fields of artificial intelligence.
We show that intrinsic stiffness phenomenon (SP) in the high-precision solution of ordinary differential equations (ODEs) also widely exists in high-performance neural networks (NN)
We show that the SAM is also a stiffness-aware step size adaptor that can enhance the model's representational ability to measure intrinsic SP.
arXiv Detail & Related papers (2023-08-19T08:17:41Z) - Latent Variable Representation for Reinforcement Learning [131.03944557979725]
It remains unclear theoretically and empirically how latent variable models may facilitate learning, planning, and exploration to improve the sample efficiency of model-based reinforcement learning.
We provide a representation view of the latent variable models for state-action value functions, which allows both tractable variational learning algorithm and effective implementation of the optimism/pessimism principle.
In particular, we propose a computationally efficient planning algorithm with UCB exploration by incorporating kernel embeddings of latent variable models.
arXiv Detail & Related papers (2022-12-17T00:26:31Z) - HyperImpute: Generalized Iterative Imputation with Automatic Model
Selection [77.86861638371926]
We propose a generalized iterative imputation framework for adaptively and automatically configuring column-wise models.
We provide a concrete implementation with out-of-the-box learners, simulators, and interfaces.
arXiv Detail & Related papers (2022-06-15T19:10:35Z) - Structure-Regularized Attention for Deformable Object Representation [17.120035855774344]
Capturing contextual dependencies has proven useful to improve the representational power of deep neural networks.
Recent approaches that focus on modeling global context, such as self-attention and non-local operation, achieve this goal by enabling unconstrained pairwise interactions between elements.
We consider learning representations for deformable objects which can benefit from context exploitation by modeling the structural dependencies that the data intrinsically possesses.
arXiv Detail & Related papers (2021-06-12T03:10:17Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.