Random Feature Models with Learnable Activation Functions
- URL: http://arxiv.org/abs/2411.19468v1
- Date: Fri, 29 Nov 2024 04:38:12 GMT
- Title: Random Feature Models with Learnable Activation Functions
- Authors: Zailin Ma, Jiansheng Yang, Yaodong Yang
- Abstract summary: We introduce the Random Feature model with Learnable Activation Functions (RFLAF).
RFLAF significantly enhances the expressivity and interpretability of traditional random feature (RF) models.
Our model paves the way for developing more expressive and interpretable frameworks within random feature models.
- Score: 10.908603300691064
- Abstract: Current random feature models typically rely on fixed activation functions, limiting their ability to capture diverse patterns in data. To address this, we introduce the Random Feature model with Learnable Activation Functions (RFLAF), a novel model that significantly enhances the expressivity and interpretability of traditional random feature (RF) models. We begin by studying the RF model with a single radial basis function, where we discover a new kernel and provide the first theoretical analysis of it. By integrating the basis functions with learnable weights, we show that RFLAF can represent a broad class of random feature models whose activation functions belong to $C_c(\mathbb{R})$. Theoretically, we prove that the model requires only about twice as many parameters as a traditional RF model to achieve this significant leap in expressivity. Experimentally, RFLAF demonstrates two key advantages: (1) it performs better across various tasks than a traditional RF model with the same number of parameters, and (2) the optimized weights offer interpretability, as the learned activation function can be directly inferred from these weights. Our model paves the way for developing more expressive and interpretable frameworks within random feature models.
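To make the construction concrete, below is a minimal sketch of an RF model whose activation function is a learnable weighted sum of RBF bumps, as the abstract describes. This is not the authors' reference implementation; the layer sizes, center grid, and bandwidth are illustrative assumptions.

```python
import torch
import torch.nn as nn

class RFLAF(nn.Module):
    """Sketch: random features with a learnable RBF-combination activation.
    Hyperparameters here are illustrative, not the paper's values."""

    def __init__(self, d_in, n_features=512, n_basis=32, bandwidth=0.5):
        super().__init__()
        # Frozen random first-layer weights, as in a standard RF model.
        self.register_buffer("W", torch.randn(n_features, d_in) / d_in ** 0.5)
        # Fixed grid of RBF centers; only the combination weights a_k train.
        self.register_buffer("centers", torch.linspace(-4.0, 4.0, n_basis))
        self.a = nn.Parameter(torch.zeros(n_basis))
        self.bandwidth = bandwidth
        # Trainable linear readout on top of the random features.
        self.readout = nn.Linear(n_features, 1)

    def activation(self, z):
        # phi(z) = sum_k a_k * exp(-(z - c_k)^2 / (2 * h^2))
        diff = z.unsqueeze(-1) - self.centers              # (..., n_basis)
        bumps = torch.exp(-diff ** 2 / (2 * self.bandwidth ** 2))
        return bumps @ self.a                              # (...,)

    def forward(self, x):
        z = x @ self.W.t()                                 # random projection
        return self.readout(self.activation(z))

# Only `a` and the readout are optimized; the learned activation phi can be
# inspected directly by evaluating it on a grid of z values.
model = RFLAF(d_in=10)
y = model(torch.randn(8, 10))                              # shape (8, 1)
```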
Related papers
- Ensemble Deep Random Vector Functional Link Neural Network Based on Fuzzy Inference System [0.6437284704257459]
The ensemble deep random vector functional link (edRVFL) neural network has demonstrated the ability to address the limitations of conventional artificial neural networks.
We propose a novel edRVFL based on fuzzy inference system (edRVFL-FIS) to enhance the feature learning capabilities of edRVFL.
arXiv Detail & Related papers (2024-06-02T17:01:44Z)
- Scaling and renormalization in high-dimensional regression [72.59731158970894]
This paper presents a succinct derivation of the training and generalization performance of a variety of high-dimensional ridge regression models.
We provide an introduction and review of recent results on these topics, aimed at readers with backgrounds in physics and deep learning.
arXiv Detail & Related papers (2024-05-01T15:59:00Z)
- Extracting Reward Functions from Diffusion Models [7.834479563217133]
Decision-making diffusion models can be trained on lower-quality data, and then be steered with a reward function to generate near-optimal trajectories.
We consider the problem of extracting a reward function by comparing a decision-making diffusion model that models low-reward behavior and one that models high-reward behavior.
We show that our approach generalizes beyond sequential decision-making by learning a reward-like function from two large-scale image generation diffusion models.
arXiv Detail & Related papers (2023-06-01T17:59:12Z)
- Precise Asymptotic Analysis of Deep Random Feature Models [37.35013316704277]
We provide exact expressions for the performance of regression by an $L$-layer deep random feature (RF) model.
We characterize the variation of the eigendistribution in different layers of the equivalent Gaussian model.
arXiv Detail & Related papers (2023-02-13T09:30:25Z)
- Is Model Ensemble Necessary? Model-based RL via a Single Model with Lipschitz Regularized Value Function [23.255250192599327]
Probabilistic dynamics model ensembles are widely used in existing model-based reinforcement learning methods.
We find that, for a value function, the stronger its Lipschitz condition is, the smaller the gap between the Bellman operators induced by the true dynamics and by the learned model.
arXiv Detail & Related papers (2023-02-02T17:27:16Z)
- Transformers with Learnable Activation Functions [63.98696070245065]
We use the Rational Activation Function (RAF) to learn optimal activation functions from the input data during training.
RAF opens a new research direction for analyzing and interpreting pre-trained models according to the learned activation functions.
arXiv Detail & Related papers (2022-08-30T09:47:31Z)
- MoEfication: Conditional Computation of Transformer Models for Efficient Inference [66.56994436947441]
Transformer-based pre-trained language models achieve superior performance on most NLP tasks thanks to their large parameter capacity, but that capacity also incurs huge computation costs.
We explore accelerating large-model inference through conditional computation based on the sparse activation phenomenon.
We propose to transform a large model into its mixture-of-experts (MoE) version with equal model size, namely MoEfication.
arXiv Detail & Related papers (2021-10-05T02:14:38Z)
- Closed-form Continuous-Depth Models [99.40335716948101]
Continuous-depth neural models rely on advanced numerical differential equation solvers.
We present a new family of models, termed Closed-form Continuous-depth (CfC) networks, that are simple to describe and at least one order of magnitude faster.
arXiv Detail & Related papers (2021-06-25T22:08:51Z)
- An Investigation of Potential Function Designs for Neural CRF [75.79555356970344]
In this paper, we investigate a series of increasingly expressive potential functions for neural CRF models.
Our experiments show that the decomposed quadrilinear potential function based on the vector representations of two neighboring labels and two neighboring words consistently achieves the best performance.
arXiv Detail & Related papers (2020-11-11T07:32:18Z)
- Learning Discrete Energy-based Models via Auxiliary-variable Local Exploration [130.89746032163106]
We propose ALOE, a new algorithm for learning conditional and unconditional EBMs for discrete structured data.
We show that the energy function and sampler can be trained efficiently via a new variational form of power iteration.
We present an energy-model-guided fuzzer for software testing that achieves comparable performance to well-engineered fuzzing engines like libFuzzer.
arXiv Detail & Related papers (2020-11-10T19:31:29Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of its content (including all information) and is not responsible for any consequences.