FlexAct: Why Learn when you can Pick?
- URL: http://arxiv.org/abs/2601.06441v1
- Date: Sat, 10 Jan 2026 05:51:25 GMT
- Title: FlexAct: Why Learn when you can Pick?
- Authors: Ramnath Kumar, Kyle Ritscher, Junmin Judy, Lawrence Liu, Cho-Jui Hsieh
- Abstract summary: We introduce a novel framework that employs the Gumbel-Softmax trick to enable discrete yet differentiable selection. Our method dynamically learns the optimal activation function independently of the input. Experiments on synthetic datasets show that our model consistently selects the most suitable activation function.
- Score: 39.92969675794945
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Learning activation functions has emerged as a promising direction in deep learning, allowing networks to adapt activation mechanisms to task-specific demands. In this work, we introduce a novel framework that employs the Gumbel-Softmax trick to enable discrete yet differentiable selection among a predefined set of activation functions during training. Our method dynamically learns the optimal activation function independently of the input, thereby enhancing both predictive accuracy and architectural flexibility. Experiments on synthetic datasets show that our model consistently selects the most suitable activation function, underscoring its effectiveness. These results connect theoretical advances with practical utility, paving the way for more adaptive and modular neural architectures in complex learning scenarios.
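A minimal sketch of the selection mechanism described in the abstract, assuming a PyTorch setting (which the abstract does not specify): a layer holds learnable, input-independent logits over a predefined set of activation functions and uses the Gumbel-Softmax trick to make a discrete yet differentiable pick during training. The class name FlexActLayer, the candidate set, and the hyperparameters below are illustrative assumptions, not the authors' implementation.
```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class FlexActLayer(nn.Module):
    """Hypothetical sketch: pick one activation from a fixed candidate set
    via the Gumbel-Softmax trick, with input-independent selection logits."""
    def __init__(self, candidates=None, tau=1.0, hard=True):
        super().__init__()
        # Predefined set of candidate activation functions (an assumption).
        self.candidates = candidates or [torch.relu, torch.tanh, torch.sigmoid, F.gelu]
        # Selection logits are learned parameters, independent of the input.
        self.logits = nn.Parameter(torch.zeros(len(self.candidates)))
        self.tau = tau    # Gumbel-Softmax temperature
        self.hard = hard  # straight-through: discrete forward pass, soft gradients

    def forward(self, x):
        # Sample (approximately) one-hot selection weights; gradients flow
        # back to self.logits through the soft relaxation.
        weights = F.gumbel_softmax(self.logits, tau=self.tau, hard=self.hard)
        # Apply every candidate, then combine; with hard=True the weighted
        # sum collapses to a single selected activation.
        outs = torch.stack([act(x) for act in self.candidates], dim=0)
        return torch.einsum("c,c...->...", weights, outs)

# Usage: drop the layer in place of a fixed nonlinearity.
model = nn.Sequential(nn.Linear(8, 16), FlexActLayer(), nn.Linear(16, 1))
y = model(torch.randn(4, 8))
```
At inference time one would typically keep the activation with the highest logit rather than sampling, but that choice is also an assumption beyond what the abstract states.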
Related papers
- SmartMixed: A Two-Phase Training Strategy for Adaptive Activation Function Learning in Neural Networks [0.0]
We introduce SmartMixed, a two-phase training strategy that allows networks to learn optimal per-neuron activation functions. We evaluate SmartMixed on the MNIST dataset using feedforward neural networks of varying depths.
arXiv Detail & Related papers (2025-10-25T22:46:37Z) - The Importance of Being Lazy: Scaling Limits of Continual Learning [60.97756735877614]
We show that increasing model width is only beneficial when it reduces the amount of feature learning, yielding more laziness. We study the intricate relationship between feature learning, task non-stationarity, and forgetting, finding that high feature learning is only beneficial with highly similar tasks.
arXiv Detail & Related papers (2025-06-20T10:12:38Z) - Efficient Search for Customized Activation Functions with Gradient Descent [42.20716255578699]
Different activation functions work best for different deep learning models.
We propose a fine-grained search cell that combines basic mathematical operations to model activation functions.
Our approach enables the identification of specialized activations, leading to improved performance in every model we tried.
arXiv Detail & Related papers (2024-08-13T11:27:31Z) - Joint Feature and Differentiable $ k $-NN Graph Learning using Dirichlet
Energy [103.74640329539389]
We propose a deep FS method that simultaneously conducts feature selection and differentiable $k$-NN graph learning.
We employ Optimal Transport theory to address the non-differentiability issue of learning $k$-NN graphs in neural networks.
We validate the effectiveness of our model with extensive experiments on both synthetic and real-world datasets.
arXiv Detail & Related papers (2023-05-21T08:15:55Z) - GELU Activation Function in Deep Learning: A Comprehensive Mathematical
Analysis and Performance [2.458437232470188]
We investigate the differentiability, boundedness, stationarity, and smoothness properties of the GELU activation function.
Our results demonstrate the superior performance of GELU compared to other activation functions.
arXiv Detail & Related papers (2023-05-20T03:22:43Z) - Active Learning of Discrete-Time Dynamics for Uncertainty-Aware Model Predictive Control [46.81433026280051]
We present a self-supervised learning approach that actively models the dynamics of nonlinear robotic systems.
Our approach showcases high resilience and generalization capabilities by consistently adapting to unseen flight conditions.
arXiv Detail & Related papers (2022-10-23T00:45:05Z) - Offline Reinforcement Learning with Differentiable Function Approximation is Provably Efficient [65.08966446962845]
Offline reinforcement learning, which aims at optimizing decision-making strategies with historical data, has been extensively applied in real-life applications.
We take a step by considering offline reinforcement learning with differentiable function class approximation (DFA).
Most importantly, we show offline differentiable function approximation is provably efficient by analyzing the pessimistic fitted Q-learning algorithm.
arXiv Detail & Related papers (2022-10-03T07:59:42Z) - Transformers with Learnable Activation Functions [63.98696070245065]
We use the Rational Activation Function (RAF) to learn optimal activation functions during training according to the input data.
RAF opens a new research direction for analyzing and interpreting pre-trained models according to the learned activation functions.
arXiv Detail & Related papers (2022-08-30T09:47:31Z) - Gone Fishing: Neural Active Learning with Fisher Embeddings [55.08537975896764]
There is an increasing need for active learning algorithms that are compatible with deep neural networks.
This article introduces BAIT, a practical, tractable, and high-performing active learning algorithm for neural networks.
arXiv Detail & Related papers (2021-06-17T17:26:31Z) - Generative Adversarial Reward Learning for Generalized Behavior Tendency Inference [71.11416263370823]
We propose a generative inverse reinforcement learning approach for user behavioral preference modelling.
Our model can automatically learn the rewards from users' actions based on a discriminative actor-critic network and a Wasserstein GAN.
arXiv Detail & Related papers (2021-05-03T13:14:25Z) - A Use of Even Activation Functions in Neural Networks [0.35172332086962865]
We propose an alternative approach to integrate existing knowledge or hypotheses of data structure by constructing custom activation functions.
We show that using an even activation function in one of the fully connected layers improves neural network performance.
arXiv Detail & Related papers (2020-11-23T20:33:13Z) - Discovering Parametric Activation Functions [17.369163074697475]
This paper proposes a technique for customizing activation functions automatically, resulting in reliable improvements in performance.
Experiments with four different neural network architectures on the CIFAR-10 and CIFAR-100 image classification datasets show that this approach is effective.
arXiv Detail & Related papers (2020-06-05T00:25:33Z)
This list is automatically generated from the titles and abstracts of the papers in this site.