Stochastic Multivariate Universal-Radix Finite-State Machine: a Theoretically and Practically Elegant Nonlinear Function Approximator
- URL: http://arxiv.org/abs/2405.02356v1
- Date: Fri, 3 May 2024 02:53:32 GMT
- Title: Stochastic Multivariate Universal-Radix Finite-State Machine: a Theoretically and Practically Elegant Nonlinear Function Approximator
- Authors: Xincheng Feng, Guodong Shen, Jianhao Hu, Meng Li, Ngai Wong,
- Abstract summary: nonlinear functions often incur various hardware and compute overheads.
computing (SC) has emerged as a promising approach to tackle this challenge by trading output precision for hardware simplicity.
This paper proposes a first-of-its-kind universal universal-radix finite-state machine (SMURF)
Experiments demonstrate the superiority of SMURF, requiring only 16.07% area and 14.45% power consumption of Taylor-series approximation.
- Score: 9.92828543462075
- License: http://creativecommons.org/licenses/by-nc-nd/4.0/
- Abstract: Nonlinearities are crucial for capturing complex input-output relationships especially in deep neural networks. However, nonlinear functions often incur various hardware and compute overheads. Meanwhile, stochastic computing (SC) has emerged as a promising approach to tackle this challenge by trading output precision for hardware simplicity. To this end, this paper proposes a first-of-its-kind stochastic multivariate universal-radix finite-state machine (SMURF) that harnesses SC for hardware-simplistic multivariate nonlinear function generation at high accuracy. We present the finite-state machine (FSM) architecture for SMURF, as well as analytical derivations of sampling gate coefficients for accurately approximating generic nonlinear functions. Experiments demonstrate the superiority of SMURF, requiring only 16.07% area and 14.45% power consumption of Taylor-series approximation, and merely 2.22% area of look-up table (LUT) schemes.
Related papers
- Distributed Representations Enable Robust Multi-Timescale Symbolic Computation in Neuromorphic Hardware [3.961418890143814]
We describe a single-shot weight learning scheme to embed robust multi-timescale dynamics into attractor-based RSNNs.
We embed finite state machines into the RSNN dynamics by superimposing a symmetric autoassociative weight matrix.
This work introduces a scalable approach to embed robust symbolic computation through recurrent dynamics into neuromorphic hardware.
arXiv Detail & Related papers (2024-05-02T14:11:50Z) - Provably Efficient Algorithm for Nonstationary Low-Rank MDPs [48.92657638730582]
We make the first effort to investigate nonstationary RL under episodic low-rank MDPs, where both transition kernels and rewards may vary over time.
We propose a parameter-dependent policy optimization algorithm called PORTAL, and further improve PORTAL to its parameter-free version of Ada-PORTAL.
For both algorithms, we provide upper bounds on the average dynamic suboptimality gap, which show that as long as the nonstationarity is not significantly large, PORTAL and Ada-PORTAL are sample-efficient and can achieve arbitrarily small average dynamic suboptimality gap with sample complexity.
arXiv Detail & Related papers (2023-08-10T09:52:44Z) - Non-stationary Reinforcement Learning under General Function
Approximation [60.430936031067006]
We first propose a new complexity metric called dynamic Bellman Eluder (DBE) dimension for non-stationary MDPs.
Based on the proposed complexity metric, we propose a novel confidence-set based model-free algorithm called SW-OPEA.
We show that SW-OPEA is provably efficient as long as the variation budget is not significantly large.
arXiv Detail & Related papers (2023-06-01T16:19:37Z) - Error Bounds for Learning with Vector-Valued Random Features [2.375038919274297]
This paper provides a comprehensive error analysis of learning with vector-valued random features (RF)
The theory is developed for RF ridge regression in a fully general infinite-dimensional input-output setting.
arXiv Detail & Related papers (2023-05-26T18:00:08Z) - Bayesian Kernelized Tensor Factorization as Surrogate for Bayesian
Optimization [13.896697187967545]
Kernel optimization (BO) primarily uses Gaussian processes (GP) as the key surrogate model.
In this paper, we propose to use Bayesian Factorization (BKTF) as a new surrogate model -- for BO in a $D$-dimensional product space.
BKTF offers a flexible and highly effective approach for characterizing complex functions with uncertainty quantification.
arXiv Detail & Related papers (2023-02-28T12:00:21Z) - Sample Complexity of Kernel-Based Q-Learning [11.32718794195643]
We propose a non-parametric Q-learning algorithm which finds an $epsilon$-optimal policy in an arbitrarily large scale discounted MDP.
To the best of our knowledge, this is the first result showing a finite sample complexity under such a general model.
arXiv Detail & Related papers (2023-02-01T19:46:25Z) - Simplex Random Features [53.97976744884616]
We present Simplex Random Features (SimRFs), a new random feature (RF) mechanism for unbiased approximation of the softmax and Gaussian kernels.
We prove that SimRFs provide the smallest possible mean square error (MSE) on unbiased estimates of these kernels.
We show consistent gains provided by SimRFs in settings including pointwise kernel estimation, nonparametric classification and scalable Transformers.
arXiv Detail & Related papers (2023-01-31T18:53:39Z) - Hybrid Random Features [60.116392415715275]
We propose a new class of random feature methods for linearizing softmax and Gaussian kernels called hybrid random features (HRFs)
HRFs automatically adapt the quality of kernel estimation to provide most accurate approximation in the defined regions of interest.
arXiv Detail & Related papers (2021-10-08T20:22:59Z) - Statistically Meaningful Approximation: a Case Study on Approximating
Turing Machines with Transformers [50.85524803885483]
This work proposes a formal definition of statistically meaningful (SM) approximation which requires the approximating network to exhibit good statistical learnability.
We study SM approximation for two function classes: circuits and Turing machines.
arXiv Detail & Related papers (2021-07-28T04:28:55Z) - Optimization-driven Machine Learning for Intelligent Reflecting Surfaces
Assisted Wireless Networks [82.33619654835348]
Intelligent surface (IRS) has been employed to reshape the wireless channels by controlling individual scattering elements' phase shifts.
Due to the large size of scattering elements, the passive beamforming is typically challenged by the high computational complexity.
In this article, we focus on machine learning (ML) approaches for performance in IRS-assisted wireless networks.
arXiv Detail & Related papers (2020-08-29T08:39:43Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.