An Uncertainty Principle for Linear Recurrent Neural Networks
- URL: http://arxiv.org/abs/2502.09287v1
- Date: Thu, 13 Feb 2025 13:01:46 GMT
- Title: An Uncertainty Principle for Linear Recurrent Neural Networks
- Authors: Alexandre François, Antonio Orvieto, Francis Bach
- Abstract summary: We build a linear filter of order $S$ that approximates the filter that looks $K$ time steps in the past.
We fully characterize the problem by providing lower bounds of approximation, as well as explicit filters that achieve this lower bound up to constants.
The optimal performance highlights an uncertainty principle: the filter has to average values around the $K$-th time step in the past with a range (width) proportional to $K/S$.
- Score: 54.13281679205581
- License:
- Abstract: We consider linear recurrent neural networks, which have become a key building block of sequence modeling due to their ability for stable and effective long-range modeling. In this paper, we aim at characterizing this ability on a simple but core copy task, whose goal is to build a linear filter of order $S$ that approximates the filter that looks $K$ time steps in the past (which we refer to as the shift-$K$ filter), where $K$ is larger than $S$. Using classical signal models and quadratic cost, we fully characterize the problem by providing lower bounds of approximation, as well as explicit filters that achieve this lower bound up to constants. The optimal performance highlights an uncertainty principle: the optimal filter has to average values around the $K$-th time step in the past with a range (width) that is proportional to $K/S$.
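A small numerical illustration may make the copy task concrete. The sketch below is our own toy setup, not the paper's optimal construction: it fixes $S$ hand-picked real decay rates for a diagonal linear recurrence, fits the readout weights to the shift-$K$ impulse response by least squares (under white-noise inputs the quadratic cost reduces to the $\ell_2$ distance between impulse responses), and reports a rough effective width of the fitted filter around lag $K$, the quantity the paper shows must scale like $K/S$.

```python
# Toy sketch (assumed setup, not the paper's construction): approximate the
# shift-K impulse response h*[t] = 1{t = K} with an order-S linear recurrent
# filter whose decay rates are fixed and whose readout is fit by least squares.
import numpy as np

K, S, T = 64, 8, 512                       # target lag, filter order, truncation horizon
t = np.arange(T)

# Impulse responses of S scalar recurrences x_n = lambda * x_{n-1} + u_n
lambdas = 1.0 - np.linspace(0.5, 4.0, S) / K     # decays clustered near 1 (a modeling choice)
basis = np.stack([lam ** t for lam in lambdas], axis=1)   # shape (T, S)

target = np.zeros(T)
target[K] = 1.0                            # impulse response of the shift-K filter

w, *_ = np.linalg.lstsq(basis, target, rcond=None)   # best order-S readout
approx = basis @ w

# Rough effective width of the approximation around lag K
mass = approx ** 2 / np.sum(approx ** 2)
width = np.sqrt(np.sum(mass * (t - K) ** 2))
err = np.linalg.norm(approx - target)
print(f"l2 error = {err:.3f}, effective width = {width:.1f}, K/S = {K / S:.1f}")
```

Different choices of decay rates change the constants, but the trade-off between how sharply an order-$S$ filter can concentrate at lag $K$ and the order $S$ itself is exactly what the paper's lower bound quantifies.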
Related papers
- Autonomous Sparse Mean-CVaR Portfolio Optimization [6.358973724565783]
We propose an innovative autonomous sparse mean-CVaR portfolio model, capable of approximating the original model with arbitrary accuracy.
We then propose a proximal alternating linearized minimization algorithm, coupled with a nested fixed-point proximity algorithm (both convergent) to iteratively solve the model.
arXiv Detail & Related papers (2024-05-13T15:16:22Z)
- Closed-form Filtering for Non-linear Systems [83.91296397912218]
We propose a new class of filters based on Gaussian PSD Models, which offer several advantages in terms of density approximation and computational efficiency.
We show that filtering can be efficiently performed in closed form when transitions and observations are Gaussian PSD Models; the classical linear-Gaussian analogue is sketched after this entry.
Our proposed estimator enjoys strong theoretical guarantees, with estimation error that depends on the quality of the approximation and is adaptive to the regularity of the transition probabilities.
arXiv Detail & Related papers (2024-02-15T08:51:49Z)
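For reference on the closed-form claim above: in the classical linear-Gaussian special case, closed-form filtering is the textbook Kalman recursion. The sketch below shows only that baseline; it is not the paper's Gaussian PSD model construction, just the standard case it generalizes.

```python
# Hedged illustration: closed-form filtering in the linear-Gaussian case
# (the textbook Kalman filter), not the paper's Gaussian PSD models.
import numpy as np

def kalman_step(m, P, y, A, C, Q, R):
    """One predict/update cycle for x_t = A x_{t-1} + w, y_t = C x_t + v."""
    # Predict
    m_pred = A @ m
    P_pred = A @ P @ A.T + Q
    # Update (closed form thanks to Gaussian conjugacy)
    S = C @ P_pred @ C.T + R                     # innovation covariance
    K = P_pred @ C.T @ np.linalg.inv(S)          # Kalman gain
    m_new = m_pred + K @ (y - C @ m_pred)
    P_new = (np.eye(len(m)) - K @ C) @ P_pred
    return m_new, P_new

# Toy example: 2-D latent state, noisy direct observations
A, C = np.array([[1.0, 0.1], [0.0, 1.0]]), np.eye(2)
Q, R = 0.01 * np.eye(2), 0.25 * np.eye(2)
m, P = np.zeros(2), np.eye(2)
for y in np.random.randn(10, 2):
    m, P = kalman_step(m, P, y, A, C, Q, R)
print(m)
```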
- Nearly Minimax Optimal Regret for Learning Linear Mixture Stochastic Shortest Path [80.60592344361073]
We study the Stochastic Shortest Path (SSP) problem with a linear mixture transition kernel.
An agent repeatedly interacts with an environment and seeks to reach a certain goal state while minimizing the cumulative cost.
Existing works often assume a strictly positive lower bound on the cost function or an upper bound on the expected length of the optimal policy.
arXiv Detail & Related papers (2024-02-14T07:52:00Z)
- Implicit Maximum a Posteriori Filtering via Adaptive Optimization [4.767884267554628]
We frame the standard Bayesian filtering problem as optimization over a time-varying objective; a toy illustration of this view follows this entry.
We show that our framework results in filters that are effective, robust, and scalable to high-dimensional systems.
arXiv Detail & Related papers (2023-11-17T15:30:44Z)
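A toy version of the optimization view mentioned above (our own simplification, not the paper's algorithm): each filtering step minimizes a time-varying MAP objective, here a Gaussian state-space negative log posterior handled with plain gradient descent.

```python
# Hedged sketch of the general idea (not the paper's method): treat each
# filtering step as MAP optimization of a time-varying objective
#   x_t -> -log p(y_t | x_t) - log p(x_t | x_{t-1}),
# solved with a few gradient steps. A Gaussian model is assumed for illustration.
import numpy as np

def map_filter_step(x_prev, y_t, A, C, q_var, r_var, lr=0.1, iters=200):
    """One MAP step for x_t ~ N(A x_prev, q_var I), y_t ~ N(C x_t, r_var I)."""
    x = A @ x_prev                         # initialize at the predicted state
    for _ in range(iters):
        grad = (x - A @ x_prev) / q_var + C.T @ (C @ x - y_t) / r_var
        x -= lr * grad                     # gradient descent on the negative log posterior
    return x

# Toy 2-D example
A, C = np.array([[1.0, 0.1], [0.0, 1.0]]), np.eye(2)
x = np.zeros(2)
for y in np.random.randn(5, 2):
    x = map_filter_step(x, y, A, C, q_var=0.1, r_var=0.5)
print(x)
```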
- Blessing of Nonconvexity in Deep Linear Models: Depth Flattens the Optimization Landscape Around the True Solution [4.7464518249313805]
This work characterizes the effect of depth on the optimization landscape of regression.
We show that, despite their non-convexity, deeper models have a more desirable optimization landscape.
arXiv Detail & Related papers (2022-07-15T17:11:26Z)
- KL-Entropy-Regularized RL with a Generative Model is Minimax Optimal [70.15267479220691]
We consider and analyze the sample complexity of KL-entropy-regularized reinforcement learning with a generative model.
Our analysis shows that the approach is nearly minimax-optimal for finding an $\varepsilon$-optimal policy when $\varepsilon$ is sufficiently small.
arXiv Detail & Related papers (2022-05-27T19:39:24Z)
- Minimax Optimal Quantization of Linear Models: Information-Theoretic Limits and Efficient Algorithms [59.724977092582535]
We consider the problem of quantizing a linear model learned from measurements.
We derive an information-theoretic lower bound for the minimax risk under this setting.
We show that our method and upper bounds can be extended to two-layer ReLU neural networks.
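To make the setting above concrete, here is a toy, hedged sketch (uniform scalar quantization of a least-squares fit; the paper's actual scheme and bounds differ): learn a linear model from noisy measurements, quantize its weights to $b$ bits, and compare prediction risks.

```python
# Hedged toy sketch (not the paper's scheme): learn a linear model from noisy
# measurements, uniformly quantize its weights to b bits, and compare the
# prediction risk of the quantized model to the unquantized one.
import numpy as np

rng = np.random.default_rng(0)
n, d, b = 200, 10, 4                      # samples, dimension, bits per weight
X = rng.standard_normal((n, d))
w_star = rng.standard_normal(d)
y = X @ w_star + 0.1 * rng.standard_normal(n)

w_hat, *_ = np.linalg.lstsq(X, y, rcond=None)     # learned linear model

# Uniform scalar quantizer over the range of the learned weights
lo, hi = w_hat.min(), w_hat.max()
levels = 2 ** b
step = (hi - lo) / (levels - 1)
w_q = lo + step * np.round((w_hat - lo) / step)

X_test = rng.standard_normal((1000, d))
risk = lambda w: np.mean((X_test @ (w - w_star)) ** 2)
print(f"risk(w_hat) = {risk(w_hat):.4f}, risk(quantized) = {risk(w_q):.4f}")
```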
arXiv Detail & Related papers (2022-02-23T02:39:04Z)
- Dependency Aware Filter Pruning [74.69495455411987]
Pruning a proportion of unimportant filters is an efficient way to mitigate the inference cost.
Previous work prunes filters according to their weight norms or the corresponding batch-norm scaling factors; a sketch of this norm-based criterion follows this entry.
We propose a novel mechanism to dynamically control the sparsity-inducing regularization so as to achieve the desired sparsity.
arXiv Detail & Related papers (2020-05-06T07:41:22Z)
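A minimal sketch of the norm-based criterion referenced above (prior work, not this paper's dependency-aware mechanism): rank filters by the magnitude of their batch-norm scaling factors and mark the smallest fraction for removal.

```python
# Hedged sketch of norm-based filter pruning (the baseline criterion mentioned
# in the summary, not the dependency-aware method proposed by the paper).
import numpy as np

def select_filters_to_prune(bn_gammas: np.ndarray, prune_ratio: float) -> np.ndarray:
    """Return indices of filters whose |gamma| falls in the lowest `prune_ratio` fraction."""
    scores = np.abs(bn_gammas)                 # importance proxy per output channel
    k = int(prune_ratio * len(scores))         # number of filters to drop
    return np.argsort(scores)[:k]              # indices of the least important filters

# Example: 16 channels with random scaling factors, prune 25% of them
gammas = np.random.randn(16)
print(select_filters_to_prune(gammas, prune_ratio=0.25))
```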
This list is automatically generated from the titles and abstracts of the papers on this site.