Related papers: Universal Sequence Preconditioning

Related papers

Neural Optimal Transport Meets Multivariate Conformal Prediction [58.43397908730771]
We propose a framework for conditional vectorile regression (CVQR)<n>CVQR combines neural optimal transport with quantized optimization, and apply it to predictions.
arXiv Detail & Related papers (2025-09-29T19:50:19Z)
Understanding Incremental Learning with Closed-form Solution to Gradient Flow on Overparamerterized Matrix Factorization [44.278609916888065]
gradient flow learns a target matrix by sequentially learning its singular values in decreasing order of magnitude over time.<n>We show that incremental learning emerges from some time-scale separation among dynamics corresponding to learning different components in the target matrix.
arXiv Detail & Related papers (2025-08-28T01:36:42Z)
Generalized Linear Mode Connectivity for Transformers [87.32299363530996]
A striking phenomenon is linear mode connectivity (LMC), where independently trained models can be connected by low- or zero-loss paths.<n>Prior work has predominantly focused on neuron re-ordering through permutations, but such approaches are limited in scope.<n>We introduce a unified framework that captures four symmetry classes: permutations, semi-permutations, transformations, and general invertible maps.<n>This generalization enables, for the first time, the discovery of low- and zero-barrier linear paths between independently trained Vision Transformers and GPT-2 models.
arXiv Detail & Related papers (2025-06-28T01:46:36Z)
Fundamental Limits of Matrix Sensing: Exact Asymptotics, Universality, and Applications [30.659400341011004]
We provide rigorous equations characterizing the Bayes-optimal learning performance from a number of samples.<n>We mathematically establish predictions obtained via non-rigorous methods from statistical physics.
arXiv Detail & Related papers (2025-03-18T10:36:30Z)
The Sample Complexity of Online Reinforcement Learning: A Multi-model Perspective [55.15192437680943]
We study the sample complexity of online reinforcement learning for nonlinear dynamical systems with continuous state and action spaces.<n>Our algorithms are likely to be useful in practice, due to their simplicity, the ability to incorporate prior knowledge, and their benign transient behavior.
arXiv Detail & Related papers (2025-01-27T10:01:28Z)
GP-FL: Model-Based Hessian Estimation for Second-Order Over-the-Air Federated Learning [52.295563400314094]
Second-order methods are widely adopted to improve the convergence rate of learning algorithms.<n>This paper introduces a novel second-order FL framework tailored for wireless channels.
arXiv Detail & Related papers (2024-12-05T04:27:41Z)
Learning Linear Attention in Polynomial Time [115.68795790532289]
We provide the first results on learnability of single-layer Transformers with linear attention. We show that linear attention may be viewed as a linear predictor in a suitably defined RKHS. We show how to efficiently identify training datasets for which every empirical riskr is equivalent to the linear Transformer.
arXiv Detail & Related papers (2024-10-14T02:41:01Z)
Unsupervised Representation Learning from Sparse Transformation Analysis [79.94858534887801]
We propose to learn representations from sequence data by factorizing the transformations of the latent variables into sparse components. Input data are first encoded as distributions of latent activations and subsequently transformed using a probability flow model.
arXiv Detail & Related papers (2024-10-07T23:53:25Z)
Generative modeling of Sparse Approximate Inverse Preconditioners [7.115143788231333]
We present a new deep learning paradigm for the generation of sparse approximate inverse (SPAI) preconditioners for matrix systems. Our approach is based upon the observation that matrices generated in this manner are not arbitrary, but inherit properties from differential operators that they discretize.
arXiv Detail & Related papers (2024-05-17T10:19:32Z)
In-Context Convergence of Transformers [63.04956160537308]
We study the learning dynamics of a one-layer transformer with softmax attention trained via gradient descent. For data with imbalanced features, we show that the learning dynamics take a stage-wise convergence process.
arXiv Detail & Related papers (2023-10-08T17:55:33Z)
Asymmetric matrix sensing by gradient descent with small random initialization [0.8611782340880084]
We study the problem of reconstructing a low-rank matrix from a few linear measurements. Our key contribution is introducing a continuous gradient flow equation that we call the $texted gradient flow$.
arXiv Detail & Related papers (2023-09-04T20:23:35Z)
Regularization, early-stopping and dreaming: a Hopfield-like setup to address generalization and overfitting [0.0]
We look for optimal network parameters by applying a gradient descent over a regularized loss function. Within this framework, the optimal neuron-interaction matrices correspond to Hebbian kernels revised by a reiterated unlearning protocol.
arXiv Detail & Related papers (2023-08-01T15:04:30Z)
The Decimation Scheme for Symmetric Matrix Factorization [0.0]
Matrix factorization is an inference problem that has acquired importance due to its vast range of applications. We study this extensive rank problem, extending the alternative 'decimation' procedure that we recently introduced. We introduce a simple algorithm based on a ground state search that implements decimation and performs matrix factorization.
arXiv Detail & Related papers (2023-07-31T10:53:45Z)
Neural incomplete factorization: learning preconditioners for the conjugate gradient method [2.899792823251184]
We develop a data-driven approach to accelerate the generation of effective preconditioners. We replace the typically hand-engineered preconditioners by the output of graph neural networks. Our method generates an incomplete factorization of the matrix and is, therefore, referred to as neural incomplete factorization (NeuralIF)
arXiv Detail & Related papers (2023-05-25T11:45:46Z)
Stochastic Nonlinear Control via Finite-dimensional Spectral Dynamic Embedding [21.38845517949153]
This paper presents an approach, Spectral Dynamics Embedding Control (SDEC), to optimal control for nonlinear systems. We use an infinite-dimensional feature to linearly represent the state-action value function and exploits finite-dimensional truncation approximation for practical implementation.
arXiv Detail & Related papers (2023-04-08T04:23:46Z)
Implicit Balancing and Regularization: Generalization and Convergence Guarantees for Overparameterized Asymmetric Matrix Sensing [28.77440901439686]
A series of recent papers have begun to generalize this role for non-random Positive Semi-Defin (PSD) matrix sensing problems. In this paper, we show that the trajectory of the gradient descent from small random measurements moves towards solutions that are both globally well.
arXiv Detail & Related papers (2023-03-24T19:05:52Z)
Oracle-Preserving Latent Flows [58.720142291102135]
We develop a methodology for the simultaneous discovery of multiple nontrivial continuous symmetries across an entire labelled dataset. The symmetry transformations and the corresponding generators are modeled with fully connected neural networks trained with a specially constructed loss function. The two new elements in this work are the use of a reduced-dimensionality latent space and the generalization to transformations invariant with respect to high-dimensional oracles.
arXiv Detail & Related papers (2023-02-02T00:13:32Z)
Variational Laplace Autoencoders [53.08170674326728]
Variational autoencoders employ an amortized inference model to approximate the posterior of latent variables. We present a novel approach that addresses the limited posterior expressiveness of fully-factorized Gaussian assumption. We also present a general framework named Variational Laplace Autoencoders (VLAEs) for training deep generative models.
arXiv Detail & Related papers (2022-11-30T18:59:27Z)
Mutual Exclusivity Training and Primitive Augmentation to Induce Compositionality [84.94877848357896]
Recent datasets expose the lack of the systematic generalization ability in standard sequence-to-sequence models. We analyze this behavior of seq2seq models and identify two contributing factors: a lack of mutual exclusivity bias and the tendency to memorize whole examples. We show substantial empirical improvements using standard sequence-to-sequence models on two widely-used compositionality datasets.
arXiv Detail & Related papers (2022-11-28T17:36:41Z)
Manifold learning-based polynomial chaos expansions for high-dimensional surrogate models [0.0]
We introduce a manifold learning-based method for uncertainty quantification (UQ) in describing systems. The proposed method is able to achieve highly accurate approximations which ultimately lead to the significant acceleration of UQ tasks.
arXiv Detail & Related papers (2021-07-21T00:24:15Z)
Explainable nonlinear modelling of multiple time series with invertible neural networks [7.605814048051735]
A method for nonlinear topology identification is proposed, based on the assumption that a collection of time series are generated in two steps. The latter mappings are assumed invertible, and are modelled as shallow neural networks, so that their inverse can be numerically evaluated. This paper explains the steps needed to calculate the gradients applying implicit differentiation.
arXiv Detail & Related papers (2021-07-01T12:07:09Z)
Fractal Structure and Generalization Properties of Stochastic Optimization Algorithms [71.62575565990502]
We prove that the generalization error of an optimization algorithm can be bounded on the complexity' of the fractal structure that underlies its generalization measure. We further specialize our results to specific problems (e.g., linear/logistic regression, one hidden/layered neural networks) and algorithms.
arXiv Detail & Related papers (2021-06-09T08:05:36Z)
Joint Network Topology Inference via Structured Fusion Regularization [70.30364652829164]
Joint network topology inference represents a canonical problem of learning multiple graph Laplacian matrices from heterogeneous graph signals. We propose a general graph estimator based on a novel structured fusion regularization. We show that the proposed graph estimator enjoys both high computational efficiency and rigorous theoretical guarantee.
arXiv Detail & Related papers (2021-03-05T04:42:32Z)
Online Stochastic Gradient Descent Learns Linear Dynamical Systems from A Single Trajectory [1.52292571922932]
We show that if the unknown weight matrices describing the system are in Brunovsky canonical form, we can efficiently estimate the ground truth unknown of the system. Specifically, by deriving concrete bounds, we show that SGD converges linearly in expectation to any arbitrary small Frobenius norm distance from the ground truth weights.
arXiv Detail & Related papers (2021-02-23T17:48:39Z)
Beyond Procrustes: Balancing-Free Gradient Descent for Asymmetric Low-Rank Matrix Sensing [36.96922859748537]
Low-rank matrix estimation plays a central role in various applications across science and engineering. Existing approaches rely on adding a metric regularization term to balance the scale of the two matrix factors. In this paper, we provide a theoretical justification for the performance in recovering a low-rank matrix from a small number of linear measurements.
arXiv Detail & Related papers (2021-01-13T15:03:52Z)
Sparse Quantized Spectral Clustering [85.77233010209368]
We exploit tools from random matrix theory to make precise statements about how the eigenspectrum of a matrix changes under such nonlinear transformations. We show that very little change occurs in the informative eigenstructure even under drastic sparsification/quantization.
arXiv Detail & Related papers (2020-10-03T15:58:07Z)
Understanding Implicit Regularization in Over-Parameterized Single Index Model [55.41685740015095]
We design regularization-free algorithms for the high-dimensional single index model. We provide theoretical guarantees for the induced implicit regularization phenomenon.
arXiv Detail & Related papers (2020-07-16T13:27:47Z)
Eigendecomposition-Free Training of Deep Networks for Linear Least-Square Problems [107.3868459697569]
We introduce an eigendecomposition-free approach to training a deep network. We show that our approach is much more robust than explicit differentiation of the eigendecomposition. Our method has better convergence properties and yields state-of-the-art results.
arXiv Detail & Related papers (2020-04-15T04:29:34Z)
No-Regret Prediction in Marginally Stable Systems [37.178095559618654]
We consider the problem of online prediction in a marginally stable linear dynamical system. By applying our techniques to learning an autoregressive filter, we also achieve logarithmic regret in the partially observed setting.
arXiv Detail & Related papers (2020-02-06T01:53:34Z)
Regret Minimization in Partially Observable Linear Quadratic Control [91.43582419264763]
We study the problem of regret in partially observable linear quadratic control systems when the model dynamics are unknown a priori. We propose a novel way to decompose the regret and provide an end-to-end sublinear regret upper bound for partially observable linear quadratic control.
arXiv Detail & Related papers (2020-01-31T22:35:08Z)

This list is automatically generated from the titles and abstracts of the papers in this site.