Convergence of Gradient-based MAML in LQR
- URL: http://arxiv.org/abs/2309.06588v2
- Date: Fri, 15 Sep 2023 17:22:59 GMT
- Title: Convergence of Gradient-based MAML in LQR
- Authors: Negin Musavi and Geir E. Dullerud
- Abstract summary: The main objective of this paper is to investigate the local convergence characteristics of Modelagnostic Meta-learning (MAML) when applied to system quadratic optimal (LQR)
The study also presents simple numerical results to demonstrate the convergence in MAML in LQR.
- Score: 1.2328446298523066
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: The main objective of this research paper is to investigate the local
convergence characteristics of Model-agnostic Meta-learning (MAML) when applied
to linear system quadratic optimal control (LQR). MAML and its variations have
become popular techniques for quickly adapting to new tasks by leveraging
previous learning knowledge in areas like regression, classification, and
reinforcement learning. However, its theoretical guarantees remain unknown due
to non-convexity and its structure, making it even more challenging to ensure
stability in the dynamic system setting. This study focuses on exploring MAML
in the LQR setting, providing its local convergence guarantees while
maintaining the stability of the dynamical system. The paper also presents
simple numerical results to demonstrate the convergence properties of MAML in
LQR tasks.
Related papers
- Unleashing MLLMs on the Edge: A Unified Framework for Cross-Modal ReID via Adaptive SVD Distillation [48.88299242238335]
Cross-Modal Re-identification (CM-ReID) faces challenges due to maintaining a fragmented ecosystem of specialized cloud models.<n>We propose MLLMEmbed-ReID, a unified framework based on a powerful cloud-edge architecture.
arXiv Detail & Related papers (2026-02-13T13:48:08Z) - LILAD: Learning In-context Lyapunov-stable Adaptive Dynamics Models [4.66260462241022]
LILAD is a novel framework for system identification that jointly guarantees stability and adaptability.<n>We evaluate LILAD on benchmark autonomous systems and demonstrate that it outperforms adaptive, robust, and non-adaptive baselines in predictive accuracy.
arXiv Detail & Related papers (2025-11-26T19:20:49Z) - Beyond Monotonicity: Revisiting Factorization Principles in Multi-Agent Q-Learning [24.476713156225685]
Value decomposition is a central approach in multi-agent reinforcement learning (MARL)<n>Existing methods either enforce monotonicity constraints, which limit expressive power, or adopt softer surrogates at the cost of algorithmic complexity.<n>We show that unconstrained, non-monotonic factorization reliably recovers IGM-optimal solutions and consistently outperforms monotonic baselines.
arXiv Detail & Related papers (2025-11-12T22:49:35Z) - MaP: A Unified Framework for Reliable Evaluation of Pre-training Dynamics [72.00014675808228]
Instability in Large Language Models evaluation process obscures true learning dynamics.<n>We introduce textbfMaP, a framework that integrates underlineMerging underlineand the underlinePass@k metric.<n>Experiments show that MaP yields significantly smoother performance curves, reduces inter-run variance, and ensures more consistent rankings.
arXiv Detail & Related papers (2025-10-10T11:40:27Z) - Discrete Tokenization for Multimodal LLMs: A Comprehensive Survey [69.45421620616486]
This work presents the first structured taxonomy and analysis of discrete tokenization methods designed for large language models (LLMs)<n>We categorize 8 representative VQ variants that span classical and modern paradigms and analyze their algorithmic principles, training dynamics, and integration challenges with LLM pipelines.<n>We identify key challenges including codebook collapse, unstable gradient estimation, and modality-specific encoding constraints.
arXiv Detail & Related papers (2025-07-21T10:52:14Z) - Taming Polysemanticity in LLMs: Provable Feature Recovery via Sparse Autoencoders [50.52694757593443]
Existing SAE training algorithms often lack rigorous mathematical guarantees and suffer from practical limitations.<n>We first propose a novel statistical framework for the feature recovery problem, which includes a new notion of feature identifiability.<n>We introduce a new SAE training algorithm based on bias adaptation'', a technique that adaptively adjusts neural network bias parameters to ensure appropriate activation sparsity.
arXiv Detail & Related papers (2025-06-16T20:58:05Z) - Generative QoE Modeling: A Lightweight Approach for Telecom Networks [6.473372512447993]
This study introduces a lightweight generative modeling framework that balances computational efficiency, interpretability, and predictive accuracy.
By validating the use of Vector Quantization (VQ) as a preprocessing technique, continuous network features are effectively transformed into discrete categorical symbols.
This VQ-HMM pipeline enhances the model's capacity to capture dynamic QoE patterns while supporting probabilistic inference on new and unseen data.
arXiv Detail & Related papers (2025-04-30T06:19:37Z) - Network Resource Optimization for ML-Based UAV Condition Monitoring with Vibration Analysis [54.550658461477106]
Condition Monitoring (CM) uses Machine Learning (ML) models to identify abnormal and adverse conditions.
This work explores the optimization of network resources for ML-based UAV CM frameworks.
By leveraging dimensionality reduction techniques, there is a 99.9% reduction in network resource consumption.
arXiv Detail & Related papers (2025-02-21T14:36:12Z) - Benchmarks as Microscopes: A Call for Model Metrology [76.64402390208576]
Modern language models (LMs) pose a new challenge in capability assessment.
To be confident in our metrics, we need a new discipline of model metrology.
arXiv Detail & Related papers (2024-07-22T17:52:12Z) - Mamba-FSCIL: Dynamic Adaptation with Selective State Space Model for Few-Shot Class-Incremental Learning [113.89327264634984]
Few-shot class-incremental learning (FSCIL) confronts the challenge of integrating new classes into a model with minimal training samples.
Traditional methods widely adopt static adaptation relying on a fixed parameter space to learn from data that arrive sequentially.
We propose a dual selective SSM projector that dynamically adjusts the projection parameters based on the intermediate features for dynamic adaptation.
arXiv Detail & Related papers (2024-07-08T17:09:39Z) - Rich-Observation Reinforcement Learning with Continuous Latent Dynamics [43.84391209459658]
We introduce a new theoretical framework, RichCLD (Rich-Observation RL with Continuous Latent Dynamics), in which the agent performs control based on high-dimensional observations.
Our main contribution is a new algorithm for this setting that is provably statistically and computationally efficient.
arXiv Detail & Related papers (2024-05-29T17:02:49Z) - Meta-Learning Linear Quadratic Regulators: A Policy Gradient MAML Approach for Model-free LQR [4.787550557970832]
We characterize the stability and personalization guarantees of a policy gradient-based (PG) model-a meta-learning (MAML) approach for the LQR problem.
Our theoretical guarantees demonstrate that the learned controller can efficiently adapt to unseen LQR tasks.
arXiv Detail & Related papers (2024-01-25T21:59:52Z) - Efficient Model-based Multi-agent Reinforcement Learning via Optimistic
Equilibrium Computation [93.52573037053449]
H-MARL (Hallucinated Multi-Agent Reinforcement Learning) learns successful equilibrium policies after a few interactions with the environment.
We demonstrate our approach experimentally on an autonomous driving simulation benchmark.
arXiv Detail & Related papers (2022-03-14T17:24:03Z) - Multi-Agent Reinforcement Learning via Adaptive Kalman Temporal
Difference and Successor Representation [32.80370188601152]
The paper proposes the Multi-Agent Adaptive Kalman Temporal Difference (MAK-TD) framework and its Successor Representation-based variant, referred to as the MAK-SR.
The proposed MAK-TD/SR frameworks consider the continuous nature of the action-space that is associated with high dimensional multi-agent environments.
arXiv Detail & Related papers (2021-12-30T18:21:53Z) - MAML is a Noisy Contrastive Learner [72.04430033118426]
Model-agnostic meta-learning (MAML) is one of the most popular and widely-adopted meta-learning algorithms nowadays.
We provide a new perspective to the working mechanism of MAML and discover that: MAML is analogous to a meta-learner using a supervised contrastive objective function.
We propose a simple but effective technique, zeroing trick, to alleviate such interference.
arXiv Detail & Related papers (2021-06-29T12:52:26Z) - Performance-Weighed Policy Sampling for Meta-Reinforcement Learning [1.77898701462905]
Enhanced Model-Agnostic Meta-Learning (E-MAML) generates fast convergence of the policy function from a small number of training examples.
E-MAML maintains a set of policy parameters learned in the environment for previous tasks.
We apply E-MAML to developing reinforcement learning (RL)-based online fault tolerant control schemes.
arXiv Detail & Related papers (2020-12-10T23:08:38Z) - Learning the Linear Quadratic Regulator from Nonlinear Observations [135.66883119468707]
We introduce a new problem setting for continuous control called the LQR with Rich Observations, or RichLQR.
In our setting, the environment is summarized by a low-dimensional continuous latent state with linear dynamics and quadratic costs.
Our results constitute the first provable sample complexity guarantee for continuous control with an unknown nonlinearity in the system model and general function approximation.
arXiv Detail & Related papers (2020-10-08T07:02:47Z) - Learning to Learn Kernels with Variational Random Features [118.09565227041844]
We introduce kernels with random Fourier features in the meta-learning framework to leverage their strong few-shot learning ability.
We formulate the optimization of MetaVRF as a variational inference problem.
We show that MetaVRF delivers much better, or at least competitive, performance compared to existing meta-learning alternatives.
arXiv Detail & Related papers (2020-06-11T18:05:29Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.