Can Direct Latent Model Learning Solve Linear Quadratic Gaussian Control?
- URL: http://arxiv.org/abs/2212.14511v2
- Date: Wed, 13 Mar 2024 17:44:52 GMT
- Title: Can Direct Latent Model Learning Solve Linear Quadratic Gaussian Control?
- Authors: Yi Tian, Kaiqing Zhang, Russ Tedrake, Suvrit Sra
- Abstract summary: We study the task of learning state representations from potentially high-dimensional observations.
We pursue a direct latent model learning approach, where a dynamic model in some latent state space is learned by predicting quantities directly related to planning.
- Score: 75.14973944905216
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: We study the task of learning state representations from potentially
high-dimensional observations, with the goal of controlling an unknown
partially observable system. We pursue a direct latent model learning approach,
where a dynamic model in some latent state space is learned by predicting
quantities directly related to planning (e.g., costs) without reconstructing
the observations. In particular, we focus on an intuitive cost-driven state
representation learning method for solving Linear Quadratic Gaussian (LQG)
control, one of the most fundamental partially observable control problems. As
our main results, we establish finite-sample guarantees of finding a
near-optimal state representation function and a near-optimal controller using
the directly learned latent model. To the best of our knowledge, despite
various empirical successes, prior to this work it was unclear if such a
cost-driven latent model learner enjoys finite-sample guarantees. Our work
underscores the value of predicting multi-step costs, an idea that is key to
our theory, and notably also an idea that is known to be empirically valuable
for learning state representations.
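To make the setting concrete, below is a standard discrete-time LQG formulation together with a sketch of a multi-step cost-prediction objective. The notation (matrices A, B, C, Q, R, encoder \phi, prediction horizon K) is illustrative only and is not taken from the paper; the second display is a hedged reading of the cost-driven idea, not the paper's exact loss.

```latex
% Standard discrete-time LQG setup (illustrative notation; not necessarily the paper's):
% state x_t, control u_t, observation y_t, Gaussian process/measurement noise w_t, v_t.
\[
\begin{aligned}
  x_{t+1} &= A x_t + B u_t + w_t, & w_t &\sim \mathcal{N}(0, W),\\
  y_t     &= C x_t + v_t,         & v_t &\sim \mathcal{N}(0, V),\\
  c_t     &= x_t^\top Q x_t + u_t^\top R u_t.
\end{aligned}
\]
% A hedged sketch of direct (cost-driven) latent model learning: an encoder \phi maps the
% observation-action history h_t to a latent state \hat{z}_t = \phi(h_t); the latent dynamics
% (\hat{A}, \hat{B}) and cost head \hat{c} are fit by predicting the next K observed costs,
% with no observation-reconstruction term.
\[
\min_{\phi,\,\hat{A},\,\hat{B},\,\hat{c}}\;
\mathbb{E}\!\left[\sum_{k=0}^{K-1}
  \bigl(\hat{c}(\hat{z}_{t+k}, u_{t+k}) - c_{t+k}\bigr)^{2}\right],
\qquad
\hat{z}_{t} = \phi(h_t),\quad
\hat{z}_{t+k+1} = \hat{A}\hat{z}_{t+k} + \hat{B} u_{t+k}.
\]
```

In this reading, taking K > 1 is the multi-step cost-prediction ingredient that the abstract highlights as key to the theory; a controller can then be obtained from the learned latent model, e.g., by solving the induced LQR problem in the latent space.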
Related papers
- Sublinear Regret for a Class of Continuous-Time Linear-Quadratic Reinforcement Learning Problems [10.404992912881601]
We study reinforcement learning for a class of continuous-time linear-quadratic (LQ) control problems for diffusions.
We apply a model-free approach that relies neither on knowledge of model parameters nor on their estimations, and devise an actor-critic algorithm to learn the optimal policy parameter directly.
arXiv Detail & Related papers (2024-07-24T12:26:21Z)
- Unlearning with Control: Assessing Real-world Utility for Large Language Model Unlearning [97.2995389188179]
Recent research has begun to approach large language model (LLM) unlearning via gradient ascent (GA).
Despite their simplicity and efficiency, we suggest that GA-based methods are prone to excessive unlearning.
We propose several controlling methods that can regulate the extent of excessive unlearning.
arXiv Detail & Related papers (2024-06-13T14:41:00Z)
- An Information Theoretic Approach to Machine Unlearning [45.600917449314444]
A key challenge in unlearning is forgetting the necessary data in a timely manner while preserving model performance.
In this work, we address the zero-shot unlearning scenario, whereby an unlearning algorithm must be able to remove data given only a trained model and the data to be forgotten.
We derive a simple but principled zero-shot unlearning method based on the geometry of the model.
arXiv Detail & Related papers (2024-02-02T13:33:30Z)
- Masked prediction tasks: a parameter identifiability view [49.533046139235466]
We focus on the widely used self-supervised learning method of predicting masked tokens.
We show that there is a rich landscape of possibilities, out of which some prediction tasks yield identifiability, while others do not.
arXiv Detail & Related papers (2022-02-18T17:09:32Z)
- Improving Self-supervised Learning with Automated Unsupervised Outlier Arbitration [83.29856873525674]
We introduce UOTA, a lightweight latent variable model that targets the view sampling issue in self-supervised learning.
Our method directly generalizes to many mainstream self-supervised learning approaches.
arXiv Detail & Related papers (2021-12-15T14:05:23Z)
- Adaptive Control and Regret Minimization in Linear Quadratic Gaussian (LQG) Setting [91.43582419264763]
We propose LqgOpt, a novel reinforcement learning algorithm based on the principle of optimism in the face of uncertainty.
LqgOpt efficiently explores the system dynamics, estimates the model parameters up to their confidence interval, and deploys the controller of the most optimistic model.
arXiv Detail & Related papers (2020-03-12T19:56:38Z)
- Value-driven Hindsight Modelling [68.658900923595]
Value estimation is a critical component of the reinforcement learning (RL) paradigm.
Model learning can make use of the rich transition structure present in sequences of observations, but this approach is usually not sensitive to the reward function.
We develop an approach for representation learning in RL that sits in between these two extremes.
This provides tractable prediction targets that are directly relevant for a task, and can thus accelerate learning the value function.
arXiv Detail & Related papers (2020-02-19T18:10:20Z)
This list is automatically generated from the titles and abstracts of the papers on this site.