Learning Robust Policies for Generalized Debris Capture with an
Automated Tether-Net System
- URL: http://arxiv.org/abs/2201.04180v1
- Date: Tue, 11 Jan 2022 20:09:05 GMT
- Title: Learning Robust Policies for Generalized Debris Capture with an
Automated Tether-Net System
- Authors: Chen Zeng, Grant Hecht, Prajit KrisshnaKumar, Raj K. Shah, Souma
Chowdhury and Eleonora M. Botta
- Abstract summary: This paper presents a reinforcement learning framework that integrates a policy optimization approach with net dynamics simulations.
A state transition model is considered in order to incorporate synthetic uncertainties in state estimation and launch actuation.
The trained policy demonstrates capture performance close to that obtained with reliability-based optimization run over an individual scenario.
- Score: 2.0429716172112617
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: A tether-net launched from a chaser spacecraft provides a promising method to
capture and dispose of large space debris in orbit. This tether-net system is
subject to several sources of uncertainty in sensing and actuation that affect
the performance of its net launch and closing control. Earlier
reliability-based optimization approaches to designing control actions, however,
remain challenging and computationally prohibitive to generalize over varying
launch scenarios and target (debris) states relative to the chaser. To search
for a general and reliable control policy, this paper presents a reinforcement
learning framework that integrates a proximal policy optimization (PPO2)
approach with net dynamics simulations. The latter allows evaluating the
episodes of net-based target capture and estimating the capture quality index
that serves as the reward feedback to PPO2. Here, the learned policy is
designed to model the timing of the net closing action based on the state of
the moving net and the target, under any given launch scenario. A stochastic
state transition model is considered in order to incorporate synthetic
uncertainties in state estimation and launch actuation. Along with notable
reward improvement during training, the trained policy demonstrates capture
performance (over a wide range of launch/target scenarios) that is close to
that obtained with reliability-based optimization run over an individual
scenario.
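The training loop described in the abstract (simulate a capture episode, let a policy decide when to close the net from a noisy state estimate, score the outcome with a capture quality index) can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration: the 1-D constant-speed net motion, the function names run_episode, capture_quality and threshold_policy, and the noise levels are hypothetical and are not taken from the paper, whose framework uses full net dynamics simulations and a PPO2-trained policy.

```python
import random


def capture_quality(gap_at_close, tolerance=1.0):
    """Hypothetical stand-in for the capture quality index.

    Returns 1.0 when the net closes exactly at the target and falls
    linearly to 0.0 once the miss distance reaches `tolerance`.
    """
    return max(0.0, 1.0 - abs(gap_at_close) / tolerance)


def run_episode(policy, target_distance, launch_speed, rng, dt=0.1,
                speed_noise=0.05, sensor_noise=0.1, max_steps=1000):
    """Simulate one toy capture episode and return its reward."""
    # Launch actuation uncertainty: the realized speed deviates
    # from the commanded one by a random relative error.
    speed = launch_speed * (1.0 + rng.gauss(0.0, speed_noise))
    net_position = 0.0
    for _ in range(max_steps):
        net_position += speed * dt
        true_gap = target_distance - net_position
        # State estimation uncertainty: the policy only sees a noisy gap.
        observed_gap = true_gap + rng.gauss(0.0, sensor_noise)
        if policy(observed_gap):
            # Reward is evaluated on the true state at closing time.
            return capture_quality(true_gap)
    return 0.0  # the net was never closed


def threshold_policy(observed_gap, threshold=0.0):
    """Hand-written placeholder policy: close once the estimated gap vanishes."""
    return observed_gap <= threshold


if __name__ == "__main__":
    rng = random.Random(0)
    rewards = [run_episode(threshold_policy, 10.0, 2.0, rng) for _ in range(5)]
    print(rewards)
```

In a learning setup, the hand-written threshold_policy would be replaced by a trained policy network, and the returned reward would serve as the feedback signal for the policy optimizer.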
Related papers
- ProSpec RL: Plan Ahead, then Execute [7.028937493640123]
We propose the Prospective (ProSpec) RL method, which makes higher-value, lower-risk optimal decisions by imagining future n-stream trajectories.
ProSpec employs a dynamic model to predict future states based on the current state and a series of sampled actions.
We validate the effectiveness of our method on the DMControl benchmarks, where our approach achieved significant performance improvements.
arXiv Detail & Related papers (2024-07-31T06:04:55Z)
- Probabilistic Reach-Avoid for Bayesian Neural Networks [71.67052234622781]
We show that an optimal synthesis algorithm can provide more than a four-fold increase in the number of certifiable states.
The algorithm is able to provide more than a three-fold increase in the average guaranteed reach-avoid probability.
arXiv Detail & Related papers (2023-10-03T10:52:21Z)
- Practical Probabilistic Model-based Deep Reinforcement Learning by
Integrating Dropout Uncertainty and Trajectory Sampling [7.179313063022576]
This paper addresses the prediction stability, prediction accuracy and control capability of the current probabilistic model-based reinforcement learning (MBRL) built on neural networks.
A novel approach, dropout-based probabilistic ensembles with trajectory sampling (DPETS), is proposed.
arXiv Detail & Related papers (2023-09-20T06:39:19Z)
- UAV Path Planning Employing MPC-Reinforcement Learning Method for
search and rescue mission [0.0]
We tackle the problem of Unmanned Aerial Vehicle (UAV) path planning in complex and uncertain environments.
We design a Model Predictive Control (MPC) based on a Long-Short-Term Memory (LSTM) network integrated into the Deep Deterministic Policy Gradient algorithm.
arXiv Detail & Related papers (2023-02-21T13:39:40Z)
- Robust and Adaptive Temporal-Difference Learning Using An Ensemble of
Gaussian Processes [70.80716221080118]
The paper takes a generative perspective on policy evaluation via temporal-difference (TD) learning.
The OS-GPTD approach is developed to estimate the value function for a given policy by observing a sequence of state-reward pairs.
To alleviate the limited expressiveness associated with a single fixed kernel, a weighted ensemble (E) of GP priors is employed to yield an alternative scheme.
arXiv Detail & Related papers (2021-12-01T23:15:09Z)
- Iterative Amortized Policy Optimization [147.63129234446197]
Policy networks are a central feature of deep reinforcement learning (RL) algorithms for continuous control.
From the variational inference perspective, policy networks are a form of amortized optimization, optimizing network parameters rather than the policy distributions directly.
We demonstrate that iterative amortized policy optimization yields performance improvements over direct amortization on benchmark continuous control tasks.
arXiv Detail & Related papers (2020-10-20T23:25:42Z)
- Reinforcement Learning for Low-Thrust Trajectory Design of
Interplanetary Missions [77.34726150561087]
This paper investigates the use of reinforcement learning for the robust design of interplanetary trajectories in the presence of severe disturbances.
An open-source implementation of the state-of-the-art algorithm Proximal Policy Optimization is adopted.
The resulting Guidance and Control Network provides both a robust nominal trajectory and the associated closed-loop guidance law.
arXiv Detail & Related papers (2020-08-19T15:22:15Z)
- Accelerating Deep Reinforcement Learning With the Aid of Partial Model:
Energy-Efficient Predictive Video Streaming [97.75330397207742]
Predictive power allocation is conceived for energy-efficient video streaming over mobile networks using deep reinforcement learning.
To handle the continuous state and action spaces, we resort to the deep deterministic policy gradient (DDPG) algorithm.
Our simulation results show that the proposed policies converge to the optimal policy that is derived based on perfect large-scale channel prediction.
arXiv Detail & Related papers (2020-03-21T17:36:53Z)
- Decentralized MCTS via Learned Teammate Models [89.24858306636816]
We present a trainable online decentralized planning algorithm based on decentralized Monte Carlo Tree Search.
We show that deep learning and convolutional neural networks can be employed to produce accurate policy approximators.
arXiv Detail & Related papers (2020-03-19T13:10:20Z)