Universal Learning Waveform Selection Strategies for Adaptive Target
Tracking
- URL: http://arxiv.org/abs/2202.05294v1
- Date: Thu, 10 Feb 2022 19:21:03 GMT
- Title: Universal Learning Waveform Selection Strategies for Adaptive Target
Tracking
- Authors: Charles E. Thornton, R. Michael Buehrer, Harpreet S. Dhillon, Anthony
F. Martone
- Abstract summary: This work develops a sequential waveform selection scheme which achieves Bellman optimality in any radar scene.
An algorithm based on a multi-alphabet version of the Context-Tree Weighting (CTW) method can be used to optimally solve a broad class of waveform-agile tracking problems.
- Score: 42.4297040396286
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Online selection of optimal waveforms for target tracking with active sensors
has long been a problem of interest. Many conventional solutions utilize an
estimation-theoretic interpretation, in which a waveform-specific
Cram\'{e}r-Rao lower bound on measurement error is used to select the optimal
waveform for each tracking step. However, this approach is only valid in the
high SNR regime, and requires a rather restrictive set of assumptions regarding
the target motion and measurement models. Further, due to computational
concerns, many traditional approaches are limited to near-term, or myopic,
optimization, even though radar scenes exhibit strong temporal correlation.
More recently, reinforcement learning has been proposed for waveform selection,
in which the problem is framed as a Markov decision process (MDP), allowing for
long-term planning. However, a major limitation of reinforcement learning is
that the memory length of the underlying Markov process is often unknown for
realistic target and channel dynamics, and a more general framework is
desirable. This work develops a universal sequential waveform selection scheme
which asymptotically achieves Bellman optimality in any radar scene which can
be modeled as a $U^{\text{th}}$ order Markov process for a finite, but unknown,
integer $U$. Our approach is based on well-established tools from the field of
universal source coding, where a stationary source is parsed into variable
length phrases in order to build a context-tree, which is used as a
probabilistic model for the scene's behavior. We show that an algorithm based
on a multi-alphabet version of the Context-Tree Weighting (CTW) method can be
used to optimally solve a broad class of waveform-agile tracking problems while
making minimal assumptions about the environment's behavior.
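The CTW method the abstract refers to mixes Krichevsky-Trofimov (KT) estimators over all tree sources up to a depth bound, so the model adapts to an unknown Markov order $U$. A minimal sketch of the depth-bounded, multi-alphabet CTW probability estimate follows; class and method names are illustrative, this is not the authors' implementation, and the Bellman-optimal planning layer built on top of it is omitted.

```python
import math
from collections import defaultdict

class CTW:
    """Depth-bounded multi-alphabet Context-Tree Weighting (sketch).

    Mixes KT estimators over all tree sources of depth <= D, so the
    estimate adapts to an unknown Markov order U <= D.
    """

    def __init__(self, alphabet_size, depth):
        self.m = alphabet_size
        self.D = depth
        # symbol counts per context; a context is a tuple of past
        # symbols, most recent first
        self.counts = defaultdict(lambda: [0] * alphabet_size)

    def update(self, history, symbol):
        """Record `symbol` observed after `history` (oldest first)."""
        past = tuple(reversed(history[-self.D:]))
        for d in range(len(past) + 1):
            self.counts[past[:d]][symbol] += 1

    def _kt_log_prob(self, ctx):
        """Log KT block probability of the counts stored at `ctx`."""
        c, n = self.counts[ctx], sum(self.counts[ctx])
        logp = sum(math.log(i + 0.5) for cj in c for i in range(cj))
        logp -= sum(math.log(t + self.m / 2.0) for t in range(n))
        return logp

    def _weighted_log_prob(self, ctx):
        """CTW recursion: P_w = (P_e + prod over children of P_w) / 2."""
        le = self._kt_log_prob(ctx)
        if len(ctx) == self.D:
            return le
        # children with zero counts contribute probability 1 (log 0)
        ls = sum(self._weighted_log_prob(ctx + (s,))
                 for s in range(self.m) if sum(self.counts[ctx + (s,)]))
        hi = max(le, ls)
        return math.log(0.5) + hi + math.log(math.exp(le - hi) + math.exp(ls - hi))

    def log_prob(self):
        """Weighted log-probability of all symbols seen so far."""
        return self._weighted_log_prob(())
```

Predictive probabilities for the next symbol follow by the ratio of weighted probabilities before and after a hypothetical update; on a strongly first-order sequence such as `0101...`, the depth-1 mixture assigns far higher probability than a memoryless model would.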
Related papers
- Non-iterative Optimization of Trajectory and Radio Resource for Aerial Network [7.824710236769593]
We address a joint trajectory planning, user association, resource allocation, and power control problem in the aerial IoT network.
Our framework can incorporate various trajectory planning algorithms such as the genetic, tree search, and reinforcement learning.
arXiv Detail & Related papers (2024-05-02T14:21:29Z)
- FlowPG: Action-constrained Policy Gradient with Normalizing Flows [14.98383953401637]
Action-constrained reinforcement learning (ACRL) is a popular approach for solving safety-critical and resource-allocation related decision making problems.
A major challenge in ACRL is to ensure that the agent takes a valid action satisfying the constraints at each step.
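FlowPG enforces validity with a learned normalizing-flow mapping; a much simpler baseline, shown here only to make the validity requirement concrete (this is not FlowPG's mechanism, and the box-plus-budget constraint set is hypothetical), repairs a raw policy output before execution:

```python
import numpy as np

def repair_action(raw, high, budget):
    """Map a raw policy output to a valid action under hypothetical
    constraints: elementwise bounds 0 <= a <= high and a resource
    budget sum(a) <= budget.

    Clipping enforces the box; a uniform rescale enforces the budget
    (scaling toward zero preserves 0 <= a <= high). This is a cheap
    feasibility repair, not an exact Euclidean projection.
    """
    a = np.clip(np.asarray(raw, dtype=float), 0.0, high)
    total = a.sum()
    if total > budget:
        a *= budget / total
    return a
```

A flow-based approach instead learns an invertible map from an unconstrained latent space onto the valid set, avoiding the distortion such ad-hoc repairs introduce.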
arXiv Detail & Related papers (2024-02-07T11:11:46Z)
- One-Dimensional Deep Image Prior for Curve Fitting of S-Parameters from Electromagnetic Solvers [57.441926088870325]
Deep Image Prior (DIP) is a technique that optimizes the weights of a randomly-initialized convolutional neural network to fit a signal from noisy or under-determined measurements.
Relative to publicly available implementations of Vector Fitting (VF), our method shows superior performance on nearly all test examples.
arXiv Detail & Related papers (2023-06-06T20:28:37Z)
- Multi-Objective Policy Gradients with Topological Constraints [108.10241442630289]
We present a new policy gradient algorithm for topological MDPs (TMDPs) via a simple extension of the proximal policy optimization (PPO) algorithm.
We demonstrate this on a real-world multiple-objective navigation problem with an arbitrary ordering of objectives both in simulation and on a real robot.
arXiv Detail & Related papers (2022-09-15T07:22:58Z)
- STORM+: Fully Adaptive SGD with Momentum for Nonconvex Optimization [74.1615979057429]
We investigate stochastic non-convex optimization problems where the objective is an expectation over smooth loss functions.
Our work builds on the STORM algorithm, in conjunction with a novel approach to adaptively set the learning rate and momentum parameters.
arXiv Detail & Related papers (2021-11-01T15:43:36Z)
- An Online Prediction Approach Based on Incremental Support Vector Machine for Dynamic Multiobjective Optimization [19.336520152294213]
We propose a novel prediction algorithm based on an incremental support vector machine (ISVM).
We treat the solving of dynamic multiobjective optimization problems (DMOPs) as an online learning process.
The proposed algorithm can effectively tackle dynamic multiobjective optimization problems.
arXiv Detail & Related papers (2021-02-24T08:51:23Z)
- Online Model Selection for Reinforcement Learning with Function Approximation [50.008542459050155]
We present a meta-algorithm that adapts to the optimal complexity with $\tilde{O}(L^{5/6} T^{2/3})$ regret.
We also show that the meta-algorithm automatically admits significantly improved instance-dependent regret bounds.
arXiv Detail & Related papers (2020-11-19T10:00:54Z)
- Adaptive Sampling for Best Policy Identification in Markov Decision Processes [79.4957965474334]
We investigate the problem of best-policy identification in discounted Markov decision processes (MDPs) when the learner has access to a generative model.
The advantages of state-of-the-art algorithms are discussed and illustrated.
arXiv Detail & Related papers (2020-09-28T15:22:24Z)
- A Reinforcement Learning based approach for Multi-target Detection in Massive MIMO radar [12.982044791524494]
This paper considers the problem of multi-target detection for massive multiple input multiple output (MMIMO) cognitive radar (CR).
We propose a reinforcement learning (RL) based algorithm for cognitive multi-target detection in the presence of unknown disturbance statistics.
Numerical simulations are performed to assess the performance of the proposed RL-based algorithm in both stationary and dynamic environments.
arXiv Detail & Related papers (2020-05-10T16:29:06Z)
- A data-driven choice of misfit function for FWI using reinforcement learning [0.0]
We use a deep-Q network (DQN) to learn an optimal policy to determine the proper timing to switch between different misfit functions.
Specifically, we train the state-action value function (Q) to predict when to use the conventional L2-norm misfit function or the more advanced optimal-transport matching-filter (OTMF) misfit.
arXiv Detail & Related papers (2020-02-08T12:31:33Z)
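The misfit-switching idea above can be illustrated with a toy value-learning loop. The two-state environment and its rewards are invented here purely for illustration (only the L2/OTMF action names come from the abstract; the paper itself trains a DQN over FWI state features, which this sketch does not reproduce):

```python
import random

# Toy tabular Q-learning for choosing a misfit function per iteration.
ACTIONS = ["L2", "OTMF"]

def train(episodes=500, alpha=0.1, gamma=0.9, eps=0.2, seed=0):
    """Learn when each misfit function pays off in a synthetic
    two-state task: OTMF is rewarded 'early' (where cycle skipping
    would hurt an L2 misfit), L2 is rewarded 'late' (near convergence).
    """
    rng = random.Random(seed)
    Q = {(s, a): 0.0 for s in ("early", "late") for a in ACTIONS}
    for _ in range(episodes):
        s = "early"
        for step in range(10):
            # epsilon-greedy action selection over the two misfits
            if rng.random() < eps:
                a = rng.choice(ACTIONS)
            else:
                a = max(ACTIONS, key=lambda x: Q[(s, x)])
            # synthetic reward: OTMF helps early, L2 helps late
            r = 1.0 if (s == "early") == (a == "OTMF") else 0.0
            s2 = "late" if step >= 4 else "early"
            # one-step Q-learning update
            Q[(s, a)] += alpha * (r + gamma * max(Q[(s2, b)] for b in ACTIONS)
                                  - Q[(s, a)])
            s = s2
    return Q
```

After training, the greedy policy prefers OTMF in the early state and L2 in the late state, mirroring the switching behavior the DQN is trained to discover.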
This list is automatically generated from the titles and abstracts of the papers in this site.