MDPFuzz: Testing Models Solving Markov Decision Processes
- URL: http://arxiv.org/abs/2112.02807v4
- Date: Tue, 11 Apr 2023 22:19:33 GMT
- Title: MDPFuzz: Testing Models Solving Markov Decision Processes
- Authors: Qi Pang, Yuanyuan Yuan, Shuai Wang
- Abstract summary: We present MDPFuzz, the first blackbox fuzz testing framework for models solving Markov decision processes (MDPs).
MDPFuzz forms testing oracles by checking whether the target model enters abnormal and dangerous states.
We report the notable finding that crash-triggering states, though they look normal, induce neuron activation patterns distinct from those of normal states.
- Score: 10.53962813929928
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: The Markov decision process (MDP) provides a mathematical framework for
modeling sequential decision-making problems, many of which are crucial to
security and safety, such as autonomous driving and robot control. The rapid
development of artificial intelligence research has created efficient methods
for solving MDPs, such as deep neural networks (DNNs), reinforcement learning
(RL), and imitation learning (IL). However, these popular models solving MDPs
are neither thoroughly tested nor rigorously verified to be reliable.
We present MDPFuzz, the first blackbox fuzz testing framework for models
solving MDPs. MDPFuzz forms testing oracles by checking whether the target
model enters abnormal and dangerous states. During fuzzing, MDPFuzz decides
which mutated state to retain by checking whether it reduces the cumulative reward
or forms a new state sequence. We design efficient techniques to quantify the
"freshness" of a state sequence using Gaussian mixture models (GMMs) and
dynamic expectation-maximization (DynEM). We also prioritize states with high
potential of revealing crashes by estimating the local sensitivity of target
models over states.
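To make the loop above concrete, here is a minimal Python sketch of such a fuzzing loop. It is a simplification under stated assumptions, not MDPFuzz's actual implementation: `env.rollout(policy, s0)` and `mutate` are hypothetical helpers, scikit-learn's GaussianMixture refit periodically stands in for the paper's DynEM update, and a lowest-reward-first ordering stands in for the sensitivity-based prioritization.

```python
import numpy as np
from sklearn.mixture import GaussianMixture

def mutate(s, scale=0.05):
    # Simple Gaussian perturbation of a vector-valued initial state.
    return s + np.random.normal(0.0, scale, size=np.shape(s))

def mdpfuzz(env, policy, seeds, iters=1000, refit_every=50, tau=-50.0):
    # Assumed interface: env.rollout(policy, s0) returns
    # (visited_states, cumulative_reward, crashed). The density threshold
    # `tau` is an arbitrary illustrative value.
    gmm = GaussianMixture(n_components=8).fit(np.array(seeds))
    corpus = [(s, env.rollout(policy, s)[1]) for s in seeds]
    crashes, observed = [], [np.asarray(s) for s in seeds]
    for i in range(1, iters + 1):
        # Lowest cumulative reward first: a crude stand-in for the paper's
        # sensitivity-based seed prioritization.
        corpus.sort(key=lambda sr: sr[1])
        s0, best_r = corpus[0]
        s = mutate(s0)
        states, r, crashed = env.rollout(policy, s)
        if crashed:                        # oracle: abnormal/dangerous state
            crashes.append(s)
            continue
        fresh = gmm.score(np.asarray(states)) < tau  # low density => "fresh"
        if r < best_r or fresh:            # keep reward-reducing or fresh mutants
            corpus.append((s, r))
            observed.extend(np.asarray(st) for st in states)
        if i % refit_every == 0:           # periodic refit stands in for DynEM
            gmm.fit(np.array(observed))
    return crashes
```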
MDPFuzz is evaluated on five state-of-the-art models for solving MDPs,
including supervised DNN, RL, IL, and multi-agent RL. Our evaluation includes
scenarios of autonomous driving, aircraft collision avoidance, and two games
that are often used to benchmark RL. During a 12-hour run, we find over 80
crash-triggering state sequences on each model. We report the notable finding
that crash-triggering states, though they look normal, induce neuron activation
patterns distinct from those of normal states. We further develop an abnormal
behavior detector to harden all the evaluated models, and we repair them using
MDPFuzz's findings, significantly enhancing their robustness without
sacrificing accuracy.
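As a rough illustration of how the activation-pattern finding could be turned into a detector, here is a hedged sketch (not the paper's implementation): it fits a Gaussian to hidden-layer activations collected on normal states and flags states whose Mahalanobis distance exceeds a threshold calibrated on normal data. The `activations` hook is an assumed accessor into the target model, e.g. returning penultimate-layer activations.

```python
import numpy as np

class ActivationAnomalyDetector:
    """Flags states whose hidden-layer activation pattern deviates from
    activations collected on normal runs (Mahalanobis-distance threshold).
    `activations(state)` is an assumed hook returning a 1-D feature vector."""

    def __init__(self, activations, eps=1e-6):
        self.activations = activations
        self.eps = eps  # diagonal jitter to keep the covariance invertible

    def fit(self, normal_states, quantile=0.99):
        A = np.stack([self.activations(s) for s in normal_states])
        self.mu = A.mean(axis=0)
        cov = np.cov(A, rowvar=False) + self.eps * np.eye(A.shape[1])
        self.prec = np.linalg.inv(cov)
        # Calibrate the threshold on normal data only.
        d = np.array([self._dist(a) for a in A])
        self.threshold = np.quantile(d, quantile)
        return self

    def _dist(self, a):
        diff = a - self.mu
        return float(diff @ self.prec @ diff)

    def is_abnormal(self, state):
        return self._dist(self.activations(state)) > self.threshold
```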
Related papers
- Steering Masked Discrete Diffusion Models via Discrete Denoising Posterior Prediction [88.65168366064061]
We introduce Discrete Denoising Posterior Prediction (DDPP), a novel framework that casts the task of steering pre-trained MDMs as a problem of probabilistic inference.
Our framework leads to a family of three novel objectives that are all simulation-free, and thus scalable.
We substantiate our designs via wet-lab validation, where we observe transient expression of reward-optimized protein sequences.
arXiv Detail & Related papers (2024-10-10T17:18:30Z) - Real-Time Anomaly Detection and Reactive Planning with Large Language Models [18.57162998677491]
Foundation models, e.g., large language models (LLMs), trained on internet-scale data possess zero-shot capabilities.
We present a two-stage reasoning framework that incorporates the judgement regarding potential anomalies into a safe control framework.
This enables our monitor to improve the trustworthiness of dynamic robotic systems, such as quadrotors or autonomous vehicles.
arXiv Detail & Related papers (2024-07-11T17:59:22Z) - Investigating the Robustness of Counterfactual Learning to Rank Models: A Reproducibility Study [61.64685376882383]
Counterfactual learning to rank (CLTR) has attracted extensive attention in the IR community for its ability to leverage massive logged user interaction data to train ranking models.
This paper investigates the robustness of existing CLTR models in complex and diverse situations.
We find that the DLA models and IPS-DCM show better robustness under various simulation settings than IPS-PBM and PRS with offline propensity estimation.
arXiv Detail & Related papers (2024-04-04T10:54:38Z) - Learning Residual Model of Model Predictive Control via Random Forests
for Autonomous Driving [13.865293598486492]
One major issue in model predictive control (MPC) for autonomous driving is the contradiction between the system model's prediction accuracy and its computational efficiency.
This paper reformulates the MPC tracking-accuracy problem as a quadratic program (QP), which standard solvers can handle efficiently (a toy QP of this form is sketched after this list).
arXiv Detail & Related papers (2023-04-10T03:32:09Z) - Predictable MDP Abstraction for Unsupervised Model-Based RL [93.91375268580806]
We propose predictable MDP abstraction (PMA).
Instead of training a predictive model on the original MDP, we train a model on a transformed MDP with a learned action space.
We theoretically analyze PMA and empirically demonstrate that PMA leads to significant improvements over prior unsupervised model-based RL approaches.
arXiv Detail & Related papers (2023-02-08T07:37:51Z) - Robust DNN Surrogate Models with Uncertainty Quantification via
Adversarial Training [17.981250443856897]
Surrogate models have been used to emulate mathematical simulators for physical or biological processes.
Deep Neural Network (DNN) surrogate models have gained popularity for their hard-to-match emulation accuracy.
In this paper, we show the severity of the robustness issue in DNN surrogates through empirical studies and hypothesis testing.
arXiv Detail & Related papers (2022-11-10T05:09:39Z) - Improving Variational Autoencoder based Out-of-Distribution Detection
for Embedded Real-time Applications [2.9327503320877457]
Out-of-distribution (OoD) detection is an emerging approach for identifying, in real time, inputs that fall outside a model's training distribution.
In this paper, we show how we can robustly detect hazardous motion around autonomous driving agents (a minimal VAE-based OoD score is sketched after this list).
Our methods significantly improve the detection of OoD factors in unique driving scenarios, performing 42% better than state-of-the-art approaches.
Our model also generalizes near-perfectly across the real-world and simulated driving datasets tested, 97% better than the state-of-the-art.
arXiv Detail & Related papers (2021-07-25T07:52:53Z) - Model-based micro-data reinforcement learning: what are the crucial
model properties and which model to choose? [0.2836066255205732]
We contribute to micro-data model-based reinforcement learning (MBRL) by rigorously comparing popular generative models.
We find that in an environment that requires multimodal posterior predictives, mixture density nets outperform all other models by a large margin (a minimal mixture density network is sketched after this list).
We also find that deterministic models are on par; in fact, they consistently (although not significantly) outperform their probabilistic counterparts.
arXiv Detail & Related papers (2021-07-24T11:38:25Z) - Anomaly Detection of Time Series with Smoothness-Inducing Sequential
Variational Auto-Encoder [59.69303945834122]
We present a Smoothness-Inducing Sequential Variational Auto-Encoder (SISVAE) model for robust estimation and anomaly detection of time series.
Our model parameterizes mean and variance for each time-stamp with flexible neural networks.
We show the effectiveness of our model on both synthetic datasets and public real-world benchmarks.
arXiv Detail & Related papers (2021-02-02T06:15:15Z) - Contextual-Bandit Anomaly Detection for IoT Data in Distributed
Hierarchical Edge Computing [65.78881372074983]
IoT devices can hardly afford complex deep neural network (DNN) models, and offloading anomaly detection tasks to the cloud incurs long delays.
We propose and build a demo for an adaptive anomaly detection approach for distributed hierarchical edge computing (HEC) systems.
We show that our proposed approach significantly reduces detection delay without sacrificing accuracy, as compared to offloading detection tasks to the cloud.
arXiv Detail & Related papers (2020-04-15T06:13:33Z) - SUOD: Accelerating Large-Scale Unsupervised Heterogeneous Outlier
Detection [63.253850875265115]
Outlier detection (OD) is a key machine learning (ML) task for identifying abnormal objects from general samples.
We propose SUOD, a modular acceleration system for large-scale unsupervised heterogeneous outlier detection.
arXiv Detail & Related papers (2020-03-11T00:22:50Z)
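For the residual-MPC entry above, a tracking objective cast as a QP might look like the following toy cvxpy sketch. The dynamics matrices, horizon, and cost weights are placeholders for illustration, not the paper's learned residual model.

```python
import numpy as np
import cvxpy as cp

# Toy reference-tracking QP over a horizon T. A, B, and the reference
# trajectory are illustrative placeholders (double-integrator-like dynamics).
A = np.array([[1.0, 0.1], [0.0, 1.0]])
B = np.array([[0.0], [0.1]])
T, x0 = 20, np.array([0.0, 0.0])
ref = np.vstack([np.linspace(0, 1, T + 1), np.zeros(T + 1)]).T

x = cp.Variable((T + 1, 2))   # state trajectory
u = cp.Variable((T, 1))       # control inputs
cost = cp.sum_squares(x - ref) + 0.1 * cp.sum_squares(u)  # tracking + effort
constraints = [x[0] == x0]
for t in range(T):
    constraints += [x[t + 1] == A @ x[t] + B @ u[t],      # linear dynamics
                    cp.abs(u[t]) <= 1.0]                   # input bounds
prob = cp.Problem(cp.Minimize(cost), constraints)
prob.solve()  # a convex QP, handled by off-the-shelf solvers
print("tracking cost:", prob.value)
```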
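For the VAE-based OoD entry above, a common construction (assumed here, not necessarily the paper's architecture) scores inputs by their negative ELBO under a small VAE trained on in-distribution data; a high score suggests the input is out-of-distribution.

```python
import torch
import torch.nn as nn

class TinyVAE(nn.Module):
    """Minimal VAE; an input's negative-ELBO score serves as an OoD signal.
    Layer sizes are illustrative, not the paper's architecture."""
    def __init__(self, d_in=32, d_z=8):
        super().__init__()
        self.enc = nn.Sequential(nn.Linear(d_in, 64), nn.ReLU())
        self.mu, self.logvar = nn.Linear(64, d_z), nn.Linear(64, d_z)
        self.dec = nn.Sequential(nn.Linear(d_z, 64), nn.ReLU(),
                                 nn.Linear(64, d_in))

    def forward(self, x):
        h = self.enc(x)
        mu, logvar = self.mu(h), self.logvar(h)
        z = mu + torch.randn_like(mu) * torch.exp(0.5 * logvar)  # reparam.
        return self.dec(z), mu, logvar

def ood_score(vae, x):
    # Negative ELBO per sample: reconstruction error plus KL to the prior.
    # The detection threshold would be calibrated on in-distribution data.
    recon, mu, logvar = vae(x)
    rec = ((recon - x) ** 2).sum(dim=1)
    kl = 0.5 * (mu ** 2 + logvar.exp() - 1.0 - logvar).sum(dim=1)
    return rec + kl
```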
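And for the micro-data MBRL entry, a mixture density network of the kind that excels at multimodal posterior predictives can be sketched as follows (layer sizes and component count are illustrative, not the paper's setup).

```python
import torch
import torch.nn as nn

class MDN(nn.Module):
    """Mixture density net: predicts a K-component Gaussian mixture over
    the target, capturing multimodal predictive distributions that a
    single Gaussian or deterministic net cannot."""
    def __init__(self, d_in=4, d_out=4, K=5, hidden=64):
        super().__init__()
        self.body = nn.Sequential(nn.Linear(d_in, hidden), nn.Tanh())
        self.pi = nn.Linear(hidden, K)              # mixture logits
        self.mu = nn.Linear(hidden, K * d_out)      # component means
        self.logsig = nn.Linear(hidden, K * d_out)  # component log-stddevs
        self.K, self.d_out = K, d_out

    def loss(self, x, y):
        # Negative log-likelihood of y under the predicted mixture.
        h = self.body(x)
        log_pi = torch.log_softmax(self.pi(h), dim=-1)
        mu = self.mu(h).view(-1, self.K, self.d_out)
        sig = self.logsig(h).view(-1, self.K, self.d_out).exp()
        comp = torch.distributions.Normal(mu, sig)
        log_p = comp.log_prob(y.unsqueeze(1)).sum(dim=-1)  # per component
        return -torch.logsumexp(log_pi + log_p, dim=-1).mean()
```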
This list is automatically generated from the titles and abstracts of the papers on this site.