Adaptive Experimental Design for Intrusion Data Collection
- URL: http://arxiv.org/abs/2310.13224v1
- Date: Fri, 20 Oct 2023 02:02:51 GMT
- Title: Adaptive Experimental Design for Intrusion Data Collection
- Authors: Kate Highnam, Zach Hanif, Ellie Van Vogt, Sonali Parbhoo, Sergio Maffeis, Nicholas R. Jennings
- Abstract summary: Intrusion research frequently collects data on attack techniques currently employed and their potential symptoms.
These observational studies do not clearly discern the cause-and-effect relationships between the design of the environment and the data recorded.
We present the theory and empirical data on methods that aim to discover such causal relationships efficiently.
- Score: 7.7932470245461865
- License: http://creativecommons.org/licenses/by-nc-sa/4.0/
- Abstract: Intrusion research frequently collects data on attack techniques currently employed and their potential symptoms. This includes deploying honeypots, logging events from existing devices, employing a red team for a sample attack campaign, or simulating system activity. However, these observational studies do not clearly discern the cause-and-effect relationships between the design of the environment and the data recorded. Neglecting such relationships increases the chance of drawing biased conclusions due to unconsidered factors, such as spurious correlations between features and errors in measurement or classification. In this paper, we present the theory and empirical data on methods that aim to discover such causal relationships efficiently. Our adaptive design (AD) is inspired by the clinical trial community: a variant of a randomized control trial (RCT) to measure how a particular "treatment" affects a population. To contrast our method with observational studies and RCTs, we run the first controlled and adaptive honeypot deployment study, identifying the causal relationship between an ssh vulnerability and the rate of server exploitation. We demonstrate that our AD method decreases the total time needed to run the deployment by at least 33%, while still confidently stating the impact of our change in the environment. Compared to an analogous honeypot study with a control group, our AD requests 17% fewer honeypots while collecting 19% more attack recordings.
Related papers
- Causal Inference from Text: Unveiling Interactions between Variables [20.677407402398405]
Existing methods only account for confounding covariables that affect both treatment and outcome.
Bias arises from insufficient consideration of non-confounding covariables.
In this work, we aim to mitigate the bias by unveiling interactions between different variables.
arXiv Detail & Related papers (2023-11-09T11:29:44Z)
- Adaptive Sequential Surveillance with Network and Temporal Dependence [1.7205106391379026]
Strategic test allocation plays a major role in the control of both emerging and existing pandemics.
Infectious disease surveillance presents unique statistical challenges.
We propose an Online Super Learner for adaptive sequential surveillance.
arXiv Detail & Related papers (2022-12-05T17:04:17Z)
- The interventional Bayesian Gaussian equivalent score for Bayesian causal inference with unknown soft interventions [0.0]
In certain settings, such as genomics, we may have data from heterogeneous study conditions, with soft (partial) interventions only pertaining to a subset of the study variables.
We define the interventional BGe score for a mixture of observational and interventional data, where the targets and effects of intervention may be unknown.
arXiv Detail & Related papers (2022-05-05T12:32:08Z)
- Causal Effect Estimation using Variational Information Bottleneck [19.6760527269791]
Causal inference aims to estimate the causal effect in a causal relationship when an intervention is applied.
We propose a method to estimate causal effects using a Variational Information Bottleneck (CEVIB).
arXiv Detail & Related papers (2021-10-26T13:46:12Z)
- Examining and Combating Spurious Features under Distribution Shift [94.31956965507085]
We define and analyze robust and spurious representations using the information-theoretic concept of minimal sufficient statistics.
We prove that even when there is only bias of the input distribution, models can still pick up spurious features from their training data.
Inspired by our analysis, we demonstrate that group DRO can fail when groups do not directly account for various spurious correlations.
arXiv Detail & Related papers (2021-06-14T05:39:09Z)
- Multi-Source Causal Inference Using Control Variates [81.57072928775509]
We propose a general algorithm to estimate causal effects from multiple data sources.
We show theoretically that this reduces the variance of the ATE estimate.
We apply this framework to inference from observational data under an outcome selection bias.
arXiv Detail & Related papers (2021-03-30T21:20:51Z)
- Efficient Causal Inference from Combined Observational and Interventional Data through Causal Reductions [68.6505592770171]
Unobserved confounding is one of the main challenges when estimating causal effects.
We propose a novel causal reduction method that replaces an arbitrary number of possibly high-dimensional latent confounders.
We propose a learning algorithm to estimate the parameterized reduced model jointly from observational and interventional data.
arXiv Detail & Related papers (2021-03-08T14:29:07Z)
- Amortized Causal Discovery: Learning to Infer Causal Graphs from Time-Series Data [63.15776078733762]
We propose Amortized Causal Discovery, a novel framework to learn to infer causal relations from time-series data.
We demonstrate experimentally that this approach, implemented as a variational model, leads to significant improvements in causal discovery performance.
arXiv Detail & Related papers (2020-06-18T19:59:12Z)
- Enabling Counterfactual Survival Analysis with Balanced Representations [64.17342727357618]
Survival data are frequently encountered across diverse medical applications, e.g., drug development, risk profiling, and clinical trials.
We propose a theoretically grounded unified framework for counterfactual inference applicable to survival outcomes.
arXiv Detail & Related papers (2020-06-14T01:15:00Z)
- Generalization Bounds and Representation Learning for Estimation of Potential Outcomes and Causal Effects [61.03579766573421]
We study estimation of individual-level causal effects, such as a single patient's response to alternative medication.
We devise representation learning algorithms that minimize our bound, by regularizing the representation's induced treatment group distance.
We extend these algorithms to simultaneously learn a weighted representation to further reduce treatment group distances.
arXiv Detail & Related papers (2020-01-21T10:16:33Z)
This list is automatically generated from the titles and abstracts of the papers on this site.