Adaptive Experimental Design for Intrusion Data Collection
- URL: http://arxiv.org/abs/2310.13224v1
- Date: Fri, 20 Oct 2023 02:02:51 GMT
- Title: Adaptive Experimental Design for Intrusion Data Collection
- Authors: Kate Highnam, Zach Hanif, Ellie Van Vogt, Sonali Parbhoo, Sergio Maffeis, Nicholas R. Jennings,
- Abstract summary: Intrusion research frequently collects data on attack techniques currently employed and their potential symptoms.
These observational studies do not clearly discern the cause-and-effect relationships between the design of the environment and the data recorded.
We present the theory and empirical data on methods that aim to discover such causal relationships efficiently.
- Score: 7.7932470245461865
- License: http://creativecommons.org/licenses/by-nc-sa/4.0/
- Abstract: Intrusion research frequently collects data on attack techniques currently employed and their potential symptoms. This includes deploying honeypots, logging events from existing devices, employing a red team for a sample attack campaign, or simulating system activity. However, these observational studies do not clearly discern the cause-and-effect relationships between the design of the environment and the data recorded. Neglecting such relationships increases the chance of drawing biased conclusions due to unconsidered factors, such as spurious correlations between features and errors in measurement or classification. In this paper, we present the theory and empirical data on methods that aim to discover such causal relationships efficiently. Our adaptive design (AD) is inspired by the clinical trial community: a variant of a randomized control trial (RCT) to measure how a particular ``treatment'' affects a population. To contrast our method with observational studies and RCT, we run the first controlled and adaptive honeypot deployment study, identifying the causal relationship between an ssh vulnerability and the rate of server exploitation. We demonstrate that our AD method decreases the total time needed to run the deployment by at least 33%, while still confidently stating the impact of our change in the environment. Compared to an analogous honeypot study with a control group, our AD requests 17% fewer honeypots while collecting 19% more attack recordings than an analogous honeypot study with a control group.
Related papers
- Positive-Unlabeled Learning for Control Group Construction in Observational Causal Inference [3.8266403895736776]
Access to both treated and control units is essential for estimating the effect of a treatment on an outcome of interest.<n>A common challenge is the absence of units clearly labeled as controls-that is, units known not to have received the treatment.<n>We propose positive-unlabeled (PU) learning as a framework for identifying, with high confidence, control units from a pool of unlabeled ones.
arXiv Detail & Related papers (2025-07-19T08:06:08Z) - Data Fusion for Partial Identification of Causal Effects [62.56890808004615]
We propose a novel partial identification framework that enables researchers to answer key questions.<n>Is the causal effect positive or negative? and How severe must assumption violations be to overturn this conclusion?<n>We apply our framework to the Project STAR study, which investigates the effect of classroom size on students' third-grade standardized test performance.
arXiv Detail & Related papers (2025-05-30T07:13:01Z) - Identifying perturbation targets through causal differential networks [23.568795598997376]
We propose a causality-inspired approach to identify variables responsible for changes to a biological system.
First, we infer noisy causal graphs from the observational and interventional data.
We then learn to map the differences between these graphs, along with additional statistical features, to sets of variables that were intervened upon.
arXiv Detail & Related papers (2024-10-04T12:48:21Z) - Causal Inference from Text: Unveiling Interactions between Variables [20.677407402398405]
Existing methods only account for confounding covariables that affect both treatment and outcome.
This bias arises from insufficient consideration of non-confounding covariables.
In this work, we aim to mitigate the bias by unveiling interactions between different variables.
arXiv Detail & Related papers (2023-11-09T11:29:44Z) - Adaptive Sequential Surveillance with Network and Temporal Dependence [1.7205106391379026]
Strategic test allocation plays a major role in the control of both emerging and existing pandemics.
Infectious disease surveillance presents unique statistical challenges.
We propose an Online Super Learner for adaptive sequential surveillance.
arXiv Detail & Related papers (2022-12-05T17:04:17Z) - The interventional Bayesian Gaussian equivalent score for Bayesian
causal inference with unknown soft interventions [0.0]
In certain settings, such as genomics, we may have data from heterogeneous study conditions, with soft (partial) interventions only pertaining to a subset of the study variables.
We define the interventional BGe score for a mixture of observational and interventional data, where the targets and effects of intervention may be unknown.
arXiv Detail & Related papers (2022-05-05T12:32:08Z) - Causal Effect Estimation using Variational Information Bottleneck [19.6760527269791]
Causal inference is to estimate the causal effect in a causal relationship when intervention is applied.
We propose a method to estimate Causal Effect by using Variational Information Bottleneck (CEVIB)
arXiv Detail & Related papers (2021-10-26T13:46:12Z) - Examining and Combating Spurious Features under Distribution Shift [94.31956965507085]
We define and analyze robust and spurious representations using the information-theoretic concept of minimal sufficient statistics.
We prove that even when there is only bias of the input distribution, models can still pick up spurious features from their training data.
Inspired by our analysis, we demonstrate that group DRO can fail when groups do not directly account for various spurious correlations.
arXiv Detail & Related papers (2021-06-14T05:39:09Z) - Multi-Source Causal Inference Using Control Variates [81.57072928775509]
We propose a general algorithm to estimate causal effects from emphmultiple data sources.
We show theoretically that this reduces the variance of the ATE estimate.
We apply this framework to inference from observational data under an outcome selection bias.
arXiv Detail & Related papers (2021-03-30T21:20:51Z) - Efficient Causal Inference from Combined Observational and
Interventional Data through Causal Reductions [68.6505592770171]
Unobserved confounding is one of the main challenges when estimating causal effects.
We propose a novel causal reduction method that replaces an arbitrary number of possibly high-dimensional latent confounders.
We propose a learning algorithm to estimate the parameterized reduced model jointly from observational and interventional data.
arXiv Detail & Related papers (2021-03-08T14:29:07Z) - Amortized Causal Discovery: Learning to Infer Causal Graphs from
Time-Series Data [63.15776078733762]
We propose Amortized Causal Discovery, a novel framework to learn to infer causal relations from time-series data.
We demonstrate experimentally that this approach, implemented as a variational model, leads to significant improvements in causal discovery performance.
arXiv Detail & Related papers (2020-06-18T19:59:12Z) - Enabling Counterfactual Survival Analysis with Balanced Representations [64.17342727357618]
Survival data are frequently encountered across diverse medical applications, i.e., drug development, risk profiling, and clinical trials.
We propose a theoretically grounded unified framework for counterfactual inference applicable to survival outcomes.
arXiv Detail & Related papers (2020-06-14T01:15:00Z) - Generalization Bounds and Representation Learning for Estimation of
Potential Outcomes and Causal Effects [61.03579766573421]
We study estimation of individual-level causal effects, such as a single patient's response to alternative medication.
We devise representation learning algorithms that minimize our bound, by regularizing the representation's induced treatment group distance.
We extend these algorithms to simultaneously learn a weighted representation to further reduce treatment group distances.
arXiv Detail & Related papers (2020-01-21T10:16:33Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.