Is $F_1$ Score Suboptimal for Cybersecurity Models? Introducing $C_{score}$, a Cost-Aware Alternative for Model Assessment
- URL: http://arxiv.org/abs/2407.14664v2
- Date: Wed, 31 Jul 2024 15:03:57 GMT
- Title: Is $F_1$ Score Suboptimal for Cybersecurity Models? Introducing $C_{score}$, a Cost-Aware Alternative for Model Assessment
- Authors: Manish Marwah, Asad Narayanan, Stephan Jou, Martin Arlitt, Maria Pospelova
- Abstract summary: The costs of false positives and false negatives are not equal and are application dependent.
In cybersecurity applications, the cost of not detecting an attack is very different from marking a benign activity as an attack.
We propose a new cost-aware metric, $C_{score}$, based on precision and recall.
- Score: 1.747623282473278
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: The costs of errors made by machine learning classifiers, namely, false positives and false negatives, are not equal and are application dependent. For example, in cybersecurity applications, the cost of not detecting an attack is very different from marking a benign activity as an attack. Various design choices during machine learning model building, such as hyperparameter tuning and model selection, allow a data scientist to trade off between these two errors. However, most of the commonly used metrics to evaluate model quality, such as $F_1$ score, which is defined in terms of model precision and recall, treat both these errors equally, making it difficult for users to optimize for the actual cost of these errors. In this paper, we propose a new cost-aware metric, $C_{score}$, based on precision and recall, that can replace $F_1$ score for model evaluation and selection. It includes a cost ratio that takes into account the differing costs of handling false positives and false negatives. We derive and characterize the new cost metric and compare it to $F_1$ score. Further, we use this metric for model thresholding on five cybersecurity-related datasets for multiple cost ratios. The results show an average cost savings of 49%.
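The abstract does not give the exact definition of $C_{score}$. For context, the classical way to weight precision $P$ and recall $R$ unequally is $F_\beta = (1+\beta^2)\,PR/(\beta^2 P + R)$, which weights recall $\beta$ times as heavily as precision; the paper's $C_{score}$ is a different, cost-ratio-based proposal. The snippet below is only a hypothetical sketch of the general idea of cost-aware thresholding described in the abstract: it sweeps classification thresholds and keeps the one minimizing the total cost $FP + r \cdot FN$ for a cost ratio $r$. The function names, the cost parameterization, and the toy data are assumptions made for illustration, not the paper's method.

```python
# Hypothetical sketch of cost-aware threshold selection for a binary classifier.
# The paper defines C_score in terms of precision and recall with a cost ratio;
# its exact formula is not given in the abstract, so this illustrates the general
# idea with a simple total-cost criterion instead.
import numpy as np

def total_cost(y_true, y_score, threshold, cost_ratio):
    """Total misclassification cost at a given threshold.

    cost_ratio = cost of a false negative / cost of a false positive,
    with the false-positive cost normalized to 1.
    """
    y_pred = (y_score >= threshold).astype(int)
    fp = int(np.sum((y_pred == 1) & (y_true == 0)))
    fn = int(np.sum((y_pred == 0) & (y_true == 1)))
    return fp + cost_ratio * fn

def select_threshold(y_true, y_score, cost_ratio, n_grid=101):
    """Pick the threshold minimizing total cost on a validation set."""
    thresholds = np.linspace(0.0, 1.0, n_grid)
    costs = [total_cost(y_true, y_score, t, cost_ratio) for t in thresholds]
    return thresholds[int(np.argmin(costs))]

# Toy example: in a cybersecurity setting, a missed attack (false negative)
# might be judged 10x as costly as a false alarm (false positive).
rng = np.random.default_rng(0)
y_true = rng.integers(0, 2, size=1000)
y_score = np.clip(0.4 * y_true + 0.6 * rng.random(1000), 0.0, 1.0)  # toy scores
print(f"selected threshold: {select_threshold(y_true, y_score, cost_ratio=10.0):.2f}")
```

The paper instead folds the cost ratio directly into a precision- and recall-based score that replaces $F_1$ for evaluation and selection; the sketch above does not reproduce that formula.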
Related papers
- UniMatch: A Unified User-Item Matching Framework for the Multi-purpose Merchant Marketing [27.459774494479227]
We present a unified user-item matching framework to simultaneously conduct item recommendation and user targeting with just one model.
Our framework results in significant performance gains in comparison with the state-of-the-art methods, with greatly reduced cost on computing resources and daily maintenance.
arXiv Detail & Related papers (2023-07-19T13:49:35Z)
- The Projected Covariance Measure for assumption-lean variable significance testing [3.8936058127056357]
We study the problem of testing the model-free null of conditional mean independence, i.e., that the conditional mean of $Y$ given $X$ and $Z$ does not depend on $X$.
A simple but common approach is to specify a linear model and then test whether the regression coefficient for $X$ is non-zero (a minimal sketch of this baseline appears after the related-papers list below).
We propose a simple and general framework that can leverage flexible nonparametric or machine learning methods, such as additive models or random forests.
arXiv Detail & Related papers (2022-11-03T17:55:50Z)
- PL-$k$NN: A Parameterless Nearest Neighbors Classifier [0.24499092754102875]
The $k$-Nearest Neighbors algorithm is one of the most effective and straightforward models employed in numerous problems.
This paper proposes a $k$-Nearest Neighbors classifier that bypasses the need to define the value of $k$.
arXiv Detail & Related papers (2022-09-26T12:52:45Z)
- Bayesian Target-Vector Optimization for Efficient Parameter Reconstruction [0.0]
We introduce a target-vector optimization scheme that considers all $K$ contributions of the model function and that is specifically suited for parameter reconstruction problems.
It also enables accurate uncertainty estimates to be determined with very few observations of the actual model function.
arXiv Detail & Related papers (2022-02-23T15:13:32Z) - Cross-Model Pseudo-Labeling for Semi-Supervised Action Recognition [98.25592165484737]
We propose a more effective pseudo-labeling scheme, called Cross-Model Pseudo-Labeling (CMPL).
CMPL achieves $17.6\%$ and $25.1\%$ Top-1 accuracy on Kinetics-400 and UCF-101, respectively, using only the RGB modality and $1\%$ labeled data.
arXiv Detail & Related papers (2021-12-17T18:59:41Z) - Low-Cost Algorithmic Recourse for Users With Uncertain Cost Functions [74.00030431081751]
We formalize the notion of user-specific cost functions and introduce a new method for identifying actionable recourses for users.
Our method satisfies up to 25.89 percentage points more users compared to strong baseline methods.
arXiv Detail & Related papers (2021-11-01T19:49:35Z) - Online Selective Classification with Limited Feedback [82.68009460301585]
We study selective classification in the online learning model, wherein a predictor may abstain from classifying an instance.
Two salient aspects of the setting we consider are that the data may be non-realisable, due to which abstention may be a valid long-term action, and that feedback is limited.
We construct simple versioning-based schemes for any $\mu \in (0,1]$ that make at most $T^{\mu}$ mistakes while incurring $\tilde{O}(T^{1-\mu})$ excess abstention against adaptive adversaries.
arXiv Detail & Related papers (2021-10-27T08:00:53Z)
- Inconsistent Few-Shot Relation Classification via Cross-Attentional Prototype Networks with Contrastive Learning [16.128652726698522]
We propose Prototype Network-based cross-attention contrastive learning (ProtoCACL) to capture the rich mutual interactions between the support set and query set.
Experimental results demonstrate that our ProtoCACL can outperform the state-of-the-art baseline model under both inconsistent $K$ and inconsistent $N$ settings.
arXiv Detail & Related papers (2021-10-13T07:47:13Z)
- On the Importance of Adaptive Data Collection for Extremely Imbalanced Pairwise Tasks [94.23884467360521]
We show that state-of-the-art models trained on QQP and WikiQA each have only $2.4\%$ average precision when evaluated on realistically imbalanced test data.
By creating balanced training data with more informative negative examples, active learning greatly improves average precision to $32.5\%$ on QQP and $20.1\%$ on WikiQA.
arXiv Detail & Related papers (2020-10-10T21:56:27Z)
- AutoSimulate: (Quickly) Learning Synthetic Data Generation [70.82315853981838]
We propose an efficient alternative for optimal synthetic data generation based on a novel differentiable approximation of the objective.
We demonstrate that the proposed method finds the optimal data distribution faster (up to $50\times$), with significantly reduced training data generation (up to $30\times$) and better accuracy ($+8.7\%$) on real-world test datasets than previous methods.
arXiv Detail & Related papers (2020-08-16T11:36:11Z)
- The Right Tool for the Job: Matching Model and Instance Complexities [62.95183777679024]
As NLP models become larger, executing a trained model requires significant computational resources, incurring monetary and environmental costs.
We propose a modification to contextual representation fine-tuning which, during inference, allows for an early (and fast) "exit" from the full computation.
We test our proposed modification on five different datasets in two tasks: three text classification datasets and two natural language inference benchmarks.
arXiv Detail & Related papers (2020-04-16T04:28:08Z)
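As referenced in the Projected Covariance Measure entry above, the "simple but common approach" it contrasts with is to fit a linear model of $Y$ on $X$ and $Z$ and test whether the coefficient on $X$ is zero. Below is a minimal, self-contained sketch of that baseline using statsmodels on synthetic data; the variable names and data-generating process are made-up illustrations, and this is the classical parametric baseline, not the paper's projected covariance measure.

```python
# Baseline significance test sketched from the summary above: regress Y on
# (X, Z) with a linear model and read off the t-test p-value for X's coefficient.
# This is NOT the Projected Covariance Measure itself, only the parametric
# baseline it is contrasted with; the synthetic data below is illustrative.
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(1)
n = 500
z = rng.normal(size=(n, 3))                                # covariates Z
x = z @ np.array([0.5, -0.2, 0.1]) + rng.normal(size=n)    # X correlated with Z
y = z.sum(axis=1) + rng.normal(size=n)                      # under the null, Y ignores X

design = sm.add_constant(np.column_stack([x, z]))           # columns: const, X, Z1..Z3
fit = sm.OLS(y, design).fit()
print(f"p-value for the coefficient on X: {fit.pvalues[1]:.3f}")
```

The related paper's point is that this parametric baseline bakes in a linearity assumption; its model-free test of conditional mean independence can instead plug in flexible regressors such as additive models or random forests.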