ThreatCluster: Threat Clustering for Information Overload Reduction in Computer Emergency Response Teams
- URL: http://arxiv.org/abs/2210.14067v2
- Date: Fri, 15 Mar 2024 15:46:49 GMT
- Title: ThreatCluster: Threat Clustering for Information Overload Reduction in Computer Emergency Response Teams
- Authors: Philipp Kuehn, Dilara Nadermahmoodi, Moritz Kerk, Christian Reuter
- Abstract summary: The ever-increasing number of threats and the diversity of information sources pose challenges for CERTs.
To respond to emerging threats, CERTs must gather information in a timely and comprehensive manner.
This paper contributes to the question of how to reduce information overload for CERTs.
- Score: 0.0
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: The ever-increasing number of threats and the existing diversity of information sources pose challenges for Computer Emergency Response Teams (CERTs). To respond to emerging threats, CERTs must gather information in a timely and comprehensive manner. But the volume of sources and information leads to information overload. This paper contributes to the question of how to reduce information overload for CERTs. We propose clustering incoming information, as scanning this information is one of the most tiresome, but necessary, manual steps. Based on current studies, we establish conditions for such a framework. Different types of evaluation metrics are selected in relation to the framework conditions. Furthermore, different document embeddings and distance measures are evaluated and interpreted in combination with clustering methods. We use three different corpora for the evaluation: a novel ground-truth corpus based on threat reports, a security bug report (SBR) corpus, and one with news articles. Our work shows that it is possible to reduce the information overload by up to 84.8% with homogeneous clusters. A runtime analysis of the clustering methods further supports the choice of clustering methods. The source code and dataset will be made publicly available after acceptance.
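As a rough illustration of the kind of pipeline the abstract describes (embed incoming reports, cluster them, check cluster homogeneity), the sketch below uses TF-IDF vectors and agglomerative clustering from scikit-learn as stand-ins for the embeddings and clustering methods the paper actually evaluates; the example documents and labels are hypothetical.

```python
# Minimal sketch: cluster incoming threat-report texts to reduce manual triage.
# TF-IDF + agglomerative clustering are stand-ins for the embeddings and
# clustering methods evaluated in the paper, not the authors' exact setup.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.cluster import AgglomerativeClustering
from sklearn.metrics import homogeneity_score

documents = [                      # hypothetical incoming advisories
    "Critical RCE in ExampleCMS plugin exploited in the wild",
    "ExampleCMS plugin remote code execution patched in 4.2.1",
    "Phishing campaign targets European energy providers",
    "New phishing wave impersonates energy-sector suppliers",
]
true_labels = [0, 0, 1, 1]         # hypothetical ground truth, used for evaluation only

vectors = TfidfVectorizer(stop_words="english").fit_transform(documents).toarray()
clusters = AgglomerativeClustering(n_clusters=2, metric="cosine",
                                   linkage="average").fit_predict(vectors)

print("cluster assignments:", clusters)
print("homogeneity:", homogeneity_score(true_labels, clusters))
```

The intuition behind the overload reduction is that an analyst scans one representative per homogeneous cluster instead of every incoming document.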
Related papers
- The importance of the clustering model to detect new types of intrusion in data traffic [0.0]
The presented work uses the K-means algorithm, a popular clustering technique.
Data was gathered using a Kali Linux environment, CICFlowMeter traffic captures, and the PuTTY software tool.
The model counted the attacks and assigned numbers to each one of them.
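For illustration only (not the authors' code), the sketch below clusters synthetic network-flow feature vectors with scikit-learn's K-means and counts how many flows fall into each cluster; the features and their values are made up.

```python
# Sketch: K-means over synthetic flow features (duration, packets, bytes),
# then count flows per cluster, loosely mirroring the summary above.
import numpy as np
from sklearn.cluster import KMeans

rng = np.random.default_rng(0)
normal = rng.normal(loc=[1.0, 20, 1500], scale=[0.2, 5, 300], size=(50, 3))
bursty = rng.normal(loc=[0.1, 400, 60000], scale=[0.05, 50, 5000], size=(10, 3))
flows = np.vstack([normal, bursty])          # hypothetical flow feature vectors

labels = KMeans(n_clusters=2, n_init=10, random_state=0).fit_predict(flows)
counts = np.bincount(labels)
print({f"cluster {i}": int(c) for i, c in enumerate(counts)})
```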
arXiv Detail & Related papers (2024-11-21T19:40:31Z)
- Intelligent Multi-Document Summarisation for Extracting Insights on Racial Inequalities from Maternity Incident Investigation Reports [0.8609957371651683]
In healthcare, thousands of safety incidents occur every year, but learning from these incidents is not effectively aggregated.
This paper presents I-SIRch:CS, a framework designed to facilitate the aggregation and analysis of safety incident reports.
The framework integrates concept annotation using the Safety Intelligence Research (SIRch) taxonomy with clustering, summarisation, and analysis capabilities.
arXiv Detail & Related papers (2024-07-11T09:11:20Z)
- Embrace Divergence for Richer Insights: A Multi-document Summarization Benchmark and a Case Study on Summarizing Diverse Information from News Articles [136.84278943588652]
We propose a new task of summarizing diverse information encountered in multiple news articles encompassing the same event.
To facilitate this task, we outlined a data collection schema for identifying diverse information and curated a dataset named DiverseSumm.
The dataset includes 245 news stories, with each story comprising 10 news articles and paired with a human-validated reference.
arXiv Detail & Related papers (2023-09-17T20:28:17Z)
- ThreatCrawl: A BERT-based Focused Crawler for the Cybersecurity Domain [0.0]
A new focused crawler called ThreatCrawl is proposed.
It uses BiBERT-based models to classify documents and adapt its crawling path dynamically.
It yields harvest rates of up to 52%, which are, to the best of our knowledge, better than the current state of the art.
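A minimal sketch of a focused-crawler loop in this spirit: fetched pages are scored for relevance and only links from relevant pages are added to the frontier. The keyword heuristic stands in for the BERT-based classifier mentioned above, and the seed URL is hypothetical.

```python
# Sketch of a focused crawler: fetch, classify, and only expand relevant pages.
# `is_relevant` is a placeholder for the classifier described in the summary.
import collections
import re
import urllib.request

def is_relevant(text: str) -> bool:
    # Placeholder relevance model: keyword heuristic instead of a BERT classifier.
    return bool(re.search(r"vulnerability|exploit|CVE-\d{4}-\d+", text, re.I))

def extract_links(html: str) -> list[str]:
    return re.findall(r'href="(https?://[^"]+)"', html)

def crawl(seeds: list[str], max_pages: int = 20) -> list[str]:
    frontier = collections.deque(seeds)
    seen = set(seeds)
    harvested = []
    fetched = 0
    while frontier and fetched < max_pages:
        url = frontier.popleft()
        fetched += 1
        try:
            html = urllib.request.urlopen(url, timeout=10).read().decode("utf-8", "ignore")
        except OSError:
            continue                      # skip unreachable pages
        if is_relevant(html):
            harvested.append(url)         # relevant page: harvest and expand
            for link in extract_links(html):
                if link not in seen:
                    seen.add(link)
                    frontier.append(link)
    return harvested

# Example (hypothetical seed): crawl(["https://example.org/security-advisories"])
```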
arXiv Detail & Related papers (2023-04-24T09:53:33Z)
- Hard Regularization to Prevent Deep Online Clustering Collapse without Data Augmentation [65.268245109828]
Online deep clustering refers to the joint use of a feature extraction network and a clustering model to assign cluster labels to each new data point or batch as it is processed.
While faster and more versatile than offline methods, online clustering can easily reach the collapsed solution where the encoder maps all inputs to the same point and all are put into a single cluster.
We propose a method that does not require data augmentation, and that, differently from existing methods, regularizes the hard assignments.
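The sketch below only illustrates the online-clustering setting the summary refers to, assigning labels to mini-batches as they arrive with scikit-learn's MiniBatchKMeans; it does not implement the paper's hard-assignment regularizer.

```python
# Sketch of online clustering: assign labels to mini-batches as they arrive.
# This shows the setting only; it does not implement the paper's regularizer.
import numpy as np
from sklearn.cluster import MiniBatchKMeans

rng = np.random.default_rng(1)
model = MiniBatchKMeans(n_clusters=3, n_init=3, random_state=1)

for step in range(5):                                  # simulated data stream
    batch = rng.normal(size=(32, 8))                   # hypothetical feature batch
    model.partial_fit(batch)                           # update centroids online
    labels = model.predict(batch)                      # cluster labels for this batch
    print(f"step {step}: cluster sizes {np.bincount(labels, minlength=3)}")
```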
arXiv Detail & Related papers (2023-03-29T08:23:26Z)
- MQAG: Multiple-choice Question Answering and Generation for Assessing Information Consistency in Summarization [55.60306377044225]
State-of-the-art summarization systems can generate highly fluent summaries.
These summaries, however, may contain factual inconsistencies and/or information not present in the source.
We introduce an alternative scheme based on standard information-theoretic measures in which the information present in the source and summary is directly compared.
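As a toy illustration of the comparison idea, the sketch below scores inconsistency as the statistical distance between answer distributions over the same multiple-choice options, once conditioned on the source and once on the summary; the question-generation and QA models are abstracted away, and the distributions are made up.

```python
# Toy sketch: compare answer distributions conditioned on source vs. summary.
# The question-generation and QA models are abstracted away; only the
# distribution-comparison step is shown, using total variation distance.
import numpy as np

def total_variation(p: np.ndarray, q: np.ndarray) -> float:
    return 0.5 * float(np.abs(p - q).sum())

# Hypothetical answer probabilities over four options for two generated questions.
source_answers  = [np.array([0.7, 0.2, 0.05, 0.05]), np.array([0.1, 0.8, 0.05, 0.05])]
summary_answers = [np.array([0.6, 0.3, 0.05, 0.05]), np.array([0.7, 0.1, 0.1, 0.1])]

scores = [total_variation(p, q) for p, q in zip(source_answers, summary_answers)]
print("per-question distances:", scores)
print("mean inconsistency score:", float(np.mean(scores)))
```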
arXiv Detail & Related papers (2023-01-28T23:08:25Z)
- A Proposition-Level Clustering Approach for Multi-Document Summarization [82.4616498914049]
We revisit the clustering approach, grouping together propositions for more precise information alignment.
Our method detects salient propositions, clusters them into paraphrastic clusters, and generates a representative sentence for each cluster by fusing its propositions.
Our summarization method improves over the previous state-of-the-art MDS method in the DUC 2004 and TAC 2011 datasets.
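A simplified sketch of the clustering step: proposition-like strings are grouped into paraphrastic clusters using TF-IDF vectors and threshold-based agglomerative clustering, with fusion reduced to picking one representative per cluster; the propositions and the distance threshold are illustrative, not the paper's setup.

```python
# Sketch: group proposition-like strings into paraphrastic clusters.
# TF-IDF + a cosine-distance threshold stand in for the paper's pipeline;
# the fusion step is reduced to picking one representative per cluster.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.cluster import AgglomerativeClustering

propositions = [                      # hypothetical extracted propositions
    "The storm knocked out power for 10,000 homes.",
    "Roughly ten thousand households lost electricity during the storm.",
    "Officials opened three emergency shelters.",
    "Three shelters were opened by city officials.",
]

vectors = TfidfVectorizer().fit_transform(propositions).toarray()
labels = AgglomerativeClustering(n_clusters=None, distance_threshold=0.9,  # illustrative threshold
                                 metric="cosine", linkage="average").fit_predict(vectors)

for cluster_id in sorted(set(labels)):
    members = [p for p, l in zip(propositions, labels) if l == cluster_id]
    print(f"cluster {cluster_id}: representative -> {members[0]!r} ({len(members)} members)")
```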
arXiv Detail & Related papers (2021-12-16T10:34:22Z)
- Author Clustering and Topic Estimation for Short Texts [69.54017251622211]
We propose a novel model that expands on Latent Dirichlet Allocation by modeling strong dependence among the words in the same document.
We also simultaneously cluster users, removing the need for post-hoc cluster estimation.
Our method performs as well as, or better than, traditional approaches to problems arising in short text.
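For context, the sketch below fits plain LDA with scikit-learn on a few short texts, i.e. the baseline the paper extends; the word-dependence modeling and joint user clustering described above are not shown.

```python
# Baseline sketch: plain LDA on short texts with scikit-learn.
# The paper's extension (word dependence within a document, joint user
# clustering) is not implemented here.
from sklearn.decomposition import LatentDirichletAllocation
from sklearn.feature_extraction.text import CountVectorizer

texts = [                             # hypothetical short posts
    "great match last night huge win",
    "the team played a great game",
    "new phone battery lasts all day",
    "battery life on this phone is impressive",
]

counts = CountVectorizer().fit_transform(texts)
lda = LatentDirichletAllocation(n_components=2, random_state=0).fit(counts)
doc_topics = lda.transform(counts)    # per-document topic proportions
print(doc_topics.round(2))
```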
arXiv Detail & Related papers (2021-06-15T20:55:55Z)
- A Quantitative Metric for Privacy Leakage in Federated Learning [22.968763654455298]
We propose a quantitative metric based on mutual information for clients to evaluate the potential risk of information leakage in their gradients.
It is proven that the risk of information leakage is related to the status of the task model, as well as the inherent data distribution.
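A generic sketch of the underlying idea (not the paper's metric): estimate the mutual information between a per-example gradient coordinate and a private input feature across many samples; the model, data, and estimator choice are illustrative.

```python
# Generic sketch: estimate mutual information between a per-example gradient
# coordinate and a private input feature. Illustration of the idea only,
# not the metric defined in the paper.
import numpy as np
from sklearn.feature_selection import mutual_info_regression

rng = np.random.default_rng(0)
x = rng.normal(size=(500, 1))                 # private feature (one per example)
y = 2.0 * x[:, 0] + rng.normal(scale=0.1, size=500)
w = 1.0                                       # current model weight

# Per-example gradient of squared error 0.5 * (w * x - y)^2 w.r.t. w.
grad = (w * x[:, 0] - y) * x[:, 0]

mi = mutual_info_regression(grad.reshape(-1, 1), x[:, 0], random_state=0)
print("estimated MI between gradient and private feature:", float(mi[0]))
```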
arXiv Detail & Related papers (2021-02-24T02:48:35Z)
- Coded Stochastic ADMM for Decentralized Consensus Optimization with Edge Computing [113.52575069030192]
Big data, including applications with high security requirements, are often collected and stored on multiple heterogeneous devices, such as mobile devices, drones and vehicles.
Due to the limitations of communication costs and security requirements, it is of paramount importance to extract information in a decentralized manner instead of aggregating data to a fusion center.
We consider the problem of learning model parameters in a multi-agent system with data locally processed via distributed edge nodes.
A class of mini-batch alternating direction method of multipliers (ADMM) algorithms is explored to develop the distributed learning model.
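A compact sketch of consensus ADMM on a least-squares problem split across a few simulated edge nodes; the averaging step stands in for network aggregation, and the coded and stochastic mini-batch aspects of the paper are not reproduced.

```python
# Sketch: consensus ADMM for least squares split across several "edge nodes".
# The coded / stochastic mini-batch aspects of the paper are not shown;
# the averaging step stands in for network aggregation.
import numpy as np

rng = np.random.default_rng(0)
d, nodes = 5, 4
x_true = rng.normal(size=d)
data = []
for _ in range(nodes):                      # hypothetical local datasets
    A = rng.normal(size=(30, d))
    data.append((A, A @ x_true + 0.01 * rng.normal(size=30)))

rho = 1.0
x = [np.zeros(d) for _ in range(nodes)]
u = [np.zeros(d) for _ in range(nodes)]
z = np.zeros(d)

for _ in range(50):
    # Local update at each node (closed-form ridge-like solve).
    for i, (A, b) in enumerate(data):
        x[i] = np.linalg.solve(A.T @ A + rho * np.eye(d),
                               A.T @ b + rho * (z - u[i]))
    # Consensus (averaging) and dual updates.
    z = np.mean([x[i] + u[i] for i in range(nodes)], axis=0)
    for i in range(nodes):
        u[i] = u[i] + x[i] - z

print("consensus estimate:", np.round(z, 3))
print("ground truth:      ", np.round(x_true, 3))
```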
arXiv Detail & Related papers (2020-10-02T10:41:59Z)