Introducing a Comprehensive, Continuous, and Collaborative Survey of Intrusion Detection Datasets
- URL: http://arxiv.org/abs/2408.02521v1
- Date: Mon, 5 Aug 2024 14:40:41 GMT
- Title: Introducing a Comprehensive, Continuous, and Collaborative Survey of Intrusion Detection Datasets
- Authors: Philipp Bönninghausen, Rafael Uetz, Martin Henze
- Abstract summary: COMIDDS is an effort to comprehensively survey intrusion detection datasets with an unprecedented level of detail.
It provides structured and critical information on each dataset, including actual data samples and links to relevant publications.
- Score: 2.7082111912355877
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Researchers in the highly active field of intrusion detection largely rely on public datasets for their experimental evaluations. However, the large number of existing datasets, the discovery of previously unknown flaws therein, and the frequent publication of new datasets make it hard to select suitable options and sufficiently understand their respective limitations. Hence, there is a great risk of drawing invalid conclusions from experimental results with respect to detection performance of novel methods in the real world. While there exist various surveys on intrusion detection datasets, they have deficiencies in providing researchers with a profound decision basis since they lack comprehensiveness, actionable details, and up-to-dateness. In this paper, we present COMIDDS, an ongoing effort to comprehensively survey intrusion detection datasets with an unprecedented level of detail, implemented as a website backed by a public GitHub repository. COMIDDS allows researchers to quickly identify suitable datasets depending on their requirements and provides structured and critical information on each dataset, including actual data samples and links to relevant publications. COMIDDS is freely accessible, regularly updated, and open to contributions.
Related papers
- Fake News Detection: It's All in the Data! [0.06749750044497731]
The survey meticulously outlines the key features of datasets, various labeling systems employed, and prevalent biases that can impact model performance.
GitHub repository consolidates publicly accessible datasets into a single, user-friendly portal.
arXiv Detail & Related papers (2024-07-02T10:12:06Z)
- A Comprehensive Library for Benchmarking Multi-class Visual Anomaly Detection [52.228708947607636]
This paper introduces a comprehensive visual anomaly detection benchmark, ADer, which is a modular framework for new methods.
The benchmark includes multiple datasets from industrial and medical domains, implementing fifteen state-of-the-art methods and nine comprehensive metrics.
We objectively reveal the strengths and weaknesses of different methods and provide insights into the challenges and future directions of multi-class visual anomaly detection.
arXiv Detail & Related papers (2024-06-05T13:40:07Z)
- A Dataset for the Validation of Truth Inference Algorithms Suitable for Online Deployment [76.04306818209753]
We introduce a substantial crowdsourcing annotation dataset collected from a real-world crowdsourcing platform.
This dataset comprises approximately two thousand workers, one million tasks, and six million annotations.
We evaluate the effectiveness of several representative truth inference algorithms on this dataset.
arXiv Detail & Related papers (2024-03-10T16:00:41Z)
- A Survey on Data Selection for Language Models [148.300726396877]
Data selection methods aim to determine which data points to include in a training dataset.
Deep learning is mostly driven by empirical evidence, and experimentation on large-scale data is expensive.
Few organizations have the resources for extensive data selection research.
arXiv Detail & Related papers (2024-02-26T18:54:35Z)
- Weakly Supervised Anomaly Detection: A Survey [75.26180038443462]
Anomaly detection (AD) is a crucial task in machine learning with various applications.
We present the first comprehensive survey of weakly supervised anomaly detection (WSAD) methods.
For each setting, we provide formal definitions, key algorithms, and potential future directions.
arXiv Detail & Related papers (2023-02-09T10:27:21Z)
- Releasing survey microdata with exact cluster locations and additional privacy safeguards [77.34726150561087]
We propose an alternative microdata dissemination strategy that leverages the utility of the original microdata with additional privacy safeguards.
Our strategy reduces the respondents' re-identification risk for any number of disclosed attributes by 60-80% even under re-identification attempts.
arXiv Detail & Related papers (2022-05-24T19:37:11Z)
- Building Inspection Toolkit: Unified Evaluation and Strong Baselines for Damage Recognition [0.0]
We introduce the building inspection toolkit -- bikit -- which acts as a simple to use data hub containing relevant open-source datasets in the field of damage recognition.
The datasets are enriched with evaluation splits and predefined metrics, suiting the specific task and their data distribution.
For the sake of compatibility and to motivate researchers in this domain, we also provide a leaderboard and the possibility to share model weights with the community.
arXiv Detail & Related papers (2022-02-14T20:05:59Z)
- Algorithmic Fairness Datasets: the Story so Far [68.45921483094705]
Data-driven algorithms are studied in diverse domains to support critical decisions, directly impacting people's well-being.
A growing community of researchers has been investigating the equity of existing algorithms and proposing novel ones, advancing the understanding of risks and opportunities of automated decision-making for historically disadvantaged populations.
Progress in fair Machine Learning hinges on data, which can be appropriately used only if adequately documented.
Unfortunately, the algorithmic fairness community suffers from a collective data documentation debt caused by a lack of information on specific resources (opacity) and scatteredness of available information (sparsity).
arXiv Detail & Related papers (2022-02-03T17:25:46Z)
- Multimedia Datasets for Anomaly Detection: A Survey [0.0]
This paper presents a comprehensive survey on a variety of video, audio, as well as audio-visual datasets based on anomaly detection.
It aims to address the lack of a comprehensive comparison and analysis of multimedia public datasets based on anomaly detection.
arXiv Detail & Related papers (2021-12-10T09:32:21Z)
- A Comparative Review of Recent Few-Shot Object Detection Algorithms [0.0]
Few-shot object detection, learning to adapt to the novel classes with a few labeled data, is an imperative and long-lasting problem.
Recent studies have explored how to use implicit cues in extra datasets without target-domain supervision to help few-shot detectors refine robust task notions.
arXiv Detail & Related papers (2021-10-30T07:57:11Z)
- A Unified Survey on Anomaly, Novelty, Open-Set, and Out-of-Distribution Detection: Solutions and Future Challenges [28.104112546546936]
Machine learning models often encounter samples that diverge from the training distribution.
Despite having similar and shared concepts, out-of-distribution, open-set, and anomaly detection have been investigated independently.
This survey aims to provide a cross-domain and comprehensive review of numerous eminent works in respective areas.
arXiv Detail & Related papers (2021-10-26T22:05:31Z)