Student Assessment in Cybersecurity Training Automated by Pattern Mining
and Clustering
- URL: http://arxiv.org/abs/2307.10260v1
- Date: Thu, 13 Jul 2023 18:52:58 GMT
- Authors: Valdemar Švábenský, Jan Vykopal, Pavel Čeleda, Kristián Tkáčik, Daniel Popovič
- Abstract summary: This paper explores a dataset from 18 cybersecurity training sessions using data mining and machine learning techniques.
We employed pattern mining and clustering to analyze 8834 commands collected from 113 trainees.
Our results show that data mining methods are suitable for analyzing cybersecurity training data.
- Score: 0.5249805590164902
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Hands-on cybersecurity training allows students and professionals to practice
various tools and improve their technical skills. The training occurs in an
interactive learning environment that enables completing sophisticated tasks in
full-fledged operating systems, networks, and applications. During the
training, the learning environment allows collecting data about trainees'
interactions with the environment, such as their usage of command-line tools.
These data contain patterns indicative of trainees' learning processes, and
revealing them makes it possible to assess the trainees and provide feedback to
help them learn. However, automated analysis of these data is challenging. The training
tasks feature complex problem-solving, and many different solution approaches
are possible. Moreover, the trainees generate vast amounts of interaction data.
This paper explores a dataset from 18 cybersecurity training sessions using
data mining and machine learning techniques. We employed pattern mining and
clustering to analyze 8834 commands collected from 113 trainees, revealing
their typical behavior, mistakes, solution strategies, and difficult training
stages. Pattern mining proved suitable for capturing timing information and tool
usage frequency. Clustering underlined that many trainees often face the same
issues, which can be addressed by targeted scaffolding. Our results show that
data mining methods are suitable for analyzing cybersecurity training data.
Educational researchers and practitioners can apply these methods in their
contexts to assess trainees, support them, and improve the training design.
Artifacts associated with this research are publicly available.
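As a rough illustration of what mining tool-usage patterns from command logs can look like, the sketch below counts how often each tool appears and finds pairs of tools that several trainees ran back to back. It is not the authors' method or data: the log entries, the `tool_of` helper, and the `min_support` threshold are all hypothetical, and the first word of each command is assumed to be the tool name.

```python
from collections import Counter, defaultdict

# Hypothetical command log: (trainee_id, command line), in execution order.
LOG = [
    ("t1", "nmap -sV 10.0.0.5"),
    ("t1", "ssh admin@10.0.0.5"),
    ("t1", "ls -la"),
    ("t2", "nmap 10.0.0.5"),
    ("t2", "ssh root@10.0.0.5"),
    ("t2", "cat flag.txt"),
    ("t3", "ping 10.0.0.5"),
    ("t3", "nmap -p 22 10.0.0.5"),
    ("t3", "ssh admin@10.0.0.5"),
]


def tool_of(command):
    """Assumption: the first whitespace-separated token names the tool."""
    return command.split()[0]


def tool_frequency(log):
    """Count how many logged commands used each tool."""
    return Counter(tool_of(cmd) for _, cmd in log)


def frequent_bigrams(log, min_support):
    """Return consecutive tool pairs used by at least min_support trainees."""
    sequences = defaultdict(list)
    for trainee, cmd in log:
        sequences[trainee].append(tool_of(cmd))

    support = Counter()
    for tools in sequences.values():
        # A set ensures each trainee contributes at most once per pair.
        support.update(set(zip(tools, tools[1:])))

    return {pair: n for pair, n in support.items() if n >= min_support}


if __name__ == "__main__":
    print(tool_frequency(LOG).most_common())
    print(frequent_bigrams(LOG, min_support=2))
```

Counting support per trainee rather than per occurrence keeps one very active trainee from dominating the result. Timing information and clustering, both mentioned in the abstract, are outside the scope of this sketch.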
Related papers
- Dynamic Skill Adaptation for Large Language Models [78.31322532135272]
We present Dynamic Skill Adaptation (DSA), an adaptive and dynamic framework to adapt novel and complex skills to Large Language Models (LLMs).
For every skill, we utilize LLMs to generate both textbook-like data, which contains detailed descriptions of skills for pre-training, and exercise-like data, which targets explicit use of the skills to solve problems for instruction-tuning.
Experiments on large language models such as LLAMA and Mistral demonstrate the effectiveness of our proposed methods in adapting math reasoning skills and social study skills.
arXiv Detail & Related papers (2024-12-26T22:04:23Z)
- Detecting Unsuccessful Students in Cybersecurity Exercises in Two Different Learning Environments [0.37729165787434493]
This paper develops automated tools to predict when a student is having difficulty.
In a potential application, such models can aid instructors in detecting struggling students and providing targeted help.
arXiv Detail & Related papers (2024-08-16T04:57:54Z)
- Deep Internal Learning: Deep Learning from a Single Input [88.59966585422914]
In many cases there is value in training a network just from the input at hand.
This is particularly relevant in many signal and image processing problems where training data is scarce and diversity is large.
This survey paper aims at covering deep internal-learning techniques that have been proposed in the past few years for these two important directions.
arXiv Detail & Related papers (2023-12-12T16:48:53Z)
- Benchmarking Offline Reinforcement Learning on Real-Robot Hardware [35.29390454207064]
Dexterous manipulation in particular remains an open problem in its general form.
We propose a benchmark including a large collection of data for offline learning from a dexterous manipulation platform on two tasks.
We evaluate prominent open-sourced offline reinforcement learning algorithms on the datasets and provide a reproducible experimental setup for offline reinforcement learning on real systems.
arXiv Detail & Related papers (2023-07-28T17:29:49Z)
- Applications of Educational Data Mining and Learning Analytics on Data From Cybersecurity Training [0.5735035463793008]
This paper surveys publications that enhance cybersecurity education by leveraging trainee-generated data from interactive learning environments.
We identified and examined 3021 papers, ultimately selecting 35 articles for a detailed review.
Our contribution is a systematic literature review of relevant papers and their categorization according to the collected data, analysis methods, and application contexts.
arXiv Detail & Related papers (2023-07-13T19:05:17Z)
- Offline Robot Reinforcement Learning with Uncertainty-Guided Human Expert Sampling [11.751910133386254]
Recent advances in batch (offline) reinforcement learning have shown promising results in learning from available offline data.
We propose a novel approach that uses uncertainty estimation to trigger the injection of human demonstration data.
Our experiments show that this approach is more sample efficient when compared to a naive way of combining expert data with data collected from a sub-optimal agent.
arXiv Detail & Related papers (2022-12-16T01:41:59Z)
- Toolset for Collecting Shell Commands and Its Application in Hands-on Cybersecurity Training [0.5735035463793008]
We share the design and implementation of an open-source toolset for logging commands that students execute on Linux machines.
Compared to basic solutions, such as shell history files, the toolset's added value is threefold.
Data are instantly forwarded to central storage in a unified, semi-structured format.
arXiv Detail & Related papers (2021-12-21T11:45:13Z)
- Understanding the World Through Action [91.3755431537592]
I will argue that a general, principled, and powerful framework for utilizing unlabeled data can be derived from reinforcement learning.
I will discuss how such a procedure is more closely aligned with potential downstream tasks.
arXiv Detail & Related papers (2021-10-24T22:33:52Z)
- Motivating Learners in Multi-Orchestrator Mobile Edge Learning: A Stackelberg Game Approach [54.28419430315478]
Mobile Edge Learning enables distributed training of Machine Learning models over heterogeneous edge devices.
In MEL, the training performance deteriorates without the availability of sufficient training data or computing resources.
We propose an incentive mechanism, where we formulate the orchestrators-learners interactions as a 2-round Stackelberg game.
arXiv Detail & Related papers (2021-09-25T17:27:48Z)
- What Matters in Learning from Offline Human Demonstrations for Robot Manipulation [64.43440450794495]
We conduct an extensive study of six offline learning algorithms for robot manipulation.
Our study analyzes the most critical challenges when learning from offline human data.
We highlight opportunities for learning from human datasets.
arXiv Detail & Related papers (2021-08-06T20:48:30Z)
- Dataset Security for Machine Learning: Data Poisoning, Backdoor Attacks, and Defenses [150.64470864162556]
This work systematically categorizes and discusses a wide range of dataset vulnerabilities and exploits.
In addition to describing various poisoning and backdoor threat models and the relationships among them, we develop their unified taxonomy.
arXiv Detail & Related papers (2020-12-18T22:38:47Z)
This list is automatically generated from the titles and abstracts of the papers in this site.