Man versus Machine: AutoML and Human Experts' Role in Phishing Detection
- URL: http://arxiv.org/abs/2108.12193v1
- Date: Fri, 27 Aug 2021 09:26:20 GMT
- Title: Man versus Machine: AutoML and Human Experts' Role in Phishing Detection
- Authors: Rizka Purwanto, Arindam Pal, Alan Blair, Sanjay Jha
- Abstract summary: This paper compares the performances of six well-known, state-of-the-art AutoML frameworks on ten different phishing datasets.
Our results indicate that AutoML-based models are able to outperform manually developed machine learning models in complex classification tasks.
- Score: 4.124446337711138
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Machine learning (ML) has developed rapidly in the past few years and has
successfully been utilized for a broad range of tasks, including phishing
detection. However, building an effective ML-based detection system is not a
trivial task, and requires data scientists with knowledge of the relevant
domain. Automated Machine Learning (AutoML) frameworks have received a lot of
attention in recent years, enabling non-ML experts to build machine
learning models. This raises the intriguing question of whether AutoML can
outperform the results achieved by human data scientists. Our paper compares
the performances of six well-known, state-of-the-art AutoML frameworks on ten
different phishing datasets to see whether AutoML-based models can outperform
manually crafted machine learning models. Our results indicate that
AutoML-based models are able to outperform manually developed machine learning
models in complex classification tasks, specifically in datasets where the
features are not quite discriminative, and datasets with overlapping classes or
relatively high degrees of non-linearity. Challenges also remain in building a
real-world phishing detection system with AutoML frameworks: they currently
support only supervised classification problems, which requires labeled data,
and the resulting AutoML-based models cannot be updated
incrementally. This indicates that experts with knowledge in the domain of
phishing and cybersecurity are still essential in the loop of the phishing
detection pipeline.
Related papers
- AutoML-Agent: A Multi-Agent LLM Framework for Full-Pipeline AutoML [56.565200973244146]
Automated machine learning (AutoML) accelerates AI development by automating tasks in the development pipeline.
Recent works have started exploiting large language models (LLM) to lessen such burden.
This paper proposes AutoML-Agent, a novel multi-agent framework tailored for full-pipeline AutoML.
arXiv Detail & Related papers (2024-10-03T20:01:09Z)
- Position: A Call to Action for a Human-Centered AutoML Paradigm [83.78883610871867]
Automated machine learning (AutoML) was formed around the fundamental objectives of automatically and efficiently configuring machine learning (ML)
We argue that a key to unlocking AutoML's full potential lies in addressing the currently underexplored aspect of user interaction with AutoML systems.
arXiv Detail & Related papers (2024-06-05T15:05:24Z)
- The Frontier of Data Erasure: Machine Unlearning for Large Language Models [56.26002631481726]
Large Language Models (LLMs) are foundational to AI advancements.
LLMs pose risks by potentially memorizing and disseminating sensitive, biased, or copyrighted information.
Machine unlearning emerges as a cutting-edge solution to mitigate these concerns.
arXiv Detail & Related papers (2024-03-23T09:26:15Z)
- The Devil is in the Errors: Leveraging Large Language Models for Fine-grained Machine Translation Evaluation [93.01964988474755]
AutoMQM is a prompting technique which asks large language models to identify and categorize errors in translations.
We study the impact of labeled data through in-context learning and finetuning.
We then evaluate AutoMQM with PaLM-2 models, and we find that it improves performance compared to just prompting for scores.
arXiv Detail & Related papers (2023-08-14T17:17:21Z)
- Assessing the Use of AutoML for Data-Driven Software Engineering [10.40771687966477]
AutoML promises to automate the building of end-to-end AI/ML pipelines.
Despite the growing interest and high expectations, there is a dearth of information about the extent to which AutoML is currently adopted.
arXiv Detail & Related papers (2023-07-20T11:14:24Z)
- Benchmarking Automated Machine Learning Methods for Price Forecasting Applications [58.720142291102135]
We show the possibility of substituting manually created ML pipelines with automated machine learning (AutoML) solutions.
Based on the CRISP-DM process, we split the manual ML pipeline into a machine learning and non-machine learning part.
We show in a case study for the industrial use case of price forecasting, that domain knowledge combined with AutoML can weaken the dependence on ML experts.
arXiv Detail & Related papers (2023-04-28T10:27:38Z)
- Automatic Componentwise Boosting: An Interpretable AutoML System [1.1709030738577393]
We propose an AutoML system that constructs an interpretable additive model that can be fitted using a highly scalable componentwise boosting algorithm.
Our system provides tools for easy model interpretation such as visualizing partial effects and pairwise interactions.
Despite its restriction to an interpretable model space, our system is competitive in terms of predictive performance on most data sets.
arXiv Detail & Related papers (2021-09-12T18:34:33Z)
- Automated Machine Learning Techniques for Data Streams [91.3755431537592]
This paper surveys the state-of-the-art open-source AutoML tools, applies them to data collected from streams, and measures how their performance changes over time.
The results show that off-the-shelf AutoML tools can provide satisfactory results but in the presence of concept drift, detection or adaptation techniques have to be applied to maintain the predictive accuracy over time.
arXiv Detail & Related papers (2021-06-14T11:42:46Z)
- Interpret-able feedback for AutoML systems [5.5524559605452595]
Automated machine learning (AutoML) systems aim to enable training machine learning (ML) models for non-ML experts.
A shortcoming of these systems is that when they fail to produce a model with high accuracy, the user has no path to improve the model.
We introduce an interpretable data feedback solution for AutoML.
arXiv Detail & Related papers (2021-02-22T18:54:26Z)
- AutoML to Date and Beyond: Challenges and Opportunities [30.60364966752454]
AutoML tools aim to make machine learning accessible for non-machine learning experts.
We introduce a new classification system for AutoML systems.
We lay out a roadmap for the future, pinpointing the research required to further automate the end-to-end machine learning pipeline.
arXiv Detail & Related papers (2020-10-21T06:08:21Z)
- Evolution of Scikit-Learn Pipelines with Dynamic Structured Grammatical Evolution [1.5224436211478214]
This paper describes a novel grammar-based framework that adapts Dynamic Structured Grammatical Evolution (DSGE) to the evolution of Scikit-Learn classification pipelines.
The experimental results include comparing AutoML-DSGE to another grammar-based AutoML framework, Resilient Classification Pipeline Evolution (RECIPE).
arXiv Detail & Related papers (2020-04-01T09:31:34Z)