Related papers: Man versus Machine: AutoML and Human Experts' Role in Phishing Detection

Man versus Machine: AutoML and Human Experts' Role in Phishing Detection

URL: http://arxiv.org/abs/2108.12193v1
Date: Fri, 27 Aug 2021 09:26:20 GMT
Title: Man versus Machine: AutoML and Human Experts' Role in Phishing Detection
Authors: Rizka Purwanto, Arindam Pal, Alan Blair, Sanjay Jha
Abstract summary: This paper compares the performances of six well-known, state-of-the-art AutoML frameworks on ten different phishing datasets. Our results indicate that AutoML-based models are able to outperform manually developed machine learning models in complex classification tasks.
Score: 4.124446337711138
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: Machine learning (ML) has developed rapidly in the past few years and has successfully been utilized for a broad range of tasks, including phishing detection. However, building an effective ML-based detection system is not a trivial task, and requires data scientists with knowledge of the relevant domain. Automated Machine Learning (AutoML) frameworks have received a lot of attention in recent years, enabling non-ML experts in building a machine learning model. This brings to an intriguing question of whether AutoML can outperform the results achieved by human data scientists. Our paper compares the performances of six well-known, state-of-the-art AutoML frameworks on ten different phishing datasets to see whether AutoML-based models can outperform manually crafted machine learning models. Our results indicate that AutoML-based models are able to outperform manually developed machine learning models in complex classification tasks, specifically in datasets where the features are not quite discriminative, and datasets with overlapping classes or relatively high degrees of non-linearity. Challenges also remain in building a real-world phishing detection system using AutoML frameworks due to the current support only on supervised classification problems, leading to the need for labeled data, and the inability to update the AutoML-based models incrementally. This indicates that experts with knowledge in the domain of phishing and cybersecurity are still essential in the loop of the phishing detection pipeline.

Related papers

Robustness of AutoML on Dirty Categorical Data [10.798536038901903]
The goal of automated machine learning (AutoML) is to reduce trial and error when doing machine learning (ML)<n>Recent research has shown that ML models can benefit from morphological encoders for dirty categorical data, leading to superior predictive performance.<n>We propose a pipeline that transforms categorical data into numerical data so that an AutoML can handle categorical data transformed by more advanced encoding schemes.
arXiv Detail & Related papers (2026-01-31T00:05:59Z)
AutoML in Cybersecurity: An Empirical Study [0.8703011045028926]
This paper systematically evaluates eight open-source AutoML frameworks across 11 publicly available cybersecurity datasets.<n>Results show substantial performance variability across tools and datasets, with no single solution consistently superior.<n>Key challenges identified include adversarial vulnerability, model drift, and inadequate feature engineering.
arXiv Detail & Related papers (2025-09-28T03:52:46Z)
AutoML-Agent: A Multi-Agent LLM Framework for Full-Pipeline AutoML [56.565200973244146]
Automated machine learning (AutoML) accelerates AI development by automating tasks in the development pipeline. Recent works have started exploiting large language models (LLM) to lessen such burden. This paper proposes AutoML-Agent, a novel multi-agent framework tailored for full-pipeline AutoML.
arXiv Detail & Related papers (2024-10-03T20:01:09Z)
Position: A Call to Action for a Human-Centered AutoML Paradigm [83.78883610871867]
Automated machine learning (AutoML) was formed around the fundamental objectives of automatically and efficiently configuring machine learning (ML) We argue that a key to unlocking AutoML's full potential lies in addressing the currently underexplored aspect of user interaction with AutoML systems.
arXiv Detail & Related papers (2024-06-05T15:05:24Z)
The Frontier of Data Erasure: Machine Unlearning for Large Language Models [56.26002631481726]
Large Language Models (LLMs) are foundational to AI advancements. LLMs pose risks by potentially memorizing and disseminating sensitive, biased, or copyrighted information. Machine unlearning emerges as a cutting-edge solution to mitigate these concerns.
arXiv Detail & Related papers (2024-03-23T09:26:15Z)
The Devil is in the Errors: Leveraging Large Language Models for Fine-grained Machine Translation Evaluation [93.01964988474755]
AutoMQM is a prompting technique which asks large language models to identify and categorize errors in translations. We study the impact of labeled data through in-context learning and finetuning. We then evaluate AutoMQM with PaLM-2 models, and we find that it improves performance compared to just prompting for scores.
arXiv Detail & Related papers (2023-08-14T17:17:21Z)
Assessing the Use of AutoML for Data-Driven Software Engineering [10.40771687966477]
AutoML promises to automate the building of end-to-end AI/ML pipelines. Despite the growing interest and high expectations, there is a dearth of information about the extent to which AutoML is currently adopted.
arXiv Detail & Related papers (2023-07-20T11:14:24Z)
Benchmarking Automated Machine Learning Methods for Price Forecasting Applications [58.720142291102135]
We show the possibility of substituting manually created ML pipelines with automated machine learning (AutoML) solutions. Based on the CRISP-DM process, we split the manual ML pipeline into a machine learning and non-machine learning part. We show in a case study for the industrial use case of price forecasting, that domain knowledge combined with AutoML can weaken the dependence on ML experts.
arXiv Detail & Related papers (2023-04-28T10:27:38Z)
Automatic Componentwise Boosting: An Interpretable AutoML System [1.1709030738577393]
We propose an AutoML system that constructs an interpretable additive model that can be fitted using a highly scalable componentwise boosting algorithm. Our system provides tools for easy model interpretation such as visualizing partial effects and pairwise interactions. Despite its restriction to an interpretable model space, our system is competitive in terms of predictive performance on most data sets.
arXiv Detail & Related papers (2021-09-12T18:34:33Z)
Automated Machine Learning Techniques for Data Streams [91.3755431537592]
This paper surveys the state-of-the-art open-source AutoML tools, applies them to data collected from streams, and measures how their performance changes over time. The results show that off-the-shelf AutoML tools can provide satisfactory results but in the presence of concept drift, detection or adaptation techniques have to be applied to maintain the predictive accuracy over time.
arXiv Detail & Related papers (2021-06-14T11:42:46Z)
Interpret-able feedback for AutoML systems [5.5524559605452595]
Automated machine learning (AutoML) systems aim to enable training machine learning (ML) models for non-ML experts. A shortcoming of these systems is that when they fail to produce a model with high accuracy, the user has no path to improve the model. We introduce an interpretable data feedback solution for AutoML.
arXiv Detail & Related papers (2021-02-22T18:54:26Z)
AutoML to Date and Beyond: Challenges and Opportunities [30.60364966752454]
AutoML tools aim to make machine learning accessible for non-machine learning experts. We introduce a new classification system for AutoML systems. We lay out a roadmap for the future, pinpointing the research required to further automate the end-to-end machine learning pipeline.
arXiv Detail & Related papers (2020-10-21T06:08:21Z)
Evolution of Scikit-Learn Pipelines with Dynamic Structured Grammatical Evolution [1.5224436211478214]
This paper describes a novel grammar-based framework that adapts Dynamic Structured Grammatical Evolution (DSGE) to the evolution of Scikit-Learn classification pipelines. The experimental results include comparing AutoML-DSGE to another grammar-based AutoML framework, Resilient ClassificationPipeline Evolution (RECIPE)
arXiv Detail & Related papers (2020-04-01T09:31:34Z)

This list is automatically generated from the titles and abstracts of the papers in this site.