Related papers: Training Transformers for Information Security Tasks: A Case Study on Malicious URL Prediction

Training Transformers for Information Security Tasks: A Case Study on Malicious URL Prediction

URL: http://arxiv.org/abs/2011.03040v1
Date: Thu, 5 Nov 2020 18:58:51 GMT
Title: Training Transformers for Information Security Tasks: A Case Study on Malicious URL Prediction
Authors: Ethan M. Rudd and Ahmed Abdallah
Abstract summary: We implement a malicious/benign predictor URL based on a transformer architecture that is trained from scratch. We show that in contrast to conventional natural language processing (NLP) transformers, this model requires a different training approach to work well.
Score: 3.660098145214466
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: Machine Learning (ML) for information security (InfoSec) utilizes distinct data types and formats which require different treatments during optimization/training on raw data. In this paper, we implement a malicious/benign URL predictor based on a transformer architecture that is trained from scratch. We show that in contrast to conventional natural language processing (NLP) transformers, this model requires a different training approach to work well. Specifically, we show that 1) pre-training on a massive corpus of unlabeled URL data for an auto-regressive task does not readily transfer to malicious/benign prediction but 2) that using an auxiliary auto-regressive loss improves performance when training from scratch. We introduce a method for mixed objective optimization, which dynamically balances contributions from both loss terms so that neither one of them dominates. We show that this method yields performance comparable to that of several top-performing benchmark classifiers.

Related papers

What Do Learning Dynamics Reveal About Generalization in LLM Reasoning? [83.83230167222852]
We find that a model's generalization behavior can be effectively characterized by a training metric we call pre-memorization train accuracy. By connecting a model's learning behavior to its generalization, pre-memorization train accuracy can guide targeted improvements to training strategies.
arXiv Detail & Related papers (2024-11-12T09:52:40Z)
T4P: Test-Time Training of Trajectory Prediction via Masked Autoencoder and Actor-specific Token Memory [39.021321011792786]
Trajectory prediction is a challenging problem that requires considering interactions among multiple actors. Data-driven approaches have been used to address this complex problem, but they suffer from unreliable predictions under distribution shifts during test time. We propose several online learning methods using regression loss from the ground truth of observed data. Our method surpasses the performance of existing state-of-the-art online learning methods in terms of both prediction accuracy and computational efficiency.
arXiv Detail & Related papers (2024-03-15T06:47:14Z)
Target Variable Engineering [0.0]
We compare the predictive performance of regression models trained to predict numeric targets vs. classifiers trained to predict their binarized counterparts. We find that regression requires significantly more computational effort to converge upon the optimal performance.
arXiv Detail & Related papers (2023-10-13T23:12:21Z)
Supervised Pretraining Can Learn In-Context Reinforcement Learning [96.62869749926415]
In this paper, we study the in-context learning capabilities of transformers in decision-making problems. We introduce and study Decision-Pretrained Transformer (DPT), a supervised pretraining method where the transformer predicts an optimal action. We find that the pretrained transformer can be used to solve a range of RL problems in-context, exhibiting both exploration online and conservatism offline.
arXiv Detail & Related papers (2023-06-26T17:58:50Z)
Emergent Agentic Transformer from Chain of Hindsight Experience [96.56164427726203]
We show that a simple transformer-based model performs competitively with both temporal-difference and imitation-learning-based approaches. This is the first time that a simple transformer-based model performs competitively with both temporal-difference and imitation-learning-based approaches.
arXiv Detail & Related papers (2023-05-26T00:43:02Z)
Optimizing Non-Autoregressive Transformers with Contrastive Learning [74.46714706658517]
Non-autoregressive Transformers (NATs) reduce the inference latency of Autoregressive Transformers (ATs) by predicting words all at once rather than in sequential order. In this paper, we propose to ease the difficulty of modality learning via sampling from the model distribution instead of the data distribution.
arXiv Detail & Related papers (2023-05-23T04:20:13Z)
Transformers for End-to-End InfoSec Tasks: A Feasibility Study [6.847381178288385]
We implement transformer models for two distinct InfoSec data formats - specifically URLs and PE files. We show that our URL transformer model requires a different training approach to reach high performance levels. We demonstrate that this approach performs comparably to well-established malware detection models on benchmark PE file datasets.
arXiv Detail & Related papers (2022-12-05T23:50:46Z)
Distributed Adversarial Training to Robustify Deep Neural Networks at Scale [100.19539096465101]
Current deep neural networks (DNNs) are vulnerable to adversarial attacks, where adversarial perturbations to the inputs can change or manipulate classification. To defend against such attacks, an effective approach, known as adversarial training (AT), has been shown to mitigate robust training. We propose a large-batch adversarial training framework implemented over multiple machines.
arXiv Detail & Related papers (2022-06-13T15:39:43Z)
Discriminative and Generative Transformer-based Models For Situation Entity Classification [8.029049649310211]
We re-examine the situation entity (SE) classification task with varying amounts of available training data. We exploit a Transformer-based variational autoencoder to encode sentences into a lower dimensional latent space.
arXiv Detail & Related papers (2021-09-15T17:07:07Z)
Finetuning Pretrained Transformers into RNNs [81.72974646901136]
Transformers have outperformed recurrent neural networks (RNNs) in natural language generation. A linear-complexity recurrent variant has proven well suited for autoregressive generation. This work aims to convert a pretrained transformer into its efficient recurrent counterpart.
arXiv Detail & Related papers (2021-03-24T10:50:43Z)

This list is automatically generated from the titles and abstracts of the papers in this site.