Transformers for End-to-End InfoSec Tasks: A Feasibility Study
- URL: http://arxiv.org/abs/2212.02666v1
- Date: Mon, 5 Dec 2022 23:50:46 GMT
- Title: Transformers for End-to-End InfoSec Tasks: A Feasibility Study
- Authors: Ethan M. Rudd, Mohammad Saidur Rahman and Philip Tully
- Abstract summary: We implement transformer models for two distinct InfoSec data formats - specifically URLs and PE files.
We show that our URL transformer model requires a different training approach to reach high performance levels.
We demonstrate that this approach performs comparably to well-established malware detection models on benchmark PE file datasets.
- Score: 6.847381178288385
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: In this paper, we assess the viability of transformer models in end-to-end
InfoSec settings, in which no intermediate feature representations or
processing steps occur outside the model. We implement transformer models for
two distinct InfoSec data formats - specifically URLs and PE files - in a novel
end-to-end approach, and explore a variety of architectural designs, training
regimes, and experimental settings to determine the ingredients necessary for
performant detection models. We show that in contrast to conventional
transformers trained on more standard NLP-related tasks, our URL transformer
model requires a different training approach to reach high performance levels.
Specifically, we show that 1) pre-training on a massive corpus of unlabeled URL
data for an auto-regressive task does not readily transfer to binary
classification of malicious or benign URLs, but 2) that using an auxiliary
auto-regressive loss improves performance when training from scratch. We
introduce a method for mixed objective optimization, which dynamically balances
contributions from both loss terms so that neither one dominates (a sketch of
one such balancing scheme appears after the abstract). We show that this method
yields quantitative evaluation metrics comparable to those of several
top-performing benchmark classifiers. Unlike URLs, binary
executables contain longer and more distributed sequences of information-rich
bytes. To accommodate such lengthy byte sequences, we introduce additional
context length into the transformer by providing its self-attention layers with
an adaptive span similar to that of Sukhbaatar et al. (the soft span mask is
sketched after the abstract). We demonstrate that this approach performs
comparably to well-established malware detection models on benchmark PE file
datasets, but also point out the need for further work on model scalability and
compute efficiency.
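The abstract describes, but does not specify, the end-to-end URL model and its mixed objective. The sketch below is a minimal PyTorch reconstruction under stated assumptions: a byte-level vocabulary, a small causal transformer encoder with a classification head and an auxiliary next-byte head, and a balancing rule that rescales each loss term by a running average of its own magnitude. The layer sizes, the `URLTransformer` and `balanced_loss` names, and the balancing rule itself are illustrative choices, not the authors' released implementation.

```python
# Illustrative sketch only; hyperparameters and the loss-balancing rule are assumptions.
import torch
import torch.nn as nn
import torch.nn.functional as F


class URLTransformer(nn.Module):
    """Byte-level transformer with a classification head and an auxiliary
    next-byte (auto-regressive) head. Padding handling is omitted for brevity."""

    def __init__(self, vocab_size=257, d_model=128, n_heads=4, n_layers=4, max_len=256):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, d_model)
        self.pos = nn.Embedding(max_len, d_model)
        layer = nn.TransformerEncoderLayer(
            d_model, n_heads, dim_feedforward=4 * d_model, batch_first=True
        )
        self.encoder = nn.TransformerEncoder(layer, n_layers)
        self.cls_head = nn.Linear(d_model, 1)           # malicious/benign logit
        self.lm_head = nn.Linear(d_model, vocab_size)   # next-byte logits

    def forward(self, tokens):
        # tokens: (batch, seq_len) byte IDs of the raw URL string - no external features.
        seq_len = tokens.size(1)
        pos = torch.arange(seq_len, device=tokens.device)
        # Causal mask so each position only sees earlier bytes, which keeps the
        # auto-regressive objective well defined.
        causal = torch.triu(
            torch.full((seq_len, seq_len), float("-inf"), device=tokens.device), diagonal=1
        )
        h = self.encoder(self.embed(tokens) + self.pos(pos), mask=causal)
        # The final position has attended to the full URL, so use it for the label.
        return self.cls_head(h[:, -1]).squeeze(-1), self.lm_head(h)


def balanced_loss(cls_logit, lm_logits, tokens, labels, running, momentum=0.99):
    """Combine the classification and auto-regressive losses, rescaling each by a
    running average of its own magnitude so that neither term dominates. This
    particular weighting rule is an assumption for illustration only."""
    cls_loss = F.binary_cross_entropy_with_logits(cls_logit, labels.float())
    lm_loss = F.cross_entropy(
        lm_logits[:, :-1].reshape(-1, lm_logits.size(-1)), tokens[:, 1:].reshape(-1)
    )
    for name, loss in (("cls", cls_loss), ("lm", lm_loss)):
        running[name] = momentum * running.get(name, loss.item()) + (1 - momentum) * loss.item()
    return cls_loss / (running["cls"] + 1e-8) + lm_loss / (running["lm"] + 1e-8)
```

In a training loop, `running` would be a dictionary that persists across steps, e.g. `cls_logit, lm_logits = model(tokens)` followed by `loss = balanced_loss(cls_logit, lm_logits, tokens, labels, running)`.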
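For the PE-file model, the abstract points to an adaptive attention span similar to Sukhbaatar et al. The sketch below shows the soft span mask from that line of work, applied to post-softmax attention weights and renormalized, which is mathematically equivalent to masking the exponentiated scores as in the original formulation. The ramp width, maximum span, and the `AdaptiveSpanMask` wrapper are assumptions for illustration; how the paper wires the span into its self-attention layers is not detailed in the abstract.

```python
# Illustrative sketch only; hyperparameters and placement inside the attention layer are assumptions.
import torch
import torch.nn as nn


class AdaptiveSpanMask(nn.Module):
    """Soft, learnable attention span in the spirit of Sukhbaatar et al. (2019).
    Each head learns how far back it may attend; a linear ramp of width `ramp`
    keeps the mask differentiable."""

    def __init__(self, n_heads, max_span=2048, ramp=32, init_frac=0.1):
        super().__init__()
        self.max_span, self.ramp = max_span, ramp
        # One span fraction per head, learned jointly with the rest of the model.
        self.span_frac = nn.Parameter(torch.full((n_heads, 1, 1), init_frac))

    def forward(self, attn):
        # attn: (batch, n_heads, q_len, k_len) post-softmax weights from causal
        # self-attention, so key k lies at distance (q - k) behind query q.
        q_len, k_len = attn.shape[-2], attn.shape[-1]
        q_pos = torch.arange(q_len, device=attn.device).unsqueeze(-1)
        k_pos = torch.arange(k_len, device=attn.device).unsqueeze(0)
        dist = (q_pos - k_pos).clamp(min=0).float()            # (q_len, k_len)
        span = self.span_frac.clamp(0, 1) * self.max_span      # (n_heads, 1, 1)
        # m(dist) = 1 inside the span, decays linearly over `ramp`, 0 beyond it.
        mask = ((span + self.ramp - dist) / self.ramp).clamp(0, 1)
        masked = attn * mask                                    # broadcasts over batch
        return masked / (masked.sum(dim=-1, keepdim=True) + 1e-8)

    def span_penalty(self):
        # L1-style penalty on the learned spans, encouraging heads to keep
        # their context short.
        return (self.span_frac.clamp(0, 1) * self.max_span).mean()
```

Adding `span_penalty()` to the training loss means each head only pays for the context it actually uses, which is what makes attention over long PE byte sequences affordable.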
Related papers
- Multimodal Pathway: Improve Transformers with Irrelevant Data from Other Modalities [56.666806962214565]
We propose to improve transformers of a specific modality with irrelevant data from other modalities.
We use an auxiliary transformer trained with data of another modality and construct pathways to connect components of the two models.
We observe significant and consistent performance improvements with irrelevant data from other modalities.
arXiv Detail & Related papers (2024-01-25T18:59:58Z)
- Emergent Agentic Transformer from Chain of Hindsight Experience [96.56164427726203]
We show, for the first time, that a simple transformer-based model performs competitively with both temporal-difference and imitation-learning-based approaches.
arXiv Detail & Related papers (2023-05-26T00:43:02Z)
- Remote Sensing Change Detection With Transformers Trained from Scratch [62.96911491252686]
Existing transformer-based change detection (CD) approaches either employ a model pre-trained on the large-scale ImageNet classification dataset or rely on first pre-training on another CD dataset and then fine-tuning on the target benchmark.
We develop an end-to-end CD approach with transformers that is trained from scratch and yet achieves state-of-the-art performance on four public benchmarks.
arXiv Detail & Related papers (2023-04-13T17:57:54Z)
- Transfer learning for conflict and duplicate detection in software requirement pairs [0.5359378066251386]
Consistent and holistic expression of software requirements is important for the success of software projects.
In this study, we aim to enhance the efficiency of the software development processes by automatically identifying conflicting and duplicate software requirement specifications.
We design a novel transformer-based architecture, SR-BERT, which incorporates Sentence-BERT and Bi-encoders for the conflict and duplicate identification task.
arXiv Detail & Related papers (2023-01-09T22:47:12Z)
- Improving Transformer-Kernel Ranking Model Using Conformer and Query Term Independence [29.442579683405913]
The Transformer-Kernel (TK) model has demonstrated strong reranking performance on the TREC Deep Learning benchmark.
A variant of the TK model -- called TKL -- has been developed that incorporates local self-attention to efficiently process longer input sequences.
In this work, we propose a novel Conformer layer as an alternative approach to scale TK to longer input sequences.
arXiv Detail & Related papers (2021-04-19T15:32:34Z)
- Training Transformers for Information Security Tasks: A Case Study on Malicious URL Prediction [3.660098145214466]
We implement a malicious/benign URL predictor based on a transformer architecture that is trained from scratch.
We show that in contrast to conventional natural language processing (NLP) transformers, this model requires a different training approach to work well.
arXiv Detail & Related papers (2020-11-05T18:58:51Z)
- Mixup-Transformer: Dynamic Data Augmentation for NLP Tasks [75.69896269357005]
Mixup is the latest data augmentation technique that linearly interpolates input examples and the corresponding labels.
In this paper, we explore how to apply mixup to natural language processing tasks.
We incorporate mixup into a transformer-based pre-trained architecture, named "mixup-transformer", for a wide range of NLP tasks.
arXiv Detail & Related papers (2020-10-05T23:37:30Z)
- Pre-Trained Models for Heterogeneous Information Networks [57.78194356302626]
We propose a self-supervised pre-training and fine-tuning framework, PF-HIN, to capture the features of a heterogeneous information network.
PF-HIN consistently and significantly outperforms state-of-the-art alternatives on each of these tasks, on four datasets.
arXiv Detail & Related papers (2020-07-07T03:36:28Z)
- Set Based Stochastic Subsampling [85.5331107565578]
We propose a set-based two-stage end-to-end neural subsampling model that is jointly optimized with an arbitrary downstream task network.
We show that it outperforms the relevant baselines under low subsampling rates on a variety of tasks including image classification, image reconstruction, function reconstruction and few-shot classification.
arXiv Detail & Related papers (2020-06-25T07:36:47Z)
- Gradient-Based Adversarial Training on Transformer Networks for Detecting Check-Worthy Factual Claims [3.7543966923106438]
We introduce the first adversarially-regularized, transformer-based claim spotter model.
We obtain a 4.70 point F1-score improvement over current state-of-the-art models.
We propose a method to apply adversarial training to transformer models.
arXiv Detail & Related papers (2020-02-18T16:51:05Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information presented and is not responsible for any consequences of its use.