Transformers for End-to-End InfoSec Tasks: A Feasibility Study
- URL: http://arxiv.org/abs/2212.02666v1
- Date: Mon, 5 Dec 2022 23:50:46 GMT
- Title: Transformers for End-to-End InfoSec Tasks: A Feasibility Study
- Authors: Ethan M. Rudd, Mohammad Saidur Rahman and Philip Tully
- Abstract summary: We implement transformer models for two distinct InfoSec data formats - specifically URLs and PE files.
We show that our URL transformer model requires a different training approach to reach high performance levels.
We demonstrate that this approach performs comparably to well-established malware detection models on benchmark PE file datasets.
- Score: 6.847381178288385
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: In this paper, we assess the viability of transformer models in end-to-end
InfoSec settings, in which no intermediate feature representations or
processing steps occur outside the model. We implement transformer models for
two distinct InfoSec data formats - specifically URLs and PE files - in a novel
end-to-end approach, and explore a variety of architectural designs, training
regimes, and experimental settings to determine the ingredients necessary for
performant detection models. We show that in contrast to conventional
transformers trained on more standard NLP-related tasks, our URL transformer
model requires a different training approach to reach high performance levels.
Specifically, we show that 1) pre-training on a massive corpus of unlabeled URL
data for an auto-regressive task does not readily transfer to binary
classification of malicious or benign URLs, but 2) that using an auxiliary
auto-regressive loss improves performance when training from scratch. We
introduce a method for mixed objective optimization, which dynamically balances
contributions from both loss terms so that neither one dominates (a sketch of
one such balancing scheme appears after the abstract). We show that this method
yields quantitative evaluation metrics comparable to those of several
top-performing benchmark classifiers. Unlike URLs, binary
executables contain longer and more distributed sequences of information-rich
bytes. To accommodate such lengthy byte sequences, we introduce additional
context length into the transformer by providing its self-attention layers with
an adaptive span similar to that of Sukhbaatar et al. (the soft span mask is
sketched after the abstract). We demonstrate that this approach performs
comparably to well-established malware detection models on benchmark PE file
datasets, but also point out the need for further work on model scalability and
compute efficiency.
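The abstract describes, but does not specify, the end-to-end URL model and its mixed objective. The sketch below is a minimal PyTorch reconstruction under stated assumptions: a byte-level vocabulary, a small causal transformer encoder with a classification head and an auxiliary next-byte head, and a balancing rule that rescales each loss term by a running average of its own magnitude. The layer sizes, the `URLTransformer` and `balanced_loss` names, and the balancing rule itself are illustrative choices, not the authors' released implementation.

```python
# Illustrative sketch only; hyperparameters and the loss-balancing rule are assumptions.
import torch
import torch.nn as nn
import torch.nn.functional as F


class URLTransformer(nn.Module):
    """Byte-level transformer with a classification head and an auxiliary
    next-byte (auto-regressive) head. Padding handling is omitted for brevity."""

    def __init__(self, vocab_size=257, d_model=128, n_heads=4, n_layers=4, max_len=256):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, d_model)
        self.pos = nn.Embedding(max_len, d_model)
        layer = nn.TransformerEncoderLayer(
            d_model, n_heads, dim_feedforward=4 * d_model, batch_first=True
        )
        self.encoder = nn.TransformerEncoder(layer, n_layers)
        self.cls_head = nn.Linear(d_model, 1)           # malicious/benign logit
        self.lm_head = nn.Linear(d_model, vocab_size)   # next-byte logits

    def forward(self, tokens):
        # tokens: (batch, seq_len) byte IDs of the raw URL string - no external features.
        seq_len = tokens.size(1)
        pos = torch.arange(seq_len, device=tokens.device)
        # Causal mask so each position only sees earlier bytes, which keeps the
        # auto-regressive objective well defined.
        causal = torch.triu(
            torch.full((seq_len, seq_len), float("-inf"), device=tokens.device), diagonal=1
        )
        h = self.encoder(self.embed(tokens) + self.pos(pos), mask=causal)
        # The final position has attended to the full URL, so use it for the label.
        return self.cls_head(h[:, -1]).squeeze(-1), self.lm_head(h)


def balanced_loss(cls_logit, lm_logits, tokens, labels, running, momentum=0.99):
    """Combine the classification and auto-regressive losses, rescaling each by a
    running average of its own magnitude so that neither term dominates. This
    particular weighting rule is an assumption for illustration only."""
    cls_loss = F.binary_cross_entropy_with_logits(cls_logit, labels.float())
    lm_loss = F.cross_entropy(
        lm_logits[:, :-1].reshape(-1, lm_logits.size(-1)), tokens[:, 1:].reshape(-1)
    )
    for name, loss in (("cls", cls_loss), ("lm", lm_loss)):
        running[name] = momentum * running.get(name, loss.item()) + (1 - momentum) * loss.item()
    return cls_loss / (running["cls"] + 1e-8) + lm_loss / (running["lm"] + 1e-8)
```

In a training loop, `running` would be a dictionary that persists across steps, e.g. `cls_logit, lm_logits = model(tokens)` followed by `loss = balanced_loss(cls_logit, lm_logits, tokens, labels, running)`.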
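For the PE-file model, the abstract points to an adaptive attention span similar to Sukhbaatar et al. The sketch below shows the soft span mask from that line of work, applied to post-softmax attention weights and renormalized, which is mathematically equivalent to masking the exponentiated scores as in the original formulation. The ramp width, maximum span, and the `AdaptiveSpanMask` wrapper are assumptions for illustration; how the paper wires the span into its self-attention layers is not detailed in the abstract.

```python
# Illustrative sketch only; hyperparameters and placement inside the attention layer are assumptions.
import torch
import torch.nn as nn


class AdaptiveSpanMask(nn.Module):
    """Soft, learnable attention span in the spirit of Sukhbaatar et al. (2019).
    Each head learns how far back it may attend; a linear ramp of width `ramp`
    keeps the mask differentiable."""

    def __init__(self, n_heads, max_span=2048, ramp=32, init_frac=0.1):
        super().__init__()
        self.max_span, self.ramp = max_span, ramp
        # One span fraction per head, learned jointly with the rest of the model.
        self.span_frac = nn.Parameter(torch.full((n_heads, 1, 1), init_frac))

    def forward(self, attn):
        # attn: (batch, n_heads, q_len, k_len) post-softmax weights from causal
        # self-attention, so key k lies at distance (q - k) behind query q.
        q_len, k_len = attn.shape[-2], attn.shape[-1]
        q_pos = torch.arange(q_len, device=attn.device).unsqueeze(-1)
        k_pos = torch.arange(k_len, device=attn.device).unsqueeze(0)
        dist = (q_pos - k_pos).clamp(min=0).float()            # (q_len, k_len)
        span = self.span_frac.clamp(0, 1) * self.max_span      # (n_heads, 1, 1)
        # m(dist) = 1 inside the span, decays linearly over `ramp`, 0 beyond it.
        mask = ((span + self.ramp - dist) / self.ramp).clamp(0, 1)
        masked = attn * mask                                    # broadcasts over batch
        return masked / (masked.sum(dim=-1, keepdim=True) + 1e-8)

    def span_penalty(self):
        # L1-style penalty on the learned spans, encouraging heads to keep
        # their context short.
        return (self.span_frac.clamp(0, 1) * self.max_span).mean()
```

Adding `span_penalty()` to the training loss means each head only pays for the context it actually uses, which is what makes attention over long PE byte sequences affordable.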
Related papers
- Multimodal Pathway: Improve Transformers with Irrelevant Data from Other Modalities [56.666806962214565]
We propose to improve transformers of a specific modality with irrelevant data from other modalities.
We use an auxiliary transformer trained with data of another modality and construct pathways to connect components of the two models.
We observe significant and consistent performance improvements with irrelevant data from other modalities.
arXiv Detail & Related papers (2024-01-25T18:59:58Z)
- Emergent Agentic Transformer from Chain of Hindsight Experience [96.56164427726203]
We show, for the first time, that a simple transformer-based model performs competitively with both temporal-difference and imitation-learning-based approaches.
arXiv Detail & Related papers (2023-05-26T00:43:02Z)
- Remote Sensing Change Detection With Transformers Trained from Scratch [62.96911491252686]
Existing transformer-based change detection (CD) approaches either employ a model pre-trained on the large-scale ImageNet classification dataset or rely on first pre-training on another CD dataset and then fine-tuning on the target benchmark.
We develop an end-to-end CD approach with transformers that is trained from scratch and yet achieves state-of-the-art performance on four public benchmarks.
arXiv Detail & Related papers (2023-04-13T17:57:54Z)
- Transfer learning for conflict and duplicate detection in software requirement pairs [0.5359378066251386]
Consistent and holistic expression of software requirements is important for the success of software projects.
In this study, we aim to enhance the efficiency of the software development processes by automatically identifying conflicting and duplicate software requirement specifications.
We design a novel transformer-based architecture, SR-BERT, which incorporates Sentence-BERT and Bi-encoders for the conflict and duplicate identification task.
arXiv Detail & Related papers (2023-01-09T22:47:12Z)
- Improving Transformer-Kernel Ranking Model Using Conformer and Query Term Independence [29.442579683405913]
The Transformer-Kernel (TK) model has demonstrated strong reranking performance on the TREC Deep Learning benchmark.
A variant of the TK model -- called TKL -- has been developed that incorporates local self-attention to efficiently process longer input sequences.
In this work, we propose a novel Conformer layer as an alternative approach to scale TK to longer input sequences.
arXiv Detail & Related papers (2021-04-19T15:32:34Z)
- Training Transformers for Information Security Tasks: A Case Study on Malicious URL Prediction [3.660098145214466]
We implement a malicious/benign URL predictor based on a transformer architecture that is trained from scratch.
We show that in contrast to conventional natural language processing (NLP) transformers, this model requires a different training approach to work well.
arXiv Detail & Related papers (2020-11-05T18:58:51Z)
- Mixup-Transformer: Dynamic Data Augmentation for NLP Tasks [75.69896269357005]
Mixup is the latest data augmentation technique that linearly interpolates input examples and the corresponding labels.
In this paper, we explore how to apply mixup to natural language processing tasks.
We incorporate mixup into a transformer-based pre-trained architecture, named "mixup-transformer", for a wide range of NLP tasks.
arXiv Detail & Related papers (2020-10-05T23:37:30Z)
- Pre-Trained Models for Heterogeneous Information Networks [57.78194356302626]
We propose a self-supervised pre-training and fine-tuning framework, PF-HIN, to capture the features of a heterogeneous information network.
PF-HIN consistently and significantly outperforms state-of-the-art alternatives on each of these tasks, on four datasets.
arXiv Detail & Related papers (2020-07-07T03:36:28Z)
- Set Based Stochastic Subsampling [85.5331107565578]
We propose a set-based two-stage end-to-end neural subsampling model that is jointly optimized with an arbitrary downstream task network.
We show that it outperforms the relevant baselines under low subsampling rates on a variety of tasks including image classification, image reconstruction, function reconstruction and few-shot classification.
arXiv Detail & Related papers (2020-06-25T07:36:47Z)
- Gradient-Based Adversarial Training on Transformer Networks for Detecting Check-Worthy Factual Claims [3.7543966923106438]
We introduce the first adversarially-regularized, transformer-based claim spotter model.
We obtain a 4.70 point F1-score improvement over current state-of-the-art models.
We propose a method to apply adversarial training to transformer models.
arXiv Detail & Related papers (2020-02-18T16:51:05Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information presented and is not responsible for any consequences of its use.