Shellcode_IA32: A Dataset for Automatic Shellcode Generation
- URL: http://arxiv.org/abs/2104.13100v1
- Date: Tue, 27 Apr 2021 10:50:47 GMT
- Title: Shellcode_IA32: A Dataset for Automatic Shellcode Generation
- Authors: Pietro Liguori, Erfan Al-Hossami, Domenico Cotroneo, Roberto Natella,
Bojan Cukic and Samira Shaikh
- Abstract summary: We take the first step to address the task of automatically generating shellcodes, i.e., small pieces of code used as a payload in the exploitation of a software vulnerability.
We assemble and release a novel dataset (Shellcode_IA32), consisting of challenging but common assembly instructions with their natural language descriptions.
- Score: 2.609784101826762
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: We take the first step to address the task of automatically generating
shellcodes, i.e., small pieces of code used as a payload in the exploitation of
a software vulnerability, starting from natural language comments. We assemble
and release a novel dataset (Shellcode_IA32), consisting of challenging but
common assembly instructions with their natural language descriptions. We
experiment with standard methods in neural machine translation (NMT) to
establish baseline performance levels on this task.
Related papers
- NoviCode: Generating Programs from Natural Language Utterances by Novices [59.71218039095155]
We present NoviCode, a novel NL Programming task which takes as input an API and a natural language description by a novice non-programmer.
We show that NoviCode is indeed a challenging task in the code synthesis domain, and that generating complex code from non-technical instructions goes beyond the current Text-to-Code paradigm.
arXiv Detail & Related papers (2024-07-15T11:26:03Z) - Decoding at the Speed of Thought: Harnessing Parallel Decoding of Lexical Units for LLMs [57.27982780697922]
Large language models have demonstrated exceptional capability in natural language understanding and generation.
However, their generation speed is limited by the inherently sequential nature of their decoding process.
This paper introduces Lexical Unit Decoding, a novel decoding methodology implemented in a data-driven manner.
arXiv Detail & Related papers (2024-05-24T04:35:13Z) - The Power of Words: Generating PowerShell Attacks from Natural Language [4.593752628215474]
This work explores an uncharted domain in AI code generation using Machine Translation (NMT)
We present an extensive evaluation of state-of-the-art NMT models and analyze the generated code both statically and dynamically.
arXiv Detail & Related papers (2024-04-19T13:54:34Z) - CodecLM: Aligning Language Models with Tailored Synthetic Data [51.59223474427153]
We introduce CodecLM, a framework for adaptively generating high-quality synthetic data for instruction-following abilities.
We first encode seed instructions into metadata, which are concise keywords generated on-the-fly to capture the target instruction distribution.
We also introduce Self-Rubrics and Contrastive Filtering during decoding to tailor data-efficient samples.
arXiv Detail & Related papers (2024-04-08T21:15:36Z) - Guess & Sketch: Language Model Guided Transpilation [59.02147255276078]
Learned transpilation offers an alternative to manual re-writing and engineering efforts.
Probabilistic neural language models (LMs) produce plausible outputs for every input, but do so at the cost of guaranteed correctness.
Guess & Sketch extracts alignment and confidence information from features of the LM then passes it to a symbolic solver to resolve semantic equivalence.
arXiv Detail & Related papers (2023-09-25T15:42:18Z) - Python Code Generation by Asking Clarification Questions [57.63906360576212]
In this work, we introduce a novel and more realistic setup for this task.
We hypothesize that the under-specification of a natural language description can be resolved by asking clarification questions.
We collect and introduce a new dataset named CodeClarQA containing pairs of natural language descriptions and code with created synthetic clarification questions and answers.
arXiv Detail & Related papers (2022-12-19T22:08:36Z) - Can We Generate Shellcodes via Natural Language? An Empirical Study [4.82810058837951]
We propose an approach based on Neural Machine Translation to automatically generate shellcodes.
Shellcode_IA32 consists of 3,200 assembly code snippets of real Linux/x86 shellcodes annotated using natural language.
We show that NMT can generate assembly code snippets from the natural language with high accuracy and that in many cases can generate entire shellcodes with no errors.
arXiv Detail & Related papers (2022-02-08T09:57:34Z) - Using Document Similarity Methods to create Parallel Datasets for Code
Translation [60.36392618065203]
Translating source code from one programming language to another is a critical, time-consuming task.
We propose to use document similarity methods to create noisy parallel datasets of code.
We show that these models perform comparably to models trained on ground truth for reasonable levels of noise.
arXiv Detail & Related papers (2021-10-11T17:07:58Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.