Constructing Flow Graphs from Procedural Cybersecurity Texts
- URL: http://arxiv.org/abs/2105.14357v1
- Date: Sat, 29 May 2021 19:06:35 GMT
- Title: Constructing Flow Graphs from Procedural Cybersecurity Texts
- Authors: Kuntal Kumar Pal, Kazuaki Kashihara, Pratyay Banerjee, Swaroop Mishra,
Ruoyu Wang, Chitta Baral
- Abstract summary: We build a large annotated procedural text dataset (CTFW) in the cybersecurity domain (3154 documents)
We propose to identify relevant information from such texts and generate information flows between sentences.
Our experiments show that a Graph Convolutional Network with BERT sentence embeddings outperforms BERT in all three domains.
- Score: 16.09313316086535
- License: http://creativecommons.org/licenses/by-nc-nd/4.0/
- Abstract: Following procedural texts written in natural language is challenging. To
complete a task, we must read the whole text to identify the relevant
information and the flow of instructions, a process that is prone to failure. If
such texts were structured, we could readily visualize instruction flows,
reason about or infer a particular step, or even build automated systems to help
novice agents achieve a goal. However, recovering this structure is challenging
because of the diverse nature of such texts. This paper proposes to identify relevant information
from such texts and generate information flows between sentences. We built a
large annotated procedural text dataset (CTFW) in the cybersecurity domain
(3154 documents). This dataset contains valuable instructions regarding
software vulnerability analysis experiences. We performed extensive experiments
on CTFW with our LM-GNN model variants in multiple settings. To show the
generalizability of both this task and our method, we also experimented with
procedural texts from two other domains (Maintenance Manual and Cooking), which
are substantially different from cybersecurity. Our experiments show that a Graph
Convolutional Network with BERT sentence embeddings outperforms BERT in all three
domains.
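The core idea above can be sketched in miniature: represent each sentence as a node whose features come from a sentence encoder, propagate information over candidate links with a graph convolution, and score directed sentence pairs as potential flow edges. The sketch below is illustrative only, not the paper's LM-GNN implementation: random vectors stand in for BERT sentence embeddings, a single untrained GCN layer is used, and the bilinear edge scorer is an assumed, hypothetical choice.

```python
import numpy as np

def gcn_layer(H, A, W):
    """One graph-convolution layer: H' = ReLU(D^-1/2 (A+I) D^-1/2 H W).

    H: (n, d) node (sentence) features
    A: (n, n) symmetric adjacency over candidate sentence links
    W: (d, d_out) layer weights
    """
    n = A.shape[0]
    A_hat = A + np.eye(n)                      # add self-loops
    deg = A_hat.sum(axis=1)
    D_inv_sqrt = np.diag(1.0 / np.sqrt(deg))   # symmetric normalization
    A_norm = D_inv_sqrt @ A_hat @ D_inv_sqrt
    return np.maximum(A_norm @ H @ W, 0.0)     # ReLU

# Toy document: 4 "sentences" with random stand-ins for BERT embeddings
rng = np.random.default_rng(0)
n, d = 4, 8
H = rng.normal(size=(n, d))
A = np.array([[0, 1, 0, 0],                    # candidate links between sentences
              [1, 0, 1, 0],
              [0, 1, 0, 1],
              [0, 0, 1, 0]], dtype=float)
W = rng.normal(size=(d, d))

H1 = gcn_layer(H, A, W)                        # graph-contextualized sentence vectors

def edge_score(H, i, j):
    """Hypothetical scorer for a directed flow edge i -> j (dot product)."""
    return float(H[i] @ H[j])

scores = {(i, j): edge_score(H1, i, j)
          for i in range(n) for j in range(n) if i != j}
```

In a trained system, `W` and the edge scorer would be learned so that high-scoring pairs correspond to annotated information flows between sentences.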
Related papers
- Text-Tuple-Table: Towards Information Integration in Text-to-Table Generation via Global Tuple Extraction [36.915250638481986]
We introduce LiveSum, a new benchmark dataset for generating summary tables of competitions based on real-time commentary texts.
We evaluate the performances of state-of-the-art Large Language Models on this task in both fine-tuning and zero-shot settings.
We additionally propose a novel pipeline called $T3$(Text-Tuple-Table) to improve their performances.
arXiv Detail & Related papers (2024-04-22T14:31:28Z)
- OmniParser: A Unified Framework for Text Spotting, Key Information Extraction and Table Recognition [79.852642726105]
We propose a unified paradigm for parsing visually-situated text across diverse scenarios.
Specifically, we devise a universal model, called Omni, which can simultaneously handle three typical visually-situated text parsing tasks.
In Omni, all tasks share the unified encoder-decoder architecture, the unified objective point-conditioned text generation, and the unified input representation.
arXiv Detail & Related papers (2024-03-28T03:51:14Z)
- TextFormer: A Query-based End-to-End Text Spotter with Mixed Supervision [61.186488081379]
We propose TextFormer, a query-based end-to-end text spotter with Transformer architecture.
TextFormer builds upon an image encoder and a text decoder to learn a joint semantic understanding for multi-task modeling.
It allows for mutual training and optimization of classification, segmentation, and recognition branches, resulting in deeper feature sharing.
arXiv Detail & Related papers (2023-06-06T03:37:41Z)
- One Embedder, Any Task: Instruction-Finetuned Text Embeddings [105.82772523968961]
INSTRUCTOR is a new method for computing text embeddings given task instructions.
Every text input is embedded together with instructions explaining the use case.
We evaluate INSTRUCTOR on 70 embedding evaluation tasks.
arXiv Detail & Related papers (2022-12-19T18:57:05Z)
- Informative Text Generation from Knowledge Triples [56.939571343797304]
We propose a novel memory augmented generator that employs a memory network to memorize the useful knowledge learned during the training.
We derive a dataset from WebNLG for our new setting and conduct extensive experiments to investigate the effectiveness of our model.
arXiv Detail & Related papers (2022-09-26T14:35:57Z)
- Zero-Shot Information Extraction as a Unified Text-to-Triple Translation [56.01830747416606]
We cast a suite of information extraction tasks into a text-to-triple translation framework.
We formalize the task as a translation between task-specific input text and output triples.
We study the zero-shot performance of this framework on open information extraction.
arXiv Detail & Related papers (2021-09-23T06:54:19Z)
- StrucTexT: Structured Text Understanding with Multi-Modal Transformers [29.540122964399046]
Structured text understanding on Visually Rich Documents (VRDs) is a crucial part of Document Intelligence.
This paper proposes a unified framework named StrucTexT, which is flexible and effective for handling both sub-tasks.
We evaluate our method for structured text understanding at segment-level and token-level and show it outperforms the state-of-the-art counterparts.
arXiv Detail & Related papers (2021-08-06T02:57:07Z)
- Knowledge-Aware Procedural Text Understanding with Multi-Stage Training [110.93934567725826]
We focus on the task of procedural text understanding, which aims to comprehend such documents and track entities' states and locations during a process.
Two challenges, the difficulty of commonsense reasoning and data insufficiency, still remain unsolved.
We propose a novel KnOwledge-Aware proceduraL text understAnding (KOALA) model, which effectively leverages multiple forms of external knowledge.
arXiv Detail & Related papers (2020-09-28T10:28:40Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
The site does not guarantee the quality of this information and is not responsible for any consequences arising from its use.