Beyond Rule-based Named Entity Recognition and Relation Extraction for
Process Model Generation from Natural Language Text
- URL: http://arxiv.org/abs/2305.03960v2
- Date: Mon, 7 Aug 2023 06:35:25 GMT
- Title: Beyond Rule-based Named Entity Recognition and Relation Extraction for
Process Model Generation from Natural Language Text
- Authors: Julian Neuberger, Lars Ackermann, Stefan Jablonski
- Abstract summary: We present an extension to an existing pipeline to make it entirely data-driven.
We demonstrate the competitiveness of our improved pipeline, which not only eliminates the substantial overhead associated with feature engineering and rule definition, but also enables adaptation to different datasets, entity and relation types, and new domains.
We propose an extension to the PET dataset that incorporates information about linguistic references and a corresponding method for resolving them.
- Score: 0.0
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Process-aware information systems offer extensive advantages to companies,
facilitating planning, operations, and optimization of day-to-day business
activities. However, the time-consuming but required step of designing formal
business process models often hampers the potential of these systems. To
overcome this challenge, automated generation of business process models from
natural language text has emerged as a promising approach to expedite this
step. Generally, two crucial subtasks have to be solved: extracting
process-relevant information from natural language and creating the actual
model. Approaches to the first subtask are rule-based methods, highly
optimized for specific domains but hard to adapt to related applications. To
solve this issue, we present an extension to an existing pipeline that makes it
entirely data-driven. We demonstrate the competitiveness of our improved
pipeline, which not only eliminates the substantial overhead associated with
feature engineering and rule definition, but also enables adaptation to
different datasets, entity and relation types, and new domains. Additionally,
the largest available dataset (PET) for the first subtask contains no
information about linguistic references between mentions of entities in the
process description. Yet, the resolution of these mentions into a single visual
element is essential for high-quality process models. We propose an extension
to the PET dataset that incorporates information about linguistic references
and a corresponding method for resolving them. Finally, we provide a detailed
analysis of the inherent challenges in the dataset at hand.
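
To make the first subtask more concrete, the sketch below shows what a purely data-driven mention extraction step can look like. It is an illustrative example only: the Hugging Face transformers library, the generic pretrained checkpoint, and the sample sentence are assumptions, not the paper's actual model or label set (which targets PET-style process entities such as activities and actors).

```python
# Minimal sketch of data-driven mention extraction (first subtask).
# The checkpoint below is a generic NER model used purely as a placeholder;
# a real setup would be fine-tuned on process-specific labels.
from transformers import pipeline

ner = pipeline(
    "token-classification",
    model="dslim/bert-base-NER",     # placeholder checkpoint, not the paper's model
    aggregation_strategy="simple",   # merge word pieces into whole mentions
)

text = (
    "After the customer submits an order, the sales department checks the "
    "availability of the requested items and confirms the order."
)

# Each prediction carries the mention text, a label, a confidence score, and
# character offsets that downstream steps (relation extraction, resolution of
# linguistic references into a single model element) can build on.
for mention in ner(text):
    print(mention["start"], mention["end"], mention["entity_group"],
          repr(mention["word"]), round(float(mention["score"]), 3))
```

A comparable classifier over pairs of extracted mentions would cover relation extraction, and grouping co-referring mentions (the reference resolution discussed above) would then map each group to a single visual element in the generated process model.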
Related papers
- Generative Context Distillation [48.91617280112579]
Generative Context Distillation (GCD) is a lightweight prompt internalization method that employs a joint training approach.
We demonstrate that our approach effectively internalizes complex prompts across various agent-based application scenarios.
arXiv Detail & Related papers (2024-11-24T17:32:20Z)
- A New Pipeline For Generating Instruction Dataset via RAG and Self Fine-Tuning [0.0]
This research proposes a pipeline to construct high-quality instruction datasets for fine-tuning on specific domains.
By ingesting domain-specific documents, the pipeline generates relevant and contextually appropriate instructions.
As a case study, we apply this approach to the domain of psychiatry, a field requiring specialized knowledge and sensitive handling of patient information.
arXiv Detail & Related papers (2024-08-12T03:52:11Z)
- Leveraging Data Augmentation for Process Information Extraction [0.0]
Data augmentation is an important component in enabling machine learning methods for the task of business process model generation from natural language text.
We investigate the application of data augmentation for natural language text data.
arXiv Detail & Related papers (2024-04-11T06:32:03Z)
- From Dialogue to Diagram: Task and Relationship Extraction from Natural Language for Accelerated Business Process Prototyping [0.0]
This paper introduces a contemporary solution; central to our approach is the use of dependency parsing and Named Entity Recognition (NER).
We utilize Subject-Verb-Object (SVO) constructs for identifying action relationships and integrate semantic analysis tools, including WordNet, for enriched contextual understanding (an illustrative SVO sketch appears after this list).
The system adeptly handles data transformation and visualization, converting verbose extracted information into BPMN (Business Process Model and Notation) diagrams.
arXiv Detail & Related papers (2023-12-16T12:35:28Z)
- Adapting Large Language Models for Content Moderation: Pitfalls in Data Engineering and Supervised Fine-tuning [79.53130089003986]
Large Language Models (LLMs) have become a feasible solution for handling tasks in various domains.
In this paper, we introduce how to fine-tune an LLM that can be privately deployed for content moderation.
arXiv Detail & Related papers (2023-10-05T09:09:44Z)
- Information Association for Language Model Updating by Mitigating LM-Logical Discrepancy [68.31760483418901]
Large Language Models (LLMs) struggle with providing current information due to their outdated pre-training data.
Existing methods for updating LLMs, such as knowledge editing and continual fine-tuning, have significant drawbacks in the generalizability of new information.
We identify the core challenge behind these drawbacks: the LM-logical discrepancy featuring the difference between language modeling probabilities and logical probabilities.
arXiv Detail & Related papers (2023-05-29T19:48:37Z)
- STAR: Boosting Low-Resource Information Extraction by Structure-to-Text Data Generation with Large Language Models [56.27786433792638]
STAR is a data generation method that leverages Large Language Models (LLMs) to synthesize data instances.
We design fine-grained step-by-step instructions to obtain the initial data instances.
Our experiments show that the data generated by STAR significantly improve the performance of low-resource event extraction and relation extraction tasks.
arXiv Detail & Related papers (2023-05-24T12:15:19Z)
- A Case for Business Process-Specific Foundation Models [6.25118865553438]
We argue that business process data representations have unique characteristics that warrant the development of a new class of foundation models.
These models should tackle the unique challenges of applying AI to business processes, which include data scarcity, multi-modal representations, domain-specific terminology, and privacy concerns.
arXiv Detail & Related papers (2022-10-26T14:17:47Z)
- PET: A new Dataset for Process Extraction from Natural Language Text [15.16406344719132]
We develop the first corpus of business process descriptions annotated with activities, gateways, actors and flow information.
We present our new resource, including a detailed overview of the annotation schema and guidelines, as well as a variety of baselines to benchmark the difficulty and challenges of business process extraction from text.
arXiv Detail & Related papers (2022-03-09T16:33:59Z)
- SOLIS -- The MLOps journey from data acquisition to actionable insights [62.997667081978825]
In this paper we present a unified deployment pipeline and freedom-to-operate approach that supports all requirements while using basic cross-platform tensor framework and script language engines.
This approach, however, does not supply the needed procedures and pipelines for the actual deployment of machine learning capabilities in real production-grade systems.
arXiv Detail & Related papers (2021-12-22T14:45:37Z)
- Knowledge-Aware Procedural Text Understanding with Multi-Stage Training [110.93934567725826]
We focus on the task of procedural text understanding, which aims to comprehend such documents and track entities' states and locations during a process.
Two challenges, the difficulty of commonsense reasoning and data insufficiency, still remain unsolved.
We propose a novel KnOwledge-Aware proceduraL text understAnding (KOALA) model, which effectively leverages multiple forms of external knowledge.
arXiv Detail & Related papers (2020-09-28T10:28:40Z)
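
As referenced in the From Dialogue to Diagram entry above, the following is a minimal sketch of dependency-parsing-based Subject-Verb-Object (SVO) extraction. spaCy and its small English model are assumed stand-ins here; the cited paper's actual tooling and extraction rules are not reproduced.

```python
# Illustrative SVO extraction via dependency parsing, in the spirit of the
# "From Dialogue to Diagram" entry. spaCy is an assumption, not the cited tooling.
import spacy

nlp = spacy.load("en_core_web_sm")

def extract_svo(text):
    """Return rough (subject, verb, object) triples from dependency trees."""
    triples = []
    for sent in nlp(text).sents:
        for token in sent:
            if token.pos_ != "VERB":
                continue
            subjects = [c for c in token.children if c.dep_ in ("nsubj", "nsubjpass")]
            objects = [c for c in token.children if c.dep_ in ("dobj", "obj", "attr", "dative")]
            for subj in subjects:
                for obj in objects:
                    triples.append((subj.text, token.lemma_, obj.text))
    return triples

text = ("The customer submits an order. "
        "The sales department checks the availability of the items.")
for subj, verb, obj in extract_svo(text):
    print(f"{subj} --{verb}--> {obj}")  # e.g. "customer --submit--> order"
```

Triples like these can then be mapped to candidate activities and actors before being laid out as BPMN elements.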