From Chaos to Automation: Enabling the Use of Unstructured Data for Robotic Process Automation
- URL: http://arxiv.org/abs/2507.11364v1
- Date: Tue, 15 Jul 2025 14:32:49 GMT
- Title: From Chaos to Automation: Enabling the Use of Unstructured Data for Robotic Process Automation
- Authors: Kelly Kurowski, Xixi Lu, Hajo A. Reijers
- Abstract summary: The UNstructured Document REtrieval SyStem (UNDRESS) is a system that uses fuzzy regular expressions, techniques for natural language processing, and large language models to enable RPA platforms to effectively retrieve information from unstructured documents. The results demonstrate the effectiveness of UNDRESS in enhancing RPA capabilities for unstructured data, providing a significant advancement in the field.
- Score: 0.6144680854063939
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: The growing volume of unstructured data within organizations poses significant challenges for data analysis and process automation. Unstructured data, which lacks a predefined format, encompasses various forms such as emails, reports, and scans. It is estimated to constitute approximately 80% of enterprise data. Despite the valuable insights it can offer, extracting meaningful information from unstructured data is more complex compared to structured data. Robotic Process Automation (RPA) has gained popularity for automating repetitive tasks, improving efficiency, and reducing errors. However, RPA is traditionally reliant on structured data, limiting its application to processes involving unstructured documents. This study addresses this limitation by developing the UNstructured Document REtrieval SyStem (UNDRESS), a system that uses fuzzy regular expressions, techniques for natural language processing, and large language models to enable RPA platforms to effectively retrieve information from unstructured documents. The research involved the design and development of a prototype system, and its subsequent evaluation based on text extraction and information retrieval performance. The results demonstrate the effectiveness of UNDRESS in enhancing RPA capabilities for unstructured data, providing a significant advancement in the field. The findings suggest that this system could facilitate broader RPA adoption across processes traditionally hindered by unstructured data, thereby improving overall business process efficiency.
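The abstract names fuzzy regular expressions as one ingredient of UNDRESS but does not show how they are applied. The sketch below is not the authors' implementation. It only illustrates the general idea of tolerating OCR noise when locating a field label and then capturing the value with an ordinary regular expression. It swaps in approximate string matching from Python's standard `difflib` module in place of a true fuzzy-regex engine, and the function name `find_field`, the sample text, the value patterns, and the 0.8 threshold are all hypothetical.

```python
import re
from difflib import SequenceMatcher


def find_field(text, label, value_pattern, threshold=0.8):
    """Return the first value that follows an approximate match of `label`.

    Hypothetical helper: difflib similarity stands in for fuzzy regex matching.
    """
    n_words = len(label.split())
    for line in text.splitlines():
        words = line.split()
        # Slide a window with the label's word count across the line.
        for i in range(len(words) - n_words + 1):
            candidate = " ".join(words[i:i + n_words]).lower().rstrip(":")
            score = SequenceMatcher(None, candidate, label.lower()).ratio()
            if score >= threshold:
                # Label found approximately; look for the value after it.
                rest = " ".join(words[i + n_words:])
                match = re.search(value_pattern, rest)
                if match:
                    return match.group(0)
    return None


# Made-up sample with OCR-style character errors in the labels.
sample = "lnvoice numbcr: INV-20931\nTotal arnount: 1,250.00 EUR"

print(find_field(sample, "invoice number", r"[A-Z]{3}-\d+"))
print(find_field(sample, "total amount", r"\d[\d,]*\.\d{2}"))
```

A dedicated fuzzy-regex engine would let the error tolerance be expressed inside the pattern itself. The window-and-threshold approach here is only a stand-in that runs on the standard library.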
Related papers
- WebShaper: Agentically Data Synthesizing via Information-Seeking Formalization [68.46693401421923]
WebShaper systematically formalizes IS tasks through set theory.
WebShaper achieves state-of-the-art performance among open-sourced IS agents on GAIA and WebWalkerQA benchmarks.
arXiv Detail & Related papers (2025-07-20T17:53:37Z)
- Leveraging Machine Learning and Enhanced Parallelism Detection for BPMN Model Generation from Text [75.77648333476776]
This paper introduces an automated pipeline for extracting BPMN models from text.
A key contribution of this work is the introduction of a newly annotated dataset.
We augment the dataset with 15 newly annotated documents containing 32 parallel gateways for model training.
arXiv Detail & Related papers (2025-07-11T07:25:55Z)
- Structuring the Unstructured: A Multi-Agent System for Extracting and Querying Financial KPIs and Guidance [54.25184684077833]
We propose an efficient and scalable method for extracting quantitative insights from unstructured financial documents.
Our proposed system consists of two specialized agents: the Extraction Agent and the Text-to-Agent
arXiv Detail & Related papers (2025-05-25T15:45:46Z)
- MDSF: Context-Aware Multi-Dimensional Data Storytelling Framework based on Large language Model [1.33134751838052]
This paper introduces the Multidimensional Data Storytelling Framework (MDSF) based on large language models for automated insight generation and context-aware storytelling.
The framework incorporates advanced preprocessing techniques, augmented analysis algorithms, and a unique scoring mechanism to identify and prioritize actionable insights.
arXiv Detail & Related papers (2025-01-02T02:35:38Z)
- ERPA: Efficient RPA Model Integrating OCR and LLMs for Intelligent Document Processing [0.0]
This paper presents ERPA, an innovative Robotic Process Automation (RPA) model designed to enhance ID data extraction and optimize Optical Character Recognition (OCR) tasks within immigration.
Benchmark comparisons demonstrate that ERPA significantly reduces processing times by up to 94 percent, completing ID data extraction in just 9.94 seconds.
arXiv Detail & Related papers (2024-12-24T09:44:43Z)
- Enhancing Structured-Data Retrieval with GraphRAG: Soccer Data Case Study [4.742245127121496]
Structured-GraphRAG is a versatile framework designed to enhance information retrieval across structured datasets in natural language queries.
Our findings show that Structured-GraphRAG significantly improves query processing efficiency and reduces response times.
arXiv Detail & Related papers (2024-09-26T06:53:29Z)
- Optimizing Structured Data Processing through Robotic Process Automation [2.3997896447030653]
This study investigates the use of RPA for structured data extraction and evaluates its advantages over manual processes.
By comparing human-performed tasks with those executed by RPA software bots, we assess efficiency and accuracy in data extraction from invoices.
Our findings highlight the significant efficiency gains achieved by RPA, with bots completing tasks in significantly less time compared to manual efforts.
arXiv Detail & Related papers (2024-08-27T05:53:02Z)
- STAR: Boosting Low-Resource Information Extraction by Structure-to-Text Data Generation with Large Language Models [56.27786433792638]
STAR is a data generation method that leverages Large Language Models (LLMs) to synthesize data instances.
We design fine-grained step-by-step instructions to obtain the initial data instances.
Our experiments show that the data generated by STAR significantly improve the performance of low-resource event extraction and relation extraction tasks.
arXiv Detail & Related papers (2023-05-24T12:15:19Z)
- An Empirical Study of Automatic Post-Editing [56.86393786396992]
APE aims to reduce manual post-editing efforts by automatically correcting errors in machine-translated output.
To alleviate the lack of genuine training data, most of the current APE systems employ data augmentation methods to generate large-scale artificial corpora.
We study the outputs of the state-of-the-art APE model on a difficult APE dataset to analyze the problems in existing APE systems.
arXiv Detail & Related papers (2022-09-16T07:38:27Z)
- SOLIS -- The MLOps journey from data acquisition to actionable insights [62.997667081978825]
In this paper we present a unified deployment pipeline and freedom-to-operate approach that supports all requirements while using basic cross-platform tensor framework and script language engines.
This approach, however, does not supply the needed procedures and pipelines for the actual deployment of machine learning capabilities in real production grade systems.
arXiv Detail & Related papers (2021-12-22T14:45:37Z)
- Automatic Business Process Structure Discovery using Ordered Neurons LSTM: A Preliminary Study [6.6599132213053185]
We propose to retrieve latent semantic hierarchical structure present in business process documents by building a neural network.
We tested the proposed approach on a data set of Process Description Documents (PDD) from our practical Robotic Process Automation (RPA) projects.
arXiv Detail & Related papers (2020-01-05T14:19:11Z)