From Chaos to Automation: Enabling the Use of Unstructured Data for Robotic Process Automation
- URL: http://arxiv.org/abs/2507.11364v1
- Date: Tue, 15 Jul 2025 14:32:49 GMT
- Title: From Chaos to Automation: Enabling the Use of Unstructured Data for Robotic Process Automation
- Authors: Kelly Kurowski, Xixi Lu, Hajo A. Reijers
- Abstract summary: The UNstructured Document REtrieval SyStem (UNDRESS) is a system that uses fuzzy regular expressions, techniques for natural language processing, and large language models to enable RPA platforms to effectively retrieve information from unstructured documents. The results demonstrate the effectiveness of UNDRESS in enhancing RPA capabilities for unstructured data, providing a significant advancement in the field.
- Score: 0.6144680854063939
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: The growing volume of unstructured data within organizations poses significant challenges for data analysis and process automation. Unstructured data, which lacks a predefined format, encompasses various forms such as emails, reports, and scans. It is estimated to constitute approximately 80% of enterprise data. Despite the valuable insights it can offer, extracting meaningful information from unstructured data is more complex compared to structured data. Robotic Process Automation (RPA) has gained popularity for automating repetitive tasks, improving efficiency, and reducing errors. However, RPA is traditionally reliant on structured data, limiting its application to processes involving unstructured documents. This study addresses this limitation by developing the UNstructured Document REtrieval SyStem (UNDRESS), a system that uses fuzzy regular expressions, techniques for natural language processing, and large language models to enable RPA platforms to effectively retrieve information from unstructured documents. The research involved the design and development of a prototype system, and its subsequent evaluation based on text extraction and information retrieval performance. The results demonstrate the effectiveness of UNDRESS in enhancing RPA capabilities for unstructured data, providing a significant advancement in the field. The findings suggest that this system could facilitate broader RPA adoption across processes traditionally hindered by unstructured data, thereby improving overall business process efficiency.
Related papers
- SSA3D: Text-Conditioned Assisted Self-Supervised Framework for Automatic Dental Abutment Design [52.57094737117145]
We propose a Self-supervised assisted automatic abutment design framework (SS$A3$D), which employs a dual-branch architecture with a reconstruction branch and a regression branch. The regression branch then predicts the abutment parameters under supervised learning, which eliminates the separate pre-training and fine-tuning process. It also achieves state-of-the-art performance compared to other methods, significantly improving the accuracy and efficiency of automated abutment design.
arXiv Detail & Related papers (2025-12-12T12:08:05Z) - OmniStruct: Universal Text-to-Structure Generation across Diverse Schemas [57.49565459553627]
We introduce OmniStruct, a benchmark for assessing Large Language Models' capabilities on text-to-structure tasks. We collect high-quality training data via synthetic task generation to facilitate the development of efficient text-to-structure models. Our experiments demonstrate the possibility of fine-tuning much smaller models on synthetic data into universal structured generation models.
arXiv Detail & Related papers (2025-11-23T08:18:12Z) - AI Agent-Driven Framework for Automated Product Knowledge Graph Construction in E-Commerce [0.05882087655172317]
This paper introduces a fully automated, AI agent-driven framework for constructing product knowledge graphs directly from unstructured product descriptions. We evaluate the system on a real-world dataset of air conditioner product descriptions.
arXiv Detail & Related papers (2025-11-14T07:09:13Z) - A Case for Computing on Unstructured Data [6.425984481490725]
We argue for a new paradigm, which we call computing on unstructured data, built around three stages: extraction of latent structure, transformation of this structure through data processing techniques, and projection back into unstructured formats. This bi-directional pipeline allows unstructured data to benefit from the analytical power of structured computation, while preserving the richness and accessibility of unstructured representations for human and AI consumption.
arXiv Detail & Related papers (2025-09-18T04:24:41Z) - WebShaper: Agentically Data Synthesizing via Information-Seeking Formalization [68.46693401421923]
WebShaper systematically formalizes IS tasks through set theory. WebShaper achieves state-of-the-art performance among open-sourced IS agents on GAIA and WebWalkerQA benchmarks.
arXiv Detail & Related papers (2025-07-20T17:53:37Z) - Leveraging Machine Learning and Enhanced Parallelism Detection for BPMN Model Generation from Text [75.77648333476776]
This paper introduces an automated pipeline for extracting BPMN models from text. A key contribution of this work is the introduction of a newly annotated dataset. We augment the dataset with 15 newly annotated documents containing 32 parallel gateways for model training.
arXiv Detail & Related papers (2025-07-11T07:25:55Z) - Structuring the Unstructured: A Multi-Agent System for Extracting and Querying Financial KPIs and Guidance [54.25184684077833]
We propose an efficient and scalable method for extracting quantitative insights from unstructured financial documents. Our proposed system consists of two specialized agents: the Extraction Agent and the Text-to-Agent
arXiv Detail & Related papers (2025-05-25T15:45:46Z) - MDSF: Context-Aware Multi-Dimensional Data Storytelling Framework based on Large language Model [1.33134751838052]
This paper introduces the Multidimensional Data Storytelling Framework (MDSF) based on large language models for automated insight generation and context-aware storytelling. The framework incorporates advanced preprocessing techniques, augmented analysis algorithms, and a unique scoring mechanism to identify and prioritize actionable insights.
arXiv Detail & Related papers (2025-01-02T02:35:38Z) - ERPA: Efficient RPA Model Integrating OCR and LLMs for Intelligent Document Processing [0.0]
This paper presents ERPA, an innovative Robotic Process Automation (RPA) model designed to enhance ID data extraction and optimize Optical Character Recognition (OCR) tasks within immigration. Benchmark comparisons demonstrate that ERPA significantly reduces processing times by up to 94 percent, completing ID data extraction in just 9.94 seconds.
arXiv Detail & Related papers (2024-12-24T09:44:43Z) - Enhancing Structured-Data Retrieval with GraphRAG: Soccer Data Case Study [4.742245127121496]
Structured-GraphRAG is a versatile framework designed to enhance information retrieval across structured datasets in natural language queries.
Our findings show that Structured-GraphRAG significantly improves query processing efficiency and reduces response times.
arXiv Detail & Related papers (2024-09-26T06:53:29Z) - Optimizing Structured Data Processing through Robotic Process Automation [2.3997896447030653]
This study investigates the use of RPA for structured data extraction and evaluates its advantages over manual processes.
By comparing human-performed tasks with those executed by RPA software bots, we assess efficiency and accuracy in data extraction from invoices.
Our findings highlight the significant efficiency gains achieved by RPA, with bots completing tasks in significantly less time compared to manual efforts.
arXiv Detail & Related papers (2024-08-27T05:53:02Z) - STAR: Boosting Low-Resource Information Extraction by Structure-to-Text
Data Generation with Large Language Models [56.27786433792638]
STAR is a data generation method that leverages Large Language Models (LLMs) to synthesize data instances.
We design fine-grained step-by-step instructions to obtain the initial data instances.
Our experiments show that the data generated by STAR significantly improve the performance of low-resource event extraction and relation extraction tasks.
arXiv Detail & Related papers (2023-05-24T12:15:19Z) - An Empirical Study of Automatic Post-Editing [56.86393786396992]
APE aims to reduce manual post-editing efforts by automatically correcting errors in machine-translated output.
To alleviate the lack of genuine training data, most of the current APE systems employ data augmentation methods to generate large-scale artificial corpora.
We study the outputs of the state-of-the-art APE model on a difficult APE dataset to analyze the problems in existing APE systems.
arXiv Detail & Related papers (2022-09-16T07:38:27Z) - SOLIS -- The MLOps journey from data acquisition to actionable insights [62.997667081978825]
In this paper we present a unified deployment pipeline and freedom-to-operate approach that supports all requirements while using basic cross-platform tensor framework and script language engines.
This approach however does not supply the needed procedures and pipelines for the actual deployment of machine learning capabilities in real production grade systems.
arXiv Detail & Related papers (2021-12-22T14:45:37Z) - Automatic Business Process Structure Discovery using Ordered Neurons LSTM: A Preliminary Study [6.6599132213053185]
We propose to retrieve latent semantic hierarchical structure present in business process documents by building a neural network.
We tested the proposed approach on a data set of Process Description Documents (PDD) from our practical Robotic Process Automation (RPA) projects.
arXiv Detail & Related papers (2020-01-05T14:19:11Z)
This list is automatically generated from the titles and abstracts of the papers in this site.