On the Simplification of Neural Network Architectures for Predictive Process Monitoring
- URL: http://arxiv.org/abs/2509.17145v1
- Date: Sun, 21 Sep 2025 16:21:45 GMT
- Title: On the Simplification of Neural Network Architectures for Predictive Process Monitoring
- Authors: Amaan Ansari, Lukas Kirchdorfer, Raheleh Hadian
- Abstract summary: We analyze how reducing model complexity, both in terms of parameter count and architectural depth, impacts predictive performance. We show that shrinking the Transformer model by 85% results in only a 2-3% drop in performance. Overall, our findings suggest that substantial model simplification can preserve predictive accuracy.
- Score: 0.5735035463793009
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Predictive Process Monitoring (PPM) aims to forecast the future behavior of ongoing process instances using historical event data, enabling proactive decision-making. While recent advances rely heavily on deep learning models such as LSTMs and Transformers, their high computational cost hinders practical adoption. Prior work has explored data reduction techniques and alternative feature encodings, but the effect of simplifying model architectures themselves remains underexplored. In this paper, we analyze how reducing model complexity, both in terms of parameter count and architectural depth, impacts predictive performance, using two established PPM approaches. Across five diverse event logs, we show that shrinking the Transformer model by 85% results in only a 2-3% drop in performance across various PPM tasks, while the LSTM proves slightly more sensitive, particularly for waiting time prediction. Overall, our findings suggest that substantial model simplification can preserve predictive accuracy, paving the way for more efficient and scalable PPM solutions.
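For readers who want to see what "shrinking" a PPM model can look like in practice, below is a minimal PyTorch sketch (not the authors' code) of a down-scaled Transformer encoder for next-activity prediction. The vocabulary size, `d_model`, layer count, and feed-forward width are illustrative placeholders, chosen only to show the kind of knobs such a study varies.

```python
import torch
import torch.nn as nn

class NextActivityTransformer(nn.Module):
    """Small Transformer encoder that predicts the next activity of a running case.

    All sizes here are illustrative placeholders, not the configurations
    evaluated in the paper.
    """

    def __init__(self, num_activities: int, d_model: int = 32, nhead: int = 2,
                 num_layers: int = 1, dim_feedforward: int = 64, max_len: int = 128):
        super().__init__()
        self.activity_emb = nn.Embedding(num_activities, d_model)
        self.pos_emb = nn.Embedding(max_len, d_model)
        layer = nn.TransformerEncoderLayer(d_model=d_model, nhead=nhead,
                                           dim_feedforward=dim_feedforward,
                                           batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, num_layers=num_layers)
        self.head = nn.Linear(d_model, num_activities)

    def forward(self, prefix: torch.Tensor) -> torch.Tensor:
        # prefix: (batch, seq_len) indices of the activities observed so far
        positions = torch.arange(prefix.size(1), device=prefix.device)
        h = self.encoder(self.activity_emb(prefix) + self.pos_emb(positions))
        # Score the next activity from the representation of the last event
        return self.head(h[:, -1, :])

model = NextActivityTransformer(num_activities=20)
print(sum(p.numel() for p in model.parameters()))   # parameter count of the shrunken model
print(model(torch.randint(0, 20, (4, 12))).shape)   # torch.Size([4, 20])
```

Scaling `d_model`, `num_layers`, and `dim_feedforward` up or down changes the parameter count printed above, which is the axis of simplification the study measures against predictive accuracy.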
Related papers
- Beyond Outliers: A Study of Optimizers Under Quantization [82.75879062804955]
We study the impact of optimizer choice on model robustness under quantization. We evaluate how model performance degrades when trained with different baselines. We derive scaling laws for quantization-aware training under different parameters.
arXiv Detail & Related papers (2025-09-27T21:15:22Z) - Domain Adaptation of LLMs for Process Data [7.611051482274626]
Large Language Models (LLMs) have emerged as a prominent area of interest across various research domains, including Process Mining (PM). This study investigates the direct adaptation of pretrained LLMs to process data without natural language reformulation. More specifically, we focus on parameter-efficient fine-tuning techniques to mitigate the computational overhead typically associated with such models.
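As an illustration of what parameter-efficient fine-tuning means in practice, here is a hedged LoRA-style sketch in PyTorch; the class, rank, and scaling values are hypothetical and not taken from the paper, which may use a different adapter scheme.

```python
import torch
import torch.nn as nn

class LoRALinear(nn.Module):
    """Wraps a frozen linear layer with a trainable low-rank update (W + B @ A).

    Illustrative sketch of parameter-efficient fine-tuning; rank and scaling
    are hypothetical and unrelated to the paper's actual configuration.
    """

    def __init__(self, base: nn.Linear, rank: int = 8, alpha: float = 16.0):
        super().__init__()
        self.base = base
        for p in self.base.parameters():
            p.requires_grad = False          # pretrained weights stay frozen
        self.lora_a = nn.Parameter(torch.randn(rank, base.in_features) * 0.01)
        self.lora_b = nn.Parameter(torch.zeros(base.out_features, rank))
        self.scale = alpha / rank

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # Frozen path plus the low-rank correction; only lora_a / lora_b get gradients
        return self.base(x) + self.scale * (x @ self.lora_a.T @ self.lora_b.T)

layer = LoRALinear(nn.Linear(768, 768))
trainable = sum(p.numel() for p in layer.parameters() if p.requires_grad)
print(trainable)  # far fewer trainable parameters than full fine-tuning of the 768x768 layer
```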
arXiv Detail & Related papers (2025-09-03T09:21:35Z) - Bi-directional Recurrence Improves Transformer in Partially Observable Markov Decision Processes [5.220940151628735]
This work introduces a novel bi-recurrent model architecture that improves sample efficiency and reduces model parameter count in POMDP scenarios. The proposed model architecture outperforms existing transformer-based, attention-based, and recurrence-based methods by a margin ranging from 87.39% to 482.04% on average.
arXiv Detail & Related papers (2025-05-16T11:54:48Z) - Data-driven tool wear prediction in milling, based on a process-integrated single-sensor approach [1.6574413179773764]
This study explores data-driven methods, in particular deep learning, for tool wear prediction. It investigates the transferability of predictive models using minimal training data, validated across two processes. The ConvNeXt model achieves exceptional performance, reaching 99.1% accuracy in identifying tool wear.
arXiv Detail & Related papers (2024-12-27T23:10:32Z) - Towards Scalable and Deep Graph Neural Networks via Noise Masking [59.058558158296265]
Graph Neural Networks (GNNs) have achieved remarkable success in many graph mining tasks, but scaling them to large graphs is challenging due to the high computational and storage costs. We present random walk with noise masking (RMask), a plug-and-play module compatible with existing model-simplification works.
arXiv Detail & Related papers (2024-12-19T07:48:14Z) - Optimizing Sequential Recommendation Models with Scaling Laws and Approximate Entropy [104.48511402784763]
The Performance Law for SR models aims to theoretically investigate and model the relationship between model performance and data quality. We propose Approximate Entropy (ApEn) to assess data quality, presenting a more nuanced approach than traditional data quantity metrics.
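Approximate Entropy itself is a standard regularity statistic; the sketch below implements the classical ApEn(m, r) definition in NumPy. How the paper maps recommendation sequences to numeric series and chooses m and r is not reproduced here.

```python
import numpy as np

def approximate_entropy(series, m: int = 2, r: float | None = None) -> float:
    """Classical ApEn(m, r) of a 1-D sequence (Pincus-style definition).

    Illustrative sketch; the tolerance default of 0.2 * std is a common
    convention, not necessarily the paper's choice.
    """
    x = np.asarray(series, dtype=float)
    n = len(x)
    if r is None:
        r = 0.2 * np.std(x)

    def phi(m):
        # All overlapping templates of length m
        templates = np.array([x[i:i + m] for i in range(n - m + 1)])
        # Chebyshev distance between every pair of templates
        dist = np.max(np.abs(templates[:, None, :] - templates[None, :, :]), axis=2)
        # Fraction of templates within tolerance r (self-matches included)
        c = np.mean(dist <= r, axis=1)
        return np.mean(np.log(c))

    return phi(m) - phi(m + 1)

print(approximate_entropy([1, 2, 1, 2, 1, 2, 1, 2, 1, 2]))  # highly regular series -> low ApEn
```

Lower ApEn values indicate more regular, predictable sequences, which is why the statistic can serve as a data-quality signal beyond raw data quantity.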
arXiv Detail & Related papers (2024-11-30T10:56:30Z) - EsaCL: Efficient Continual Learning of Sparse Models [10.227171407348326]
A key challenge in the continual learning setting is to efficiently learn a sequence of tasks without forgetting how to perform previously learned tasks.
We propose a new method for efficient continual learning of sparse models (EsaCL) that can automatically prune redundant parameters without adversely impacting the model's predictive power.
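As a point of reference for what parameter pruning looks like, the toy sketch below applies plain magnitude pruning to a weight tensor; this is a generic technique shown for illustration only and is not EsaCL's actual redundancy criterion.

```python
import torch

def magnitude_prune(weight: torch.Tensor, sparsity: float = 0.5) -> torch.Tensor:
    """Zero out the smallest-magnitude entries of a weight tensor.

    Generic magnitude pruning; how EsaCL decides which parameters are
    redundant is not reproduced here.
    """
    k = int(weight.numel() * sparsity)
    if k == 0:
        return weight.clone()
    threshold = weight.abs().flatten().kthvalue(k).values
    return torch.where(weight.abs() > threshold, weight, torch.zeros_like(weight))

w = torch.randn(4, 4)
print(magnitude_prune(w, sparsity=0.5))  # roughly half of the entries set to zero
```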
arXiv Detail & Related papers (2024-01-11T04:59:44Z) - Perceiver-based CDF Modeling for Time Series Forecasting [25.26713741799865]
We propose a new architecture, called perceiver-CDF, for modeling cumulative distribution functions (CDF) of time series data.
Our approach combines the perceiver architecture with a copula-based attention mechanism tailored for multimodal time series prediction.
Experiments on the unimodal and multimodal benchmarks consistently demonstrate a 20% improvement over state-of-the-art methods.
arXiv Detail & Related papers (2023-10-03T01:13:17Z) - To Repeat or Not To Repeat: Insights from Scaling LLM under Token-Crisis [50.31589712761807]
Large language models (LLMs) are notoriously token-hungry during pre-training, and high-quality text data on the web is approaching its scaling limit for LLMs.
We investigate the consequences of repeating pre-training data, revealing that the model is susceptible to overfitting.
We then examine the key factors contributing to multi-epoch degradation, finding that significant factors include dataset size, model parameters, and training objectives.
arXiv Detail & Related papers (2023-05-22T17:02:15Z) - MoEfication: Conditional Computation of Transformer Models for Efficient Inference [66.56994436947441]
Transformer-based pre-trained language models can achieve superior performance on most NLP tasks thanks to their large parameter capacity, but they also incur huge computation costs.
We explore accelerating large-model inference through conditional computation based on the sparse activation phenomenon.
We propose to transform a large model into its mixture-of-experts (MoE) version with equal model size, namely MoEfication.
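A toy sketch of the MoE-fication idea: split a trained dense feed-forward block's hidden units into expert groups of the same total size and let a router pick one expert per token. The real method groups neurons by co-activation patterns; the contiguous split and top-1 router below are simplifying assumptions.

```python
import torch
import torch.nn as nn

class MoEfiedFFN(nn.Module):
    """Toy MoE-fication of a dense feed-forward block (Linear -> ReLU -> Linear).

    Hidden units are split into contiguous expert groups and a router sends each
    token to a single expert. Illustrative only: the actual MoEfication method
    groups neurons by co-activation, not by position.
    """

    def __init__(self, dense_ffn: nn.Sequential, d_model: int, num_experts: int = 4):
        super().__init__()
        w1, w2 = dense_ffn[0], dense_ffn[2]
        hidden = w1.out_features // num_experts
        self.experts = nn.ModuleList()
        for e in range(num_experts):
            sl = slice(e * hidden, (e + 1) * hidden)
            expert = nn.Sequential(nn.Linear(d_model, hidden), nn.ReLU(),
                                   nn.Linear(hidden, d_model))
            expert[0].weight.data = w1.weight.data[sl].clone()
            expert[0].bias.data = w1.bias.data[sl].clone()
            expert[2].weight.data = w2.weight.data[:, sl].clone()
            # Keep the full output bias, since only one expert fires per token here
            expert[2].bias.data = w2.bias.data.clone()
            self.experts.append(expert)
        self.router = nn.Linear(d_model, num_experts)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        choice = self.router(x).argmax(dim=-1)   # (batch, seq): chosen expert per token
        out = torch.zeros_like(x)
        for e, expert in enumerate(self.experts):
            mask = choice == e
            if mask.any():
                out[mask] = expert(x[mask])
        return out

dense = nn.Sequential(nn.Linear(64, 256), nn.ReLU(), nn.Linear(256, 64))
moe = MoEfiedFFN(dense, d_model=64, num_experts=4)
print(moe(torch.randn(2, 10, 64)).shape)  # torch.Size([2, 10, 64])
```

Because each token activates only one expert's slice of the original hidden layer, the per-token compute drops while the total parameter count stays the same, which is the core trade-off MoEfication exploits.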
arXiv Detail & Related papers (2021-10-05T02:14:38Z) - Extrapolation for Large-batch Training in Deep Learning [72.61259487233214]
We show that a host of variations can be covered in a unified framework that we propose.
We prove the convergence of this novel scheme and rigorously evaluate its empirical performance on ResNet, LSTM, and Transformer.
arXiv Detail & Related papers (2020-06-10T08:22:41Z) - Transformer Hawkes Process [79.16290557505211]
We propose a Transformer Hawkes Process (THP) model, which leverages the self-attention mechanism to capture long-term dependencies.
THP outperforms existing models in terms of both likelihood and event prediction accuracy by a notable margin.
We provide a concrete example, where THP achieves improved prediction performance for learning multiple point processes when incorporating their relational information.
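For context, a classical Hawkes process models the conditional intensity as a base rate plus self-excitation from past events; THP's contribution is to replace the handcrafted excitation kernel with a history representation computed by self-attention. The standard classical form (not THP's exact parameterization) is:

```latex
\lambda(t \mid \mathcal{H}_t) \;=\; \mu \;+\; \sum_{t_i < t} \phi(t - t_i),
\qquad \phi(s) = \alpha e^{-\beta s}, \quad s > 0,
```

where \mu \ge 0 is the base rate and each past event at t_i temporarily raises the intensity through the exponentially decaying kernel \phi.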
arXiv Detail & Related papers (2020-02-21T13:48:13Z)