Can recurrent neural networks learn process model structure?
- URL: http://arxiv.org/abs/2212.06430v1
- Date: Tue, 13 Dec 2022 08:40:01 GMT
- Title: Can recurrent neural networks learn process model structure?
- Authors: Jari Peeperkorn and Seppe vanden Broucke and Jochen De Weerdt
- Abstract summary: We introduce an evaluation framework that combines variant-based resampling and custom metrics for fitness, precision and generalization.
We confirm that LSTMs can struggle to learn process model structure, even with simplistic process data.
We also found that decreasing the amount of information seen by the LSTM during training causes a sharp drop in generalization and precision scores.
- Score: 0.2580765958706854
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Various methods using machine and deep learning have been proposed to tackle
different tasks in predictive process monitoring, forecasting for an ongoing
case e.g. the most likely next event or suffix, its remaining time, or an
outcome-related variable. Recurrent neural networks (RNNs), and more
specifically long short-term memory nets (LSTMs), stand out in terms of
popularity. In this work, we investigate the capabilities of such an LSTM to
actually learn the underlying process model structure of an event log. We
introduce an evaluation framework that combines variant-based resampling and
custom metrics for fitness, precision and generalization. We evaluate 4
hypotheses concerning the learning capabilities of LSTMs, the effect of
overfitting countermeasures, the level of incompleteness in the training set
and the level of parallelism in the underlying process model. We confirm that
LSTMs can struggle to learn process model structure, even with simplistic
process data and in a very lenient setup. Taking the correct anti-overfitting
measures can alleviate the problem. However, these measures did not present
themselves to be optimal when selecting hyperparameters purely on predicting
accuracy. We also found that decreasing the amount of information seen by the
LSTM during training causes a sharp drop in generalization and precision
scores. In our experiments, we could not identify a relationship between the
extent of parallelism in the model and the generalization capability, but they
do indicate that the process' complexity might have an impact.
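The variant-based resampling idea mentioned in the abstract can be illustrated with a minimal sketch. It assumes an event log is a list of traces (lists of activity labels) and that a "variant" is a distinct activity sequence. The function names `variant_split` and `variant_metrics`, and the simple set-overlap definitions of fitness, precision and generalization, are hypothetical illustrations and are not taken from the paper, whose exact metrics are not given here.

```python
import random


def variant_split(log, test_fraction=0.2, seed=0):
    """Hold out whole variants so that test variants never occur in training.

    Hypothetical helper: `log` is a list of traces (lists of activity labels).
    """
    variants = sorted({tuple(trace) for trace in log})
    rng = random.Random(seed)
    rng.shuffle(variants)
    n_test = max(1, int(len(variants) * test_fraction))
    held_out = set(variants[:n_test])
    train = [t for t in log if tuple(t) not in held_out]
    test = [t for t in log if tuple(t) in held_out]
    return train, test, held_out


def variant_metrics(generated, train_variants, held_out_variants):
    """Illustrative set-overlap scores (assumed definitions, not the paper's).

    fitness:        share of training variants the model reproduces
    generalization: share of held-out variants the model reproduces
    precision:      share of generated variants that are allowed at all
    """
    generated = set(generated)
    allowed = train_variants | held_out_variants

    def ratio(numerator, denominator):
        return numerator / denominator if denominator else 0.0

    return {
        "fitness": ratio(len(generated & train_variants), len(train_variants)),
        "generalization": ratio(len(generated & held_out_variants), len(held_out_variants)),
        "precision": ratio(len(generated & allowed), len(generated)),
    }


if __name__ == "__main__":
    log = [["a", "b", "c"], ["a", "c", "b"], ["a", "b", "c"], ["a", "d"]]
    train, test, held_out = variant_split(log, test_fraction=0.34, seed=1)
    train_variants = {tuple(t) for t in train}
    # Stand-in for variants sampled from a trained model.
    generated = train_variants | {("x",)}
    print(variant_metrics(generated, train_variants, held_out))
```

In such a setup, `generated` would be the set of variants obtained by sampling complete traces from the trained LSTM; here it is a stand-in so the sketch runs on its own.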
Related papers
- Multi-Scale Convolutional LSTM with Transfer Learning for Anomaly Detection in Cellular Networks [1.1432909951914676]
This study introduces a novel approach Multi-Scale Convolutional LSTM with Transfer Learning (TL) to detect anomalies in cellular networks.
The model is initially trained from scratch using a publicly available dataset to learn typical network behavior.
We compare the performance of the model trained from scratch with that of the fine-tuned model using TL.
arXiv Detail & Related papers (2024-09-30T17:51:54Z)
- Just How Flexible are Neural Networks in Practice? [89.80474583606242]
It is widely believed that a neural network can fit a training set containing at least as many samples as it has parameters.
In practice, however, we only find solutions via our training procedure, including the gradient and regularizers, limiting flexibility.
arXiv Detail & Related papers (2024-06-17T12:24:45Z)
- Time Elastic Neural Networks [2.1756081703276]
We introduce and detail an atypical neural network architecture, called time elastic neural network (teNN).
The novelty compared to classical neural network architecture is that it explicitly incorporates time warping ability.
We demonstrate that, during the training process, the teNN succeeds in reducing the number of neurons required within each cell.
arXiv Detail & Related papers (2024-05-27T09:01:30Z)
- Continuous time recurrent neural networks: overview and application to forecasting blood glucose in the intensive care unit [56.801856519460465]
Continuous time autoregressive recurrent neural networks (CTRNNs) are deep learning models that account for irregular observations.
We demonstrate the application of these models to probabilistic forecasting of blood glucose in a critical care setting.
arXiv Detail & Related papers (2023-04-14T09:39:06Z)
- Theoretical Characterization of the Generalization Performance of Overfitted Meta-Learning [70.52689048213398]
This paper studies the performance of overfitted meta-learning under a linear regression model with Gaussian features.
We find new and interesting properties that do not exist in single-task linear regression.
Our analysis suggests that benign overfitting is more significant and easier to observe when the noise and the diversity/fluctuation of the ground truth of each training task are large.
arXiv Detail & Related papers (2023-04-09T20:36:13Z)
- Overparameterized ReLU Neural Networks Learn the Simplest Models: Neural Isometry and Exact Recovery [33.74925020397343]
Deep learning has shown that neural networks generalize remarkably well even with an extreme number of learned parameters.
We consider the training and generalization properties of two-layer ReLU networks with standard weight decay regularization.
We show that ReLU networks learn simple and sparse models even when the labels are noisy.
arXiv Detail & Related papers (2022-09-30T06:47:15Z)
- Go Beyond Multiple Instance Neural Networks: Deep-learning Models based on Local Pattern Aggregation [0.0]
Convolutional neural networks (CNNs) have brought breakthroughs in processing clinical electrocardiograms (ECGs) and speaker-independent speech.
In this paper, we propose local pattern aggregation-based deep-learning models to effectively deal with both problems.
The novel network structure, called LPANet, has cropping and aggregation operations embedded into it.
arXiv Detail & Related papers (2022-05-28T13:18:18Z)
- Can deep neural networks learn process model structure? An assessment framework and analysis [0.2580765958706854]
We propose an evaluation scheme complemented with new fitness, precision, and generalisation metrics.
We apply this framework to several process models with simple control-flow behaviour.
Our results show that, even for such simplistic models, careful tuning of overfitting countermeasures is required.
arXiv Detail & Related papers (2022-02-24T09:44:13Z)
- Mitigating Performance Saturation in Neural Marked Point Processes: Architectures and Loss Functions [50.674773358075015]
We propose a simple graph-based network structure called GCHP, which utilizes only graph convolutional layers.
We show that GCHP can significantly reduce training time, and that the likelihood ratio loss with interarrival time probability assumptions can greatly improve model performance.
arXiv Detail & Related papers (2021-07-07T16:59:14Z)
- Neural Complexity Measures [96.06344259626127]
We propose Neural Complexity (NC), a meta-learning framework for predicting generalization.
Our model learns a scalar complexity measure through interactions with many heterogeneous tasks in a data-driven way.
arXiv Detail & Related papers (2020-08-07T02:12:10Z)
- Convolutional Tensor-Train LSTM for Spatio-temporal Learning [116.24172387469994]
We propose a higher-order LSTM model that can efficiently learn long-term correlations in the video sequence.
This is accomplished through a novel tensor train module that performs prediction by combining convolutional features across time.
Our results achieve state-of-the-art performance in a wide range of applications and datasets.
arXiv Detail & Related papers (2020-02-21T05:00:01Z)