Related papers: Can recurrent neural networks learn process model structure?

URL: http://arxiv.org/abs/2212.06430v1
Date: Tue, 13 Dec 2022 08:40:01 GMT
Title: Can recurrent neural networks learn process model structure?
Authors: Jari Peeperkorn and Seppe vanden Broucke and Jochen De Weerdt
Abstract summary: We introduce an evaluation framework that combines variant-based resampling and custom metrics for fitness, precision and generalization. We confirm that LSTMs can struggle to learn process model structure, even with simplistic process data. We also found that decreasing the amount of information seen by the LSTM during training causes a sharp drop in generalization and precision scores.
Score: 0.2580765958706854
License: http://creativecommons.org/licenses/by/4.0/
Abstract: Various methods using machine and deep learning have been proposed to tackle different tasks in predictive process monitoring, forecasting for an ongoing case e.g. the most likely next event or suffix, its remaining time, or an outcome-related variable. Recurrent neural networks (RNNs), and more specifically long short-term memory nets (LSTMs), stand out in terms of popularity. In this work, we investigate the capabilities of such an LSTM to actually learn the underlying process model structure of an event log. We introduce an evaluation framework that combines variant-based resampling and custom metrics for fitness, precision and generalization. We evaluate 4 hypotheses concerning the learning capabilities of LSTMs, the effect of overfitting countermeasures, the level of incompleteness in the training set and the level of parallelism in the underlying process model. We confirm that LSTMs can struggle to learn process model structure, even with simplistic process data and in a very lenient setup. Taking the correct anti-overfitting measures can alleviate the problem. However, these measures did not present themselves to be optimal when selecting hyperparameters purely on predicting accuracy. We also found that decreasing the amount of information seen by the LSTM during training causes a sharp drop in generalization and precision scores. In our experiments, we could not identify a relationship between the extent of parallelism in the model and the generalization capability, but they do indicate that the process' complexity might have an impact.

Related papers

LESA: Learnable LLM Layer Scaling-Up [57.0510934286449]
Training Large Language Models (LLMs) from scratch requires immense computational resources, making it prohibitively expensive. Model scaling-up offers a promising solution by leveraging the parameters of smaller models to create larger ones. We propose LESA, a novel learnable method for depth scaling-up.
arXiv Detail & Related papers (2025-02-19T14:58:48Z)
What Do Learning Dynamics Reveal About Generalization in LLM Reasoning? [83.83230167222852]
We find that a model's generalization behavior can be effectively characterized by a training metric we call pre-memorization train accuracy. By connecting a model's learning behavior to its generalization, pre-memorization train accuracy can guide targeted improvements to training strategies.
arXiv Detail & Related papers (2024-11-12T09:52:40Z)
Multi-Scale Convolutional LSTM with Transfer Learning for Anomaly Detection in Cellular Networks [1.1432909951914676]
This study introduces a novel approach, Multi-Scale Convolutional LSTM with Transfer Learning (TL), to detect anomalies in cellular networks. The model is initially trained from scratch using a publicly available dataset to learn typical network behavior. We compare the performance of the model trained from scratch with that of the fine-tuned model using TL.
arXiv Detail & Related papers (2024-09-30T17:51:54Z)
Just How Flexible are Neural Networks in Practice? [89.80474583606242]
It is widely believed that a neural network can fit a training set containing at least as many samples as it has parameters. In practice, however, we only find solutions via our training procedure, including the gradient and regularizers, limiting flexibility.
arXiv Detail & Related papers (2024-06-17T12:24:45Z)
Time Elastic Neural Networks [2.1756081703276]
We introduce and detail an atypical neural network architecture, called time elastic neural network (teNN). The novelty compared to classical neural network architecture is that it explicitly incorporates time warping ability. We demonstrate that, during the training process, the teNN succeeds in reducing the number of neurons required within each cell.
arXiv Detail & Related papers (2024-05-27T09:01:30Z)
Continuous time recurrent neural networks: overview and application to forecasting blood glucose in the intensive care unit [56.801856519460465]
Continuous time autoregressive recurrent neural networks (CTRNNs) are deep learning models that account for irregular observations. We demonstrate the application of these models to probabilistic forecasting of blood glucose in a critical care setting.
arXiv Detail & Related papers (2023-04-14T09:39:06Z)
Theoretical Characterization of the Generalization Performance of Overfitted Meta-Learning [70.52689048213398]
This paper studies the performance of overfitted meta-learning under a linear regression model with Gaussian features. We find new and interesting properties that do not exist in single-task linear regression. Our analysis suggests that benign overfitting is more significant and easier to observe when the noise and the diversity/fluctuation of the ground truth of each training task are large.
arXiv Detail & Related papers (2023-04-09T20:36:13Z)
Go Beyond Multiple Instance Neural Networks: Deep-learning Models based on Local Pattern Aggregation [0.0]
Convolutional neural networks (CNNs) have brought breakthroughs in processing clinical electrocardiograms (ECGs) and speaker-independent speech. In this paper, we propose local pattern aggregation-based deep-learning models to effectively deal with both problems. The novel network structure, called LPANet, has cropping and aggregation operations embedded into it.
arXiv Detail & Related papers (2022-05-28T13:18:18Z)
Can deep neural networks learn process model structure? An assessment framework and analysis [0.2580765958706854]
We propose an evaluation scheme complemented with new fitness, precision, and generalisation metrics. We apply this framework to several process models with simple control-flow behaviour. Our results show that, even for such simplistic models, careful tuning of overfitting countermeasures is required.
arXiv Detail & Related papers (2022-02-24T09:44:13Z)
Neural Complexity Measures [96.06344259626127]
We propose Neural Complexity (NC), a meta-learning framework for predicting generalization. Our model learns a scalar complexity measure through interactions with many heterogeneous tasks in a data-driven way.
arXiv Detail & Related papers (2020-08-07T02:12:10Z)
Convolutional Tensor-Train LSTM for Spatio-temporal Learning [116.24172387469994]
We propose a higher-order LSTM model that can efficiently learn long-term correlations in the video sequence. This is accomplished through a novel tensor train module that performs prediction by combining convolutional features across time. Our results achieve state-of-the-art performance in a wide range of applications and datasets.
arXiv Detail & Related papers (2020-02-21T05:00:01Z)

This list is automatically generated from the titles and abstracts of the papers on this site.