Mitigating Performance Saturation in Neural Marked Point Processes:
Architectures and Loss Functions
- URL: http://arxiv.org/abs/2107.03354v1
- Date: Wed, 7 Jul 2021 16:59:14 GMT
- Title: Mitigating Performance Saturation in Neural Marked Point Processes:
Architectures and Loss Functions
- Authors: Tianbo Li, Tianze Luo, Yiping Ke, Sinno Jialin Pan
- Abstract summary: We propose a simple graph-based network structure called GCHP, which uses only graph convolutional layers.
We show that GCHP significantly reduces training time, and that a likelihood ratio loss with interarrival-time probability assumptions greatly improves model performance.
- Score: 50.674773358075015
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Attributed event sequences are commonly encountered in practice. A recent
research line focuses on combining neural networks with the statistical
model of marked point processes, the conventional tool for dealing
with attributed event sequences. Neural marked point processes offer the
interpretability of probabilistic models as well as the representational power
of neural networks. However, we find that the performance of neural marked
point processes does not always improve as the network architecture grows
larger and more complicated, a phenomenon we call performance saturation.
This is because the generalization error of neural
marked point processes is determined by both the network representational
ability and the model specification. From this we draw two
major conclusions: first, in some cases simple network structures perform no
worse than complicated ones; second, choosing a proper probabilistic
assumption is at least as important as increasing the complexity of
the network. Based on this observation, we propose a simple graph-based network
structure called GCHP, which uses only graph convolutional layers and can
therefore be easily parallelized. We directly model the
distribution of interarrival times instead of imposing a specific assumption on
the conditional intensity function, and propose to use a likelihood ratio loss
with a moment matching mechanism for optimization and model selection.
Experimental results show that GCHP significantly reduces training time, and
that the likelihood ratio loss with interarrival-time probability assumptions
greatly improves model performance.
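To make the abstract's two ingredients concrete, below is a minimal, self-contained PyTorch sketch: a stack of graph convolutional layers over an event sequence that predicts the parameters of an interarrival-time distribution, trained with a likelihood term plus a moment-matching penalty. The abstract does not specify GCHP's layer structure, graph construction, choice of interarrival distribution, or the exact form of the likelihood ratio loss, so everything here (the names `GraphConv` and `GCHPSketch`, the exponential interarrival assumption, the identity adjacency, and the squared-mean penalty) is an illustrative assumption, not the paper's implementation.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class GraphConv(nn.Module):
    """One graph convolutional layer: relu(A_hat @ X @ W)."""

    def __init__(self, in_dim: int, out_dim: int):
        super().__init__()
        self.linear = nn.Linear(in_dim, out_dim)

    def forward(self, x: torch.Tensor, a_hat: torch.Tensor) -> torch.Tensor:
        # a_hat: (n, n) normalized adjacency over the n events in a sequence
        # x: (n, in_dim) event (mark) features
        return torch.relu(self.linear(a_hat @ x))


class GCHPSketch(nn.Module):
    """Stack of graph conv layers predicting a per-event interarrival rate."""

    def __init__(self, mark_dim: int, hidden_dim: int, n_layers: int = 2):
        super().__init__()
        dims = [mark_dim] + [hidden_dim] * n_layers
        self.layers = nn.ModuleList(
            GraphConv(d_in, d_out) for d_in, d_out in zip(dims[:-1], dims[1:])
        )
        self.rate_head = nn.Linear(hidden_dim, 1)

    def forward(self, marks: torch.Tensor, a_hat: torch.Tensor) -> torch.Tensor:
        h = marks
        for layer in self.layers:
            h = layer(h, a_hat)
        # softplus keeps the exponential rate strictly positive
        return F.softplus(self.rate_head(h)).squeeze(-1)


def interarrival_nll(rate: torch.Tensor, dt: torch.Tensor) -> torch.Tensor:
    """Negative log-likelihood of exponential interarrival times dt."""
    # exponential density: f(dt) = rate * exp(-rate * dt)
    return (rate * dt - torch.log(rate)).mean()


def moment_matching_penalty(rate: torch.Tensor, dt: torch.Tensor) -> torch.Tensor:
    """Penalize mismatch between the model's mean interarrival (1/rate) and the data's."""
    return ((1.0 / rate).mean() - dt.mean()) ** 2


# Usage on toy data; the identity matrix stands in for the event-graph adjacency.
n, mark_dim = 64, 8
marks = torch.randn(n, mark_dim)
a_hat = torch.eye(n)
dt = torch.rand(n) + 0.1  # observed interarrival times
model = GCHPSketch(mark_dim, hidden_dim=32)
rate = model(marks, a_hat)
loss = interarrival_nll(rate, dt) + moment_matching_penalty(rate, dt)
loss.backward()
```

Note the design choice the abstract emphasizes: by placing the probabilistic assumption on the interarrival-time distribution rather than the conditional intensity function, the loss is a closed-form density evaluation, with no intensity integral to approximate during training.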
Related papers
- Learning local discrete features in explainable-by-design convolutional neural networks [0.0]
We introduce an explainable-by-design convolutional neural network (CNN) based on the lateral inhibition mechanism.
The model consists of a predictor, a high-accuracy CNN with residual or dense skip connections.
By collecting observations and directly calculating probabilities, we can explain causal relationships between motifs of adjacent levels.
arXiv Detail & Related papers (2024-10-31T18:39:41Z) - Time Elastic Neural Networks [2.1756081703276]
We introduce and detail an atypical neural network architecture called the time elastic neural network (teNN).
The novelty compared to classical neural network architecture is that it explicitly incorporates time warping ability.
We demonstrate that, during the training process, the teNN succeeds in reducing the number of neurons required within each cell.
arXiv Detail & Related papers (2024-05-27T09:01:30Z) - How neural networks learn to classify chaotic time series [77.34726150561087]
We study the inner workings of neural networks trained to classify regular-versus-chaotic time series.
We find that the relation between input periodicity and activation periodicity is key for the performance of LKCNN models.
arXiv Detail & Related papers (2023-06-04T08:53:27Z) - Semantic Strengthening of Neuro-Symbolic Learning [85.6195120593625]
Neuro-symbolic approaches typically resort to fuzzy approximations of a probabilistic objective.
We show how to compute this efficiently for tractable circuits.
We test our approach on three tasks: predicting a minimum-cost path in Warcraft, predicting a minimum-cost perfect matching, and solving Sudoku puzzles.
arXiv Detail & Related papers (2023-02-28T00:04:22Z) - Deep Architecture Connectivity Matters for Its Convergence: A
Fine-Grained Analysis [94.64007376939735]
We theoretically characterize the impact of connectivity patterns on the convergence of deep neural networks (DNNs) under gradient descent training.
We show that by a simple filtration on "unpromising" connectivity patterns, we can trim down the number of models to evaluate.
arXiv Detail & Related papers (2022-05-11T17:43:54Z) - AEGNN: Asynchronous Event-based Graph Neural Networks [54.528926463775946]
Event-based Graph Neural Networks generalize standard GNNs to process events as "evolving" temporal graphs.
AEGNNs are easily trained on synchronous inputs and can be converted to efficient, "asynchronous" networks at test time.
arXiv Detail & Related papers (2022-03-31T16:21:12Z) - Why Lottery Ticket Wins? A Theoretical Perspective of Sample Complexity
on Pruned Neural Networks [79.74580058178594]
We analyze the performance of training a pruned neural network by analyzing the geometric structure of the objective function.
We show that the convex region near a desirable model with guaranteed generalization enlarges as the neural network model is pruned.
arXiv Detail & Related papers (2021-10-12T01:11:07Z) - Robust Generalization of Quadratic Neural Networks via Function
Identification [19.87036824512198]
Generalization bounds from learning theory often assume that the test distribution is close to the training distribution.
We show that for quadratic neural networks, we can identify the function represented by the model even though we cannot identify its parameters.
arXiv Detail & Related papers (2021-09-22T18:02:00Z) - Mixed-Precision Quantized Neural Network with Progressively Decreasing
Bitwidth For Image Classification and Object Detection [21.48875255723581]
A mixed-precision quantized neural network with progressively decreasing bitwidth is proposed to improve the trade-off between accuracy and compression.
Experiments on typical network architectures and benchmark datasets demonstrate that the proposed method could achieve better or comparable results.
arXiv Detail & Related papers (2019-12-29T14:11:33Z)