Self-Supervised Contrastive Pre-Training for Multivariate Point
Processes
- URL: http://arxiv.org/abs/2402.00987v1
- Date: Thu, 1 Feb 2024 20:05:04 GMT
- Title: Self-Supervised Contrastive Pre-Training for Multivariate Point
Processes
- Authors: Xiao Shou, Dharmashankar Subramanian, Debarun Bhattacharjya, Tian Gao,
Kristin P. Bennett
- Abstract summary: We introduce a new paradigm for self-supervised learning for multivariate point processes using a transformer encoder.
Specifically, we design a novel pre-training strategy for the encoder where we not only mask random event epochs but also insert randomly sampled "void" epochs where an event does not occur.
We demonstrate the effectiveness of our proposed paradigm on the next-event prediction task using synthetic datasets and 3 real applications.
- Score: 38.898053582052725
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Self-supervision is one of the hallmarks of representation learning in the
increasingly popular suite of foundation models including large language models
such as BERT and GPT-3, but it has not been pursued in the context of
multivariate event streams, to the best of our knowledge. We introduce a new
paradigm for self-supervised learning for multivariate point processes using a
transformer encoder. Specifically, we design a novel pre-training strategy for
the encoder where we not only mask random event epochs but also insert randomly
sampled "void" epochs where an event does not occur; this differs from the
typical discrete-time pretext tasks such as word-masking in BERT but expands
the effectiveness of masking to better capture continuous-time dynamics. To
improve downstream tasks, we introduce a contrasting module that compares real
events to simulated void instances. The pre-trained model can subsequently be
fine-tuned on a potentially much smaller event dataset, similar conceptually to
the typical transfer of popular pre-trained language models. We demonstrate the
effectiveness of our proposed paradigm on the next-event prediction task using
synthetic datasets and 3 real applications, observing a relative performance
boost of up to 20% compared to state-of-the-art models.
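As a rough illustration of the pre-training recipe described in the abstract (mask random event epochs, insert sampled "void" epochs, and contrast real events against voids), here is a minimal PyTorch sketch. All names (EventEncoder, pretraining_step, mask_prob, num_void) and the simplified reconstruction and contrasting terms are illustrative assumptions, not the authors' released implementation.

```python
# Sketch of the pre-training idea from the abstract: mask random event epochs,
# insert sampled "void" epochs (times with no event), and contrast real events
# against void instances. Names and hyperparameters here are illustrative.
import torch
import torch.nn as nn
import torch.nn.functional as F

class EventEncoder(nn.Module):
    """Hypothetical transformer encoder over (time, mark) event sequences."""
    def __init__(self, num_marks, d_model=64, nhead=4, num_layers=2):
        super().__init__()
        # two extra mark ids: [MASK] and "void"
        self.mark_emb = nn.Embedding(num_marks + 2, d_model)
        self.time_proj = nn.Linear(1, d_model)
        layer = nn.TransformerEncoderLayer(d_model, nhead, batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, num_layers)

    def forward(self, times, marks):
        # times: (B, L) event epochs, marks: (B, L) integer event types
        h = self.mark_emb(marks) + self.time_proj(times.unsqueeze(-1))
        return self.encoder(h)  # (B, L, d_model)

def insert_void_epochs(times, marks, num_void, void_id):
    """Append uniformly sampled void epochs and re-sort each sequence in time."""
    t_max = times.max(dim=1, keepdim=True).values
    void_t = torch.rand(times.size(0), num_void) * t_max
    t = torch.cat([times, void_t], dim=1)
    m = torch.cat([marks, torch.full_like(void_t, void_id, dtype=torch.long)], dim=1)
    order = t.argsort(dim=1)
    return t.gather(1, order), m.gather(1, order)

def pretraining_step(encoder, times, marks, num_marks, mask_prob=0.15, num_void=5):
    mask_id, void_id = num_marks, num_marks + 1
    t, m = insert_void_epochs(times, marks, num_void, void_id)
    is_void = m.eq(void_id)
    # BERT-style masking, but applied to continuous-time event epochs (real events only)
    mask = (torch.rand_like(t) < mask_prob) & ~is_void
    z = encoder(t, m.masked_fill(mask, mask_id))
    # (1) reconstruct the masked event types (weight-tied prediction head, for brevity)
    logits = z @ encoder.mark_emb.weight.t()
    recon = F.cross_entropy(logits[mask], m[mask]) if mask.any() else z.sum() * 0.0
    # (2) simplified contrasting term: separate real-event embeddings from void embeddings
    real_z = F.normalize(z[~is_void], dim=-1).mean(0)
    void_z = F.normalize(z[is_void], dim=-1).mean(0)
    return recon + F.cosine_similarity(real_z, void_z, dim=0)

# toy usage: 4 sequences of 20 events over 10 event types
enc = EventEncoder(num_marks=10)
times = torch.sort(torch.rand(4, 20) * 50.0, dim=1).values
marks = torch.randint(0, 10, (4, 20))
pretraining_step(enc, times, marks, num_marks=10).backward()
```

The sketch only shows how masked real epochs and sampled void epochs can share a single encoder pass; the paper's actual contrasting module and fine-tuning procedure follow its own formulation.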
Related papers
- Faster Language Models with Better Multi-Token Prediction Using Tensor Decomposition [5.575078692353885]
We propose a new model for multi-token prediction in transformers, aiming to enhance sampling efficiency without compromising accuracy.
By generalizing it to a rank-$r$ canonical probability decomposition, we develop an improved model that predicts multiple tokens simultaneously.
arXiv Detail & Related papers (2024-10-23T11:06:36Z) - Inverse Dynamics Pretraining Learns Good Representations for Multitask
Imitation [66.86987509942607]
We evaluate how the pretrain-then-finetune paradigm should be applied in imitation learning.
We consider a setting where the pretraining corpus consists of multitask demonstrations.
We argue that inverse dynamics modeling is well-suited to this setting.
arXiv Detail & Related papers (2023-05-26T14:40:46Z) - Ti-MAE: Self-Supervised Masked Time Series Autoencoders [16.98069693152999]
We propose a novel framework named Ti-MAE, in which the input time series are assumed to follow an integrated distribution.
Ti-MAE randomly masks out embedded time series data and learns an autoencoder to reconstruct them at the point-level.
Experiments on several public real-world datasets demonstrate that our framework of masked autoencoding could learn strong representations directly from the raw data.
arXiv Detail & Related papers (2023-01-21T03:20:23Z) - Masked Autoencoding for Scalable and Generalizable Decision Making [93.84855114717062]
MaskDP is a simple and scalable self-supervised pretraining method for reinforcement learning and behavioral cloning.
We find that a MaskDP model gains the capability of zero-shot transfer to new BC tasks, such as single and multiple goal reaching.
arXiv Detail & Related papers (2022-11-23T07:04:41Z) - Multi-scale Attention Flow for Probabilistic Time Series Forecasting [68.20798558048678]
We propose a novel non-autoregressive deep learning model, called Multi-scale Attention Normalizing Flow (MANF).
Our model avoids the influence of cumulative error and does not increase the time complexity.
Our model achieves state-of-the-art performance on many popular multivariate datasets.
arXiv Detail & Related papers (2022-05-16T07:53:42Z) - Entropy optimized semi-supervised decomposed vector-quantized
variational autoencoder model based on transfer learning for multiclass text
classification and generation [3.9318191265352196]
We propose a semi-supervised discrete latent variable model for multi-class text classification and text generation.
The proposed model employs the concept of transfer learning for training a quantized transformer model.
Experimental results indicate that the proposed model has surpassed the state-of-the-art models remarkably.
arXiv Detail & Related papers (2021-11-10T07:07:54Z) - The Effectiveness of Discretization in Forecasting: An Empirical Study
on Neural Time Series Models [15.281725756608981]
We investigate the effect of data input and output transformations on the predictive performance of neural forecasting architectures.
We find that binning almost always improves performance compared to using normalized real-valued inputs.
arXiv Detail & Related papers (2020-05-20T15:09:28Z) - Train No Evil: Selective Masking for Task-Guided Pre-Training [97.03615486457065]
We propose a three-stage framework by adding a task-guided pre-training stage with selective masking between general pre-training and fine-tuning.
We show that our method can achieve comparable or even better performance with less than 50% of the cost.
arXiv Detail & Related papers (2020-04-21T03:14:22Z) - A Multi-Channel Neural Graphical Event Model with Negative Evidence [76.51278722190607]
Event datasets are sequences of events of various types occurring irregularly over the timeline.
We propose a non-parametric deep neural network approach to estimate the underlying intensity functions; a minimal intensity sketch follows the list below.
arXiv Detail & Related papers (2020-02-21T23:10:50Z)
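The last entry above motivates a small worked example: a temporal point-process log-likelihood combines positive evidence (log-intensity at observed events) with a compensator integral that can be approximated by sampling times at which nothing happened, which is the sense in which "negative evidence" plays a role similar to the void epochs in the main paper. The exponential-kernel, univariate intensity below is an illustrative simplification, not the cited multi-channel model.

```python
# Sketch (not the cited paper's model): sampled non-event times ("negative
# evidence") approximate the compensator term of a point-process log-likelihood.
import torch

def hawkes_intensity(t, history, mu=0.2, alpha=0.8, beta=1.0):
    """lambda(t) = mu + alpha * sum_{t_i < t} exp(-beta * (t - t_i))."""
    past = history[history < t]
    return mu + alpha * torch.exp(-beta * (t - past)).sum()

def log_likelihood(events, T, num_negative=100):
    # positive evidence: log-intensity at the observed event epochs
    pos = torch.stack([hawkes_intensity(t, events).log() for t in events]).sum()
    # negative evidence: Monte Carlo estimate of integral_0^T lambda(t) dt
    # using uniformly sampled epochs at which no event occurred
    neg_times = torch.rand(num_negative) * T
    neg = T * torch.stack([hawkes_intensity(t, events) for t in neg_times]).mean()
    return pos - neg

events = torch.sort(torch.rand(30) * 10.0).values  # toy event sequence on [0, 10]
print(log_likelihood(events, T=10.0))
```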
This list is automatically generated from the titles and abstracts of the papers on this site.