Preventing Conflicting Gradients in Neural Marked Temporal Point Processes
- URL: http://arxiv.org/abs/2412.08590v1
- Date: Wed, 11 Dec 2024 18:10:04 GMT
- Title: Preventing Conflicting Gradients in Neural Marked Temporal Point Processes
- Authors: Tanguy Bosser, Souhaib Ben Taieb,
- Abstract summary: Neural Marked Temporal Point Processes (MTPP) are flexible models to capture complex temporal inter-dependencies between labeled events.
We show that learning a MTPP model can be framed as a two-task learning problem, where both tasks share a common set of trainable parameters that are jointly optimized.
We introduce novel parametrizations for neural MTPP models that allow for separate modeling and training of each task, effectively avoiding the problem of conflicting gradients.
- Score: 2.3020018305241337
- License:
- Abstract: Neural Marked Temporal Point Processes (MTPP) are flexible models to capture complex temporal inter-dependencies between labeled events. These models inherently learn two predictive distributions: one for the arrival times of events and another for the types of events, also known as marks. In this study, we demonstrate that learning a MTPP model can be framed as a two-task learning problem, where both tasks share a common set of trainable parameters that are optimized jointly. We show that this often leads to the emergence of conflicting gradients during training, where task-specific gradients are pointing in opposite directions. When such conflicts arise, following the average gradient can be detrimental to the learning of each individual tasks, resulting in overall degraded performance. To overcome this issue, we introduce novel parametrizations for neural MTPP models that allow for separate modeling and training of each task, effectively avoiding the problem of conflicting gradients. Through experiments on multiple real-world event sequence datasets, we demonstrate the benefits of our framework compared to the original model formulations.
Related papers
- Modeling Multi-Task Model Merging as Adaptive Projective Gradient Descent [74.02034188307857]
Merging multiple expert models offers a promising approach for performing multi-task learning without accessing their original data.
We find existing methods inevitably discard task-specific information that, while causing conflicts, is crucial for performance.
Our approach consistently outperforms previous methods, achieving state-of-the-art results across diverse architectures and tasks in both vision and NLP domains.
arXiv Detail & Related papers (2025-01-02T12:45:21Z) - Decoupled Marked Temporal Point Process using Neural Ordinary Differential Equations [14.828081841581296]
A Marked Temporal Point Process (MTPP) is a process whose realization is a set of event-time data.
Recent studies have utilized deep neural networks to capture complex temporal dependencies of events.
We propose a Decoupled MTPP framework that disentangles characterization of a process into a set of evolving influences from different events.
arXiv Detail & Related papers (2024-06-10T10:15:32Z) - Time-series Generation by Contrastive Imitation [87.51882102248395]
We study a generative framework that seeks to combine the strengths of both: Motivated by a moment-matching objective to mitigate compounding error, we optimize a local (but forward-looking) transition policy.
At inference, the learned policy serves as the generator for iterative sampling, and the learned energy serves as a trajectory-level measure for evaluating sample quality.
arXiv Detail & Related papers (2023-11-02T16:45:25Z) - AdaMerging: Adaptive Model Merging for Multi-Task Learning [68.75885518081357]
This paper introduces an innovative technique called Adaptive Model Merging (AdaMerging)
It aims to autonomously learn the coefficients for model merging, either in a task-wise or layer-wise manner, without relying on the original training data.
Compared to the current state-of-the-art task arithmetic merging scheme, AdaMerging showcases a remarkable 11% improvement in performance.
arXiv Detail & Related papers (2023-10-04T04:26:33Z) - Enhancing Asynchronous Time Series Forecasting with Contrastive
Relational Inference [21.51753838306655]
Temporal point processes(TPPs) are the standard method for modeling such.
Existing TPP models have focused on the conditional distribution of future events instead of explicitly modeling event interactions, imposing challenges for event predictions.
We propose a novel approach that leverages a Neural Inference (NRI) to learn a graph that infers interactions while simultaneously learning dynamics patterns from observational data.
arXiv Detail & Related papers (2023-09-06T09:47:03Z) - Learning Behavior Representations Through Multi-Timescale Bootstrapping [8.543808476554695]
We introduce Bootstrap Across Multiple Scales (BAMS), a multi-scale representation learning model for behavior.
We first apply our method on a dataset of quadrupeds navigating in different terrain types, and show that our model captures the temporal complexity of behavior.
arXiv Detail & Related papers (2022-06-14T17:57:55Z) - CEP3: Community Event Prediction with Neural Point Process on Graph [59.434777403325604]
We propose a novel model combining Graph Neural Networks and Marked Temporal Point Process (MTPP)
Our experiments demonstrate the superior performance of our model in terms of both model accuracy and training efficiency.
arXiv Detail & Related papers (2022-05-21T15:30:25Z) - Multi-scale Attention Flow for Probabilistic Time Series Forecasting [68.20798558048678]
We propose a novel non-autoregressive deep learning model, called Multi-scale Attention Normalizing Flow(MANF)
Our model avoids the influence of cumulative error and does not increase the time complexity.
Our model achieves state-of-the-art performance on many popular multivariate datasets.
arXiv Detail & Related papers (2022-05-16T07:53:42Z) - On Modality Bias Recognition and Reduction [70.69194431713825]
We study the modality bias problem in the context of multi-modal classification.
We propose a plug-and-play loss function method, whereby the feature space for each label is adaptively learned.
Our method yields remarkable performance improvements compared with the baselines.
arXiv Detail & Related papers (2022-02-25T13:47:09Z) - Aligned Multi-Task Gaussian Process [12.903751268469696]
Multi-task learning requires accurate identification of the correlations between tasks.
Traditional multi-task models do not account for this and subsequent errors in correlation estimation will result in poor predictive performance.
We introduce a method that automatically accounts for temporal misalignment in a unified generative model that improves predictive performance.
arXiv Detail & Related papers (2021-10-29T13:18:13Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.