A Representer Theorem for Hawkes Processes via Penalized Least Squares Minimization
- URL: http://arxiv.org/abs/2510.08916v1
- Date: Fri, 10 Oct 2025 02:00:56 GMT
- Title: A Representer Theorem for Hawkes Processes via Penalized Least Squares Minimization
- Authors: Hideaki Kim, Tomoharu Iwata
- Abstract summary: The representer theorem is a cornerstone of kernel methods, which aim to estimate latent functions in reproducing kernel Hilbert spaces. We show that a novel form of representer theorem emerges: a family of transformed kernels can be defined via a system of simultaneous integral equations. Remarkably, the dual coefficients are all analytically fixed to unity, obviating the need to solve a costly optimization problem to obtain the dual coefficients.
- Score: 31.876688992403647
- License: http://creativecommons.org/licenses/by-nc-sa/4.0/
- Abstract: The representer theorem is a cornerstone of kernel methods, which aim to estimate latent functions in reproducing kernel Hilbert spaces (RKHSs) in a nonparametric manner. Its significance lies in converting inherently infinite-dimensional optimization problems into finite-dimensional ones over dual coefficients, thereby enabling practical and computationally tractable algorithms. In this paper, we address the problem of estimating the latent triggering kernels--functions that encode the interaction structure between events--for linear multivariate Hawkes processes based on observed event sequences within an RKHS framework. We show that, under the principle of penalized least squares minimization, a novel form of representer theorem emerges: a family of transformed kernels can be defined via a system of simultaneous integral equations, and the optimal estimator of each triggering kernel is expressed as a linear combination of these transformed kernels evaluated at the data points. Remarkably, the dual coefficients are all analytically fixed to unity, obviating the need to solve a costly optimization problem to obtain the dual coefficients. This leads to a highly efficient estimator capable of handling large-scale data more effectively than conventional nonparametric approaches. Empirical evaluations on synthetic datasets reveal that the proposed method attains competitive predictive accuracy while substantially improving computational efficiency over existing state-of-the-art kernel method-based estimators.
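For context, the classical representer theorem that the abstract contrasts against can be sketched with kernel ridge regression. This is a minimal illustration of the standard setting, not the paper's Hawkes-process estimator: the function names, the Gaussian kernel, the length scale, and the toy data are all assumptions made for this example. The point to notice is the linear solve for the dual coefficients `alpha`, which is the step the abstract says becomes unnecessary when the coefficients are analytically fixed to unity.

```python
import numpy as np

def rbf_kernel(a, b, length_scale=0.5):
    """Gaussian (RBF) kernel matrix between 1-D input arrays a and b."""
    diff = a[:, None] - b[None, :]
    return np.exp(-0.5 * (diff / length_scale) ** 2)

def fit_dual_coefficients(x_train, y_train, reg=1e-3):
    """Solve (K + n*reg*I) alpha = y, the finite-dimensional dual problem
    for min_f (1/n) * sum_i (y_i - f(x_i))^2 + reg * ||f||^2 over an RKHS."""
    n = x_train.shape[0]
    gram = rbf_kernel(x_train, x_train)
    return np.linalg.solve(gram + n * reg * np.eye(n), y_train)

def predict(x_new, x_train, alpha):
    """Evaluate the representer form f(x) = sum_i alpha_i * k(x, x_i)."""
    return rbf_kernel(x_new, x_train) @ alpha

# Toy data (assumed for illustration): noisy samples of sin(2x) on [0, 3].
rng = np.random.default_rng(0)
x_train = np.sort(rng.uniform(0.0, 3.0, size=40))
y_train = np.sin(2.0 * x_train) + 0.05 * rng.normal(size=40)

alpha = fit_dual_coefficients(x_train, y_train)  # O(n^3) linear solve
x_test = np.linspace(0.2, 2.8, 5)
y_pred = predict(x_test, x_train, alpha)
```

In this classical form the cost is dominated by the solve over the n-by-n Gram matrix, which is why an estimator whose coefficients are known in advance could scale to larger event sequences.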
Related papers
- Multi-Dimensional Visual Data Recovery: Scale-Aware Tensor Modeling and Accelerated Randomized Computation [51.65236537605077]
We propose a new type of network compression optimization technique, fully randomized tensor network compression (FCTN). FCTN has significant advantages in correlation characterization and transpositional in algebra, and has notable achievements in multi-dimensional data processing and analysis. We derive efficient algorithms with guarantees to solve the formulated models.
arXiv Detail & Related papers (2026-02-13T14:56:37Z) - Variational Entropic Optimal Transport [67.76725267984578]
We propose Variational Entropic Optimal Transport (VarEOT) for domain translation problems. VarEOT is based on an exact variational reformulation of the log-partition $\log \mathbb{E}[\exp(\cdot)]$ as a tractable generalization over an auxiliary positive normalizer. Experiments on synthetic data and unpaired image-to-image translation demonstrate competitive or improved translation quality.
arXiv Detail & Related papers (2026-02-02T15:48:44Z) - Accelerated training of Gaussian processes using banded square exponential covariances [0.0]
We propose a novel approach to computationally efficient GP training based on the observation that square-exponential (SE) covariance matrices contain several off-diagonal entries extremely close to zero. We construct a principled procedure to eliminate those entries to produce a banded-matrix approximation to the original covariance, whose inverse and determinant can be computed at a reduced computational cost.
arXiv Detail & Related papers (2026-01-26T22:35:20Z) - Online Inference of Constrained Optimization: Primal-Dual Optimality and Sequential Quadratic Programming [55.848340925419286]
We study online statistical inference for the solutions of quadratic optimization problems with equality and inequality constraints. We develop a sequential programming (SSQP) method to solve these problems, where the step direction is computed by sequentially performing an approximation of the objective and a linear approximation of the constraints. We show that our method achieves global almost moving-average convergence and exhibits local normality with an optimal primal-dual limiting matrix in the sense of Hájek and Le Cam.
arXiv Detail & Related papers (2025-11-27T06:16:17Z) - Nonparametric estimation of Hawkes processes with RKHSs [1.775610745277615]
This paper addresses nonparametric estimation of nonlinear Hawkes processes, where the interaction functions are assumed to lie in a reproducing kernel Hilbert space (RKHS). Motivated by applications in neuroscience, the model allows complex interaction functions, in order to express exciting and inhibiting effects. We propose an estimation method that relies on two common approximations (of the ReLU function and of the integral operator).
arXiv Detail & Related papers (2024-11-01T14:26:50Z) - Variance-Reducing Couplings for Random Features [57.73648780299374]
Random features (RFs) are a popular technique to scale up kernel methods in machine learning.
We find couplings to improve RFs defined on both Euclidean and discrete input spaces.
We reach surprising conclusions about the benefits and limitations of variance reduction as a paradigm.
arXiv Detail & Related papers (2023-10-18T10:29:58Z) - Stochastic Optimization for Non-convex Problem with Inexact Hessian Matrix, Gradient, and Function [99.31457740916815]
Trust-region (TR) and adaptive regularization using cubics (ARC) have proven to have some very appealing theoretical properties.
We show that TR and ARC methods can simultaneously provide inexact computations of the Hessian, gradient, and function values.
arXiv Detail & Related papers (2023-10-18T10:29:58Z) - A Deep Unrolling Model with Hybrid Optimization Structure for Hyperspectral Image Deconvolution [50.13564338607482]
We propose a novel optimization framework for the hyperspectral deconvolution problem, called DeepMix. It consists of three distinct modules, namely, a data consistency module, a module that enforces the effect of the handcrafted regularizers, and a denoising module. This work proposes a context aware denoising module designed to sustain the advancements achieved by the cooperative efforts of the other modules.
arXiv Detail & Related papers (2023-06-10T08:25:16Z) - Non-Parametric Learning of Stochastic Differential Equations with Non-asymptotic Fast Rates of Convergence [65.63201894457404]
We propose a novel non-parametric learning paradigm for the identification of drift and diffusion coefficients of non-linear differential equations. The key idea essentially consists of fitting an RKHS-based approximation of the corresponding Fokker-Planck equation to such observations.
arXiv Detail & Related papers (2023-05-24T20:43:47Z) - Fast Computation of Optimal Transport via Entropy-Regularized Extragradient Methods [75.34939761152587]
Efficient computation of the optimal transport distance between two distributions serves as an algorithm that empowers various applications.
This paper develops a scalable first-order optimization-based method that computes optimal transport to within $\varepsilon$ additive accuracy.
arXiv Detail & Related papers (2023-01-30T15:46:39Z) - Reconstructing Kernel-based Machine Learning Force Fields with Super-linear Convergence [0.18416014644193063]
We consider the broad class of Nyström-type methods to construct preconditioners.
All considered methods aim to identify a representative subset of inducing (kernel) columns to approximate the dominant kernel spectrum.
arXiv Detail & Related papers (2022-12-24T13:45:50Z) - Optimal prediction for kernel-based semi-functional linear regression [5.827901300943599]
We establish minimax optimal rates of convergence for prediction in a semi-functional linear model.
Our results reveal that the smoother functional component can be learned with the minimax rate as if the nonparametric component were known.
arXiv Detail & Related papers (2021-10-29T04:55:44Z) - Jointly Modeling and Clustering Tensors in High Dimensions [6.072664839782975]
We consider the problem of jointly benchmarking and clustering of tensors.
We propose an efficient high-maximization algorithm that converges geometrically to a neighborhood that is within statistical precision.
arXiv Detail & Related papers (2021-04-15T21:06:16Z)