Adaptive exponential power distribution with moving estimator for
nonstationary time series
- URL: http://arxiv.org/abs/2003.02149v2
- Date: Mon, 23 Mar 2020 16:26:16 GMT
- Title: Adaptive exponential power distribution with moving estimator for
nonstationary time series
- Authors: Jarek Duda
- Abstract summary: We will focus on maximum likelihood (ML) adaptive estimation for nonstationary time series.
We focus on one such example: the exponential power distribution (EPD) family $\rho(x)\propto \exp(-|(x-\mu)/\sigma|^\kappa/\kappa)$.
It is tested on daily log-return series for DJIA companies, leading to substantially better log-likelihoods than standard (static) estimation.
- Score: 0.8702432681310399
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: While standard estimation assumes that all datapoints come from a probability
distribution with the same fixed parameters $\theta$, we focus on maximum
likelihood (ML) adaptive estimation for nonstationary time series: separately
estimating parameters $\theta_T$ for each time $T$ based on the earlier values
$(x_t)_{t<T}$, using the (exponential) moving ML estimator $\theta_T=\arg\max_\theta
l_T$ with $l_T=\sum_{t<T} \eta^{T-t} \ln(\rho_\theta (x_t))$ for some
$\eta\in(0,1]$. The computational cost of such a moving estimator is generally much
higher, as the log-likelihood has to be optimized separately for each $T$; however,
in many cases it can be made inexpensive by exploiting dependencies between
successive estimates. We focus on one such example: the exponential power
distribution (EPD) family $\rho(x)\propto \exp(-|(x-\mu)/\sigma|^\kappa/\kappa)$,
which covers a wide range of tail behaviors, including the Gaussian ($\kappa=2$)
and Laplace ($\kappa=1$) distributions. It is also convenient for adaptive
estimation of the scale parameter $\sigma$, since its standard ML estimate of
$\sigma^\kappa$ is the average of $|x-\mu|^\kappa$. By simply replacing this average
with an exponential moving average,
$(\sigma_{T+1})^\kappa=\eta(\sigma_T)^\kappa +(1-\eta)|x_T-\mu|^\kappa$, the
estimator becomes adaptive at negligible cost. It is tested on daily log-return
series for DJIA companies, leading to substantially better log-likelihoods than
standard (static) estimation, with the optimal tail type $\kappa$ varying between
companies. The presented general alternative estimation philosophy provides tools
that might be useful for building better models for the analysis of nonstationary
time series.
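To make the update above concrete, below is a minimal Python sketch of the adaptive EPD scale estimator: $\sigma^\kappa$ is tracked by an exponential moving average of $|x_t-\mu|^\kappa$ (for static data, $\partial_\sigma l = 0$ gives $\sigma^\kappa$ as the plain average of $|x-\mu|^\kappa$), and each new observation is scored with the EPD density before the update. The function names, the fixed $\mu=0$, and the values of $\eta$, $\kappa$ and the initial $\sigma$ are illustrative assumptions, not the paper's reference implementation.

```python
# Sketch (assumed illustration) of the adaptive EPD scale estimator:
# rho(x) = exp(-|(x-mu)/sigma|^kappa / kappa) / (2 sigma kappa^(1/kappa) Gamma(1+1/kappa)),
# with sigma^kappa tracked by an exponential moving average of |x_t - mu|^kappa.

import math
from typing import Iterable, List, Tuple


def epd_log_density(x: float, mu: float, sigma: float, kappa: float) -> float:
    """Log-density of the exponential power distribution (EPD)."""
    log_norm = math.log(2.0 * sigma) + math.log(kappa) / kappa + math.lgamma(1.0 + 1.0 / kappa)
    return -abs((x - mu) / sigma) ** kappa / kappa - log_norm


def adaptive_epd_loglik(xs: Iterable[float], mu: float = 0.0, kappa: float = 1.0,
                        eta: float = 0.97, sigma0: float = 0.01) -> Tuple[float, List[float]]:
    """Score each x_T with sigma_T estimated from earlier values only, then update
    (sigma_{T+1})^kappa = eta*(sigma_T)^kappa + (1-eta)*|x_T - mu|^kappa."""
    sigma_pow = sigma0 ** kappa          # current estimate of sigma^kappa (assumed initial value)
    total, sigmas = 0.0, []
    for x in xs:
        sigma = sigma_pow ** (1.0 / kappa)
        sigmas.append(sigma)
        total += epd_log_density(x, mu, sigma, kappa)                     # out-of-sample log-likelihood
        sigma_pow = eta * sigma_pow + (1.0 - eta) * abs(x - mu) ** kappa  # moving-average update
    return total, sigmas


if __name__ == "__main__":
    # Toy usage: log-returns would normally come from real data (e.g. a DJIA series).
    toy_returns = [0.01, -0.02, 0.005, 0.03, -0.015, 0.002]
    ll, _ = adaptive_epd_loglik(toy_returns, kappa=1.0, eta=0.95)
    print(f"adaptive EPD log-likelihood: {ll:.3f}")
```

Each observation is evaluated using only earlier values, mirroring the $(x_t)_{t<T}$ conditioning in the abstract; the update itself costs a single multiply-add per step.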
Related papers
- Optimal Sketching for Residual Error Estimation for Matrix and Vector Norms [50.15964512954274]
We study the problem of residual error estimation for matrix and vector norms using a linear sketch.
We demonstrate that this gives a substantial advantage empirically, for roughly the same sketch size and accuracy as in previous work.
We also show an $\Omega(k^{2/p} n^{1-2/p})$ lower bound for the sparse recovery problem, which is tight up to a $\mathrm{poly}(\log n)$ factor.
arXiv Detail & Related papers (2024-08-16T02:33:07Z) - Robust Distribution Learning with Local and Global Adversarial Corruptions [17.22168727622332]
We develop an efficient finite-sample algorithm with error bounded by $\sqrt{\varepsilon k} + \rho + \tilde{O}(d\sqrt{k}\,n^{-1/(k \lor 2)})$ when $P$ has bounded covariance.
Our efficient procedure relies on a novel trace norm approximation of an ideal yet intractable 2-Wasserstein projection estimator.
arXiv Detail & Related papers (2024-06-10T17:48:36Z) - Revisiting Step-Size Assumptions in Stochastic Approximation [1.3654846342364308]
The paper revisits step-size selection in a general Markovian setting.
A major conclusion is that the choice of $\rho = 0$ or even $\rho < 1/2$ is justified only in select settings.
arXiv Detail & Related papers (2024-05-28T05:11:05Z) - On the $O(\frac{\sqrt{d}}{T^{1/4}})$ Convergence Rate of RMSProp and Its Momentum Extension Measured by $\ell_1$ Norm [59.65871549878937]
This paper considers RMSProp and its momentum extension and establishes a convergence rate of $\frac{1}{T}\sum_{k=1}^{T} \mathbb{E}\left[\|\nabla f(x_k)\|_1\right] \leq O(\frac{\sqrt{d}}{T^{1/4}})$.
Our convergence rate matches the lower bound with respect to all the coefficients except the dimension $d$.
Our convergence rate can be considered to be analogous to the $\frac{1}{T}\sum_{k=1}^{T} \mathbb{E}\left[\|\nabla f(x_k)\|_2\right] \leq O(\frac{1}{T^{1/4}})$ rate of SGD measured by the $\ell_2$ norm.
arXiv Detail & Related papers (2024-02-01T07:21:32Z) - $L^1$ Estimation: On the Optimality of Linear Estimators [64.76492306585168]
This work shows that the only prior distribution on $X$ that induces linearity in the conditional median is Gaussian.
In particular, it is demonstrated that if the conditional distribution $P_X|Y=y$ is symmetric for all $y$, then $X$ must follow a Gaussian distribution.
arXiv Detail & Related papers (2023-09-17T01:45:13Z) - Adaptive Student's t-distribution with method of moments moving
estimator for nonstationary time series [0.8702432681310399]
We focus on the recently proposed philosophy of the moving estimator.
$F_t=\sum_{\tau<t} (1-\eta)^{t-\tau} \ln(\rho_\theta (x_\tau))$ is the moving log-likelihood, evolving in time.
The Student's t-distribution, especially popular in economic applications, is here applied to log-returns of DJIA companies.
arXiv Detail & Related papers (2023-04-06T13:37:27Z) - A spectral least-squares-type method for heavy-tailed corrupted
regression with unknown covariance & heterogeneous noise [2.019622939313173]
We revisit heavy-tailed corrupted least-squares linear regression, assuming a corrupted $n$-sized label-feature sample with at most $\epsilon n$ arbitrary outliers.
We propose a near-optimal computationally tractable estimator, based on the power method, assuming no knowledge of $(\Sigma,\Xi)$ nor of the operator norm of $\Xi$.
arXiv Detail & Related papers (2022-09-06T23:37:31Z) - Supervised Training of Conditional Monge Maps [107.78770597815242]
Optimal transport (OT) theory describes general principles to define and select, among many possible choices, the most efficient way to map a probability measure onto another.
We introduce CondOT, a multi-task approach to estimate a family of OT maps conditioned on a context variable.
We demonstrate the ability of CondOT to infer the effect of an arbitrary combination of genetic or therapeutic perturbations on single cells.
arXiv Detail & Related papers (2022-06-28T19:34:44Z) - Asynchronous Stochastic Optimization Robust to Arbitrary Delays [54.61797739710608]
We consider optimization with delayed gradients where, at each time step $t$, the algorithm makes an update using a stale gradient from step $t - d_t$ for an arbitrary delay $d_t$.
Our experiments demonstrate the efficacy and robustness of our algorithm in cases where the delay distribution is skewed or heavy-tailed.
arXiv Detail & Related papers (2021-06-22T15:50:45Z) - Linear Time Sinkhorn Divergences using Positive Features [51.50788603386766]
Solving optimal transport with an entropic regularization requires computing an $n\times n$ kernel matrix that is repeatedly applied to a vector.
We propose to use instead ground costs of the form $c(x,y)=-\log\langle\varphi(x),\varphi(y)\rangle$ where $\varphi$ is a map from the ground space onto the positive orthant $\mathbb{R}^r_+$, with $r\ll n$.
arXiv Detail & Related papers (2020-06-12T10:21:40Z)