Kernel distance measures for time series, random fields and other
structured data
- URL: http://arxiv.org/abs/2109.14752v1
- Date: Wed, 29 Sep 2021 22:54:17 GMT
- Title: Kernel distance measures for time series, random fields and other
structured data
- Authors: Srinjoy Das, Hrushikesh Mhaskar, Alexander Cloninger
- Abstract summary: kdiff is a novel kernel-based measure for estimating distances between instances of structured data.
It accounts for both self and cross similarities across the instances and is defined using a lower quantile of the distance distribution.
Some theoretical results are provided for separability conditions using kdiff as a distance measure for clustering and classification problems.
- Score: 71.61147615789537
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: This paper introduces kdiff, a novel kernel-based measure for estimating
distances between instances of time series, random fields and other forms of
structured data. This measure is based on the idea of matching distributions
that only overlap over a portion of their region of support. Our proposed
measure is inspired by MPdist which has been previously proposed for such
datasets and is constructed using Euclidean metrics, whereas kdiff is
constructed using non-linear kernel distances. Also, kdiff accounts for both
self and cross similarities across the instances and is defined using a lower
quantile of the distance distribution. Comparing the cross similarity to self
similarity allows for measures of similarity that are more robust to noise and
partial occlusions of the relevant signals. Our proposed measure kdiff is a
more general form of the well known kernel-based Maximum Mean Discrepancy (MMD)
distance estimated over the embeddings. Some theoretical results are provided
for separability conditions using kdiff as a distance measure for clustering
and classification problems where the embedding distributions can be modeled as
two component mixtures. Applications are demonstrated for clustering of
synthetic and real-life time series and image data, and the performance of
kdiff is compared to competing distance measures for clustering.
Related papers
- Convergence of Score-Based Discrete Diffusion Models: A Discrete-Time Analysis [56.442307356162864]
We study the theoretical aspects of score-based discrete diffusion models under the Continuous Time Markov Chain (CTMC) framework.
We introduce a discrete-time sampling algorithm in the general state space $[S]d$ that utilizes score estimators at predefined time points.
Our convergence analysis employs a Girsanov-based method and establishes key properties of the discrete score function.
arXiv Detail & Related papers (2024-10-03T09:07:13Z) - Mixed-type Distance Shrinkage and Selection for Clustering via Kernel Metric Learning [0.0]
We propose a metric called KDSUM that uses mixed kernels to measure dissimilarity.
We demonstrate that KDSUM is a shrinkage method from existing mixed-type metrics to a uniform dissimilarity metric.
arXiv Detail & Related papers (2023-06-02T19:51:48Z) - Rethinking k-means from manifold learning perspective [122.38667613245151]
We present a new clustering algorithm which directly detects clusters of data without mean estimation.
Specifically, we construct distance matrix between data points by Butterworth filter.
To well exploit the complementary information embedded in different views, we leverage the tensor Schatten p-norm regularization.
arXiv Detail & Related papers (2023-05-12T03:01:41Z) - A Robust and Flexible EM Algorithm for Mixtures of Elliptical
Distributions with Missing Data [71.9573352891936]
This paper tackles the problem of missing data imputation for noisy and non-Gaussian data.
A new EM algorithm is investigated for mixtures of elliptical distributions with the property of handling potential missing data.
Experimental results on synthetic data demonstrate that the proposed algorithm is robust to outliers and can be used with non-Gaussian data.
arXiv Detail & Related papers (2022-01-28T10:01:37Z) - Density Ratio Estimation via Infinitesimal Classification [85.08255198145304]
We propose DRE-infty, a divide-and-conquer approach to reduce Density ratio estimation (DRE) to a series of easier subproblems.
Inspired by Monte Carlo methods, we smoothly interpolate between the two distributions via an infinite continuum of intermediate bridge distributions.
We show that our approach performs well on downstream tasks such as mutual information estimation and energy-based modeling on complex, high-dimensional datasets.
arXiv Detail & Related papers (2021-11-22T06:26:29Z) - Tangent Space and Dimension Estimation with the Wasserstein Distance [10.118241139691952]
Consider a set of points sampled independently near a smooth compact submanifold of Euclidean space.
We provide mathematically rigorous bounds on the number of sample points required to estimate both the dimension and the tangent spaces of that manifold.
arXiv Detail & Related papers (2021-10-12T21:02:06Z) - The Exploitation of Distance Distributions for Clustering [3.42658286826597]
In cluster analysis, different properties for distance distributions are judged to be relevant for appropriate distance selection.
By systematically investigating this specification using distribution analysis through a mirrored-density plot, it is shown that multimodal distance distributions are preferable in cluster analysis.
Experiments are performed on several artificial datasets and natural datasets for the task of clustering.
arXiv Detail & Related papers (2021-08-22T06:22:08Z) - Spatially relaxed inference on high-dimensional linear models [48.989769153211995]
We study the properties of ensembled clustered inference algorithms which combine spatially constrained clustering, statistical inference, and ensembling to aggregate several clustered inference solutions.
We show that ensembled clustered inference algorithms control the $delta$-FWER under standard assumptions for $delta$ equal to the largest cluster diameter.
arXiv Detail & Related papers (2021-06-04T16:37:19Z) - Depth-based pseudo-metrics between probability distributions [1.1470070927586016]
We propose two new pseudo-metrics between continuous probability measures based on data depth and its associated central regions.
In contrast to the Wasserstein distance, the proposed pseudo-metrics do not suffer from the curse of dimensionality.
The regions-based pseudo-metric appears to be robust w.r.t. both outliers and heavy tails.
arXiv Detail & Related papers (2021-03-23T17:33:18Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.