Statistical learning on measures: an application to persistence diagrams
- URL: http://arxiv.org/abs/2303.08456v2
- Date: Wed, 31 May 2023 08:09:26 GMT
- Title: Statistical learning on measures: an application to persistence diagrams
- Authors: Olympio Hacquard (LMO, DATASHAPE), Gilles Blanchard (LMO, DATASHAPE),
Clément Levrard (LPSM)
- Abstract summary: We consider a binary supervised learning classification problem where instead of having data in a finite-dimensional Euclidean space, we observe measures on a compact space $\mathcal{X}$.
We show that our framework allows more flexibility and diversity in the input data we can deal with.
While such a framework has many possible applications, this work strongly emphasizes classifying data via topological descriptors called persistence diagrams.
- Score: 0.0
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: We consider a binary supervised learning classification problem where instead
of having data in a finite-dimensional Euclidean space, we observe measures on
a compact space $\mathcal{X}$. Formally, we observe data $D_N = (\mu_1, Y_1),
\ldots, (\mu_N, Y_N)$ where $\mu_i$ is a measure on $\mathcal{X}$ and $Y_i$ is
a label in $\{0, 1\}$. Given a set $\mathcal{F}$ of base-classifiers on
$\mathcal{X}$, we build corresponding classifiers in the space of measures. We
provide upper and lower bounds on the Rademacher complexity of this new class
of classifiers that can be expressed simply in terms of corresponding
quantities for the class $\mathcal{F}$. If the measures $\mu_i$ are uniform
over a finite set, this classification task boils down to a multi-instance
learning problem. However, our approach allows more flexibility and diversity
in the input data we can deal with. While such a framework has many possible
applications, this work strongly emphasizes classifying data via topological
descriptors called persistence diagrams. These objects are discrete measures on
$\mathbb{R}^2$, where the coordinates of each point correspond to the range of
scales at which a topological feature exists. We will present several
classifiers on measures and show how they can heuristically and theoretically
enable good classification performance in various settings in the case of
persistence diagrams.
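The abstract describes building classifiers on the space of measures from a set $\mathcal{F}$ of base classifiers on $\mathcal{X}$. As a rough illustration of that idea (a sketch, not the authors' construction), the snippet below treats a persistence diagram as a discrete measure $\mu = \sum_i \delta_{(b_i, d_i)}$ on $\mathbb{R}^2$, integrates a family of base functions against $\mu$, and trains an ordinary linear model on the resulting features. The Gaussian-bump base class, the grid of centers, the toy diagrams, and the logistic-regression step are all illustrative assumptions.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

def diagram_to_features(diagram, centers, bandwidth=0.5):
    """Integrate a family of base functions (Gaussian bumps on a grid,
    an illustrative choice of base class F) against the discrete measure
    mu = sum_i delta_{(b_i, d_i)}.  Returns one feature per base function."""
    diagram = np.asarray(diagram, dtype=float)            # shape (n_points, 2)
    if diagram.size == 0:
        return np.zeros(len(centers))
    d2 = ((diagram[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
    bumps = np.exp(-d2 / (2.0 * bandwidth ** 2))          # (n_points, n_centers)
    return bumps.sum(axis=0)                              # integral of each bump w.r.t. mu

# Grid of centers covering the region of R^2 where diagram points live (assumption).
xs, ys = np.meshgrid(np.linspace(0, 2, 5), np.linspace(0, 2, 5))
centers = np.column_stack([xs.ravel(), ys.ravel()])

# Toy dataset D_N = (mu_1, Y_1), ..., (mu_N, Y_N): class-1 diagrams carry one
# extra long-lived point far from the diagonal.
rng = np.random.default_rng(0)
def toy_diagram(label):
    pts = rng.uniform(0, 0.5, size=(8, 2)).cumsum(axis=1)   # short-lived noise points
    if label == 1:
        pts = np.vstack([pts, [[0.2, 1.8]]])                # one persistent feature
    return pts

labels = rng.integers(0, 2, size=200)
X = np.array([diagram_to_features(toy_diagram(y), centers) for y in labels])

# A linear classifier on the integrated features plays the role of the
# classifier built on the space of measures.
clf = LogisticRegression(max_iter=1000).fit(X, labels)
print("training accuracy:", clf.score(X, labels))
```

When each $\mu_i$ is the uniform measure over a finite bag of instances, replacing the sum over diagram points by an average recovers the multi-instance reading mentioned in the abstract.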
Related papers
- A Theory of Interpretable Approximations [61.90216959710842]
We study the idea of approximating a target concept $c$ by a small aggregation of concepts from some base class $\mathcal{H}$.
For any given pair of $\mathcal{H}$ and $c$, exactly one of these cases holds: (i) $c$ cannot be approximated by $\mathcal{H}$ with arbitrary accuracy.
We show that, in the case of interpretable approximations, even a slightly nontrivial a-priori guarantee on the complexity of approximations implies approximations with constant (distribution-free and accuracy-
arXiv Detail & Related papers (2024-06-15T06:43:45Z) - One-Bit Quantization and Sparsification for Multiclass Linear Classification with Strong Regularization [18.427215139020625]
We show that the best classification is achieved when $f(\cdot) = \|\cdot\|^2$ and $\lambda \to \infty$.
It is often possible to find sparse and one-bit solutions that perform almost as well as the one corresponding to $f(\cdot) = \|\cdot\|_\infty$ in the large $\lambda$ regime.
arXiv Detail & Related papers (2024-02-16T06:39:40Z) - An Approximation Theory for Metric Space-Valued Functions With A View
Towards Deep Learning [25.25903127886586]
We build universal function approximators of continuous maps between arbitrary Polish metric spaces $\mathcal{X}$ and $\mathcal{Y}$.
In particular, we show that the required number of Dirac measures is determined by the structure of $\mathcal{X}$ and $\mathcal{Y}$.
arXiv Detail & Related papers (2023-04-24T16:18:22Z) - Simplifying and Understanding State Space Models with Diagonal Linear
RNNs [56.33053691749856]
This work disposes of the discretization step, and proposes a model based on vanilla Diagonal Linear RNNs.
We empirically show that, despite being conceptually much simpler, $\mathrm{DLR}$ is as performant as previously-proposed SSMs.
We also characterize the expressivity of SSMs and attention-based models via a suite of $13$ synthetic sequence-to-sequence tasks.
arXiv Detail & Related papers (2022-12-01T18:53:06Z) - Multi-Task Imitation Learning for Linear Dynamical Systems [50.124394757116605]
We study representation learning for efficient imitation learning over linear systems.
We find that the imitation gap over trajectories generated by the learned target policy is bounded by $\tilde{O}\left( \frac{k\, n_x H}{N_{\mathrm{shared}}} + \frac{k\, n_u}{N_{\mathrm{target}}}\right)$.
arXiv Detail & Related papers (2022-12-01T00:14:35Z) - Automatic classification of deformable shapes [0.0]
Let $\mathcal{D}$ be a dataset of smooth 3D-surfaces, partitioned into disjoint classes $\mathit{CL}_j$, $j = 1, \ldots, k$.
We show how optimized diffeomorphic registration applied to large numbers of pairs $S, S' \in \mathcal{D}$ can be used to implement automatic classification on $\mathcal{D}$.
We generate classifiers invariant under rigid motions in $\mathbb{R}^3$.
arXiv Detail & Related papers (2022-11-04T15:44:56Z) - Unsupervised Learning under Latent Label Shift [21.508249151557244]
We introduce unsupervised learning under Latent Label Shift (LLS).
We show that our algorithm can leverage domain information to improve state-of-the-art unsupervised classification methods.
arXiv Detail & Related papers (2022-07-26T20:52:53Z) - FriendlyCore: Practical Differentially Private Aggregation [67.04951703461657]
We propose a simple and practical tool $\mathsf{FriendlyCore}$ that takes a set of points $\mathcal{D}$ from an unrestricted (pseudo) metric space as input.
When $\mathcal{D}$ has effective diameter $r$, $\mathsf{FriendlyCore}$ returns a "stable" subset $\mathcal{D}_G \subseteq \mathcal{D}$ that includes all points.
$\mathsf{FriendlyCore}$ can be used to preprocess the input before privately aggregating it, potentially simplifying the aggregation or boosting its accuracy.
arXiv Detail & Related papers (2021-10-19T17:43:50Z) - Regular Polytope Networks [29.44144177954405]
We argue that a transformation can be fixed with no loss of accuracy and with a reduction in memory usage.
It can also be used to learn stationary and maximally separated embeddings.
We show that the stationarity of the embedding and its maximally separated representation can be theoretically justified.
arXiv Detail & Related papers (2021-03-29T14:11:32Z) - On the Theory of Transfer Learning: The Importance of Task Diversity [114.656572506859]
We consider $t+1$ tasks parameterized by functions of the form $f_j \circ h$ in a general function class $\mathcal{F} \circ \mathcal{H}$.
We show that for diverse training tasks the sample complexity needed to learn the shared representation across the first $t$ training tasks scales as $C(\mathcal{H}) + t\, C(\mathcal{F})$.
arXiv Detail & Related papers (2020-06-20T20:33:59Z) - Neural Bayes: A Generic Parameterization Method for Unsupervised
Representation Learning [175.34232468746245]
We introduce a parameterization method called Neural Bayes.
It allows computing statistical quantities that are in general difficult to compute.
We show two independent use cases for this parameterization.
arXiv Detail & Related papers (2020-02-20T22:28:53Z)
This list is automatically generated from the titles and abstracts of the papers in this site.