Covariance-engaged Classification of Sets via Linear Programming
- URL: http://arxiv.org/abs/2006.14831v1
- Date: Fri, 26 Jun 2020 07:20:15 GMT
- Title: Covariance-engaged Classification of Sets via Linear Programming
- Authors: Zhao Ren and Sungkyu Jung and Xingye Qiao
- Abstract summary: Set classification aims to classify a set of observations as a whole, as opposed to classifying individual observations separately.
We show that the number of observations in the set plays a critical role in bounding the Bayes risk.
Under this framework, we propose new methods of set classification.
- Score: 16.11804985840274
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Set classification aims to classify a set of observations as a whole, as
opposed to classifying individual observations separately. To formally
understand the unfamiliar concept of binary set classification, we first
investigate the optimal decision rule under the normal distribution, which
utilizes the empirical covariance of the set to be classified. We show that the
number of observations in the set plays a critical role in bounding the Bayes
risk. Under this framework, we further propose new methods of set
classification. For the case where only a few parameters of the model drive the
difference between two classes, we propose a computationally-efficient approach
to parameter estimation using linear programming, leading to the
Covariance-engaged LInear Programming Set (CLIPS) classifier. Its theoretical
properties are investigated for both the independent case and various (short-range
and long-range dependent) time series structures among observations within each
set. The convergence rates of estimation errors and risk of the CLIPS
classifier are established to show that having multiple observations in a set
leads to faster convergence rates, compared to the standard classification
situation in which there is only one observation in the set. The applicable
domains in which CLIPS performs better than competitors are highlighted in
a comprehensive simulation study. Finally, we illustrate the usefulness of the
proposed methods in classification of real image data in histopathology.
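As a rough, illustrative sketch (an assumption of this note, not the authors' implementation): the snippet below shows how the empirical mean and covariance of the set to be classified enter a Gaussian likelihood-ratio rule for the whole set, which is the flavor of the optimal decision rule described in the abstract. The class parameters mu0, Sigma0, mu1, Sigma1 and the toy data are hypothetical, and the sparse linear-programming estimation step of CLIPS is omitted.

```python
# Minimal sketch (assumed, not the authors' CLIPS implementation): a Gaussian
# likelihood-ratio rule that classifies an entire set of observations, showing
# how the set's empirical mean and covariance enter the decision. Class
# parameters are assumed known or pre-estimated here.
import numpy as np

def set_log_likelihood(X, mu, Sigma):
    """Gaussian log-likelihood of the whole set X (n x p) under N(mu, Sigma),
    expressed through the set's sample mean and empirical covariance."""
    n, p = X.shape
    xbar = X.mean(axis=0)
    S = np.cov(X, rowvar=False, bias=True)      # empirical covariance of the set
    Sigma_inv = np.linalg.inv(Sigma)
    _, logdet = np.linalg.slogdet(Sigma)
    quad = n * np.trace(Sigma_inv @ S) + n * (xbar - mu) @ Sigma_inv @ (xbar - mu)
    return -0.5 * (n * p * np.log(2.0 * np.pi) + n * logdet + quad)

def classify_set(X, mu0, Sigma0, mu1, Sigma1, prior1=0.5):
    """Assign the whole set X to class 1 iff the posterior log-odds are positive."""
    log_odds = (set_log_likelihood(X, mu1, Sigma1)
                - set_log_likelihood(X, mu0, Sigma0)
                + np.log(prior1 / (1.0 - prior1)))
    return int(log_odds > 0)

# Toy usage: the two classes share the same mean and differ only in covariance,
# so the empirical covariance of the set carries the class signal.
rng = np.random.default_rng(0)
p = 5
mu = np.zeros(p)
Sigma0 = np.eye(p)
Sigma1 = np.eye(p) + 0.4 * (np.eye(p, k=1) + np.eye(p, k=-1))
X_new = rng.multivariate_normal(mu, Sigma1, size=30)   # one set of 30 observations
print(classify_set(X_new, mu, Sigma0, mu, Sigma1))     # typically prints 1
```

With many observations in the set, the covariance term carries substantial class signal even when the means coincide, consistent with the abstract's point that the number of observations in the set governs the attainable Bayes risk.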
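Separately, as a hedged illustration of the "parameter estimation using linear programming" ingredient (again not the paper's CLIPS estimator): sparse precision-type parameters can be estimated with a column-wise LP in the spirit of CLIME-style estimators, solvable directly with scipy.optimize.linprog. The constraint form, the helper sparse_precision_column, and the tuning parameter lam below are illustrative assumptions.

```python
# Loose illustration (assumed, not the paper's CLIPS estimator): casting a sparse
# estimation problem as a linear program. One column of a precision-type matrix
# is recovered by
#   minimize ||beta||_1  subject to  ||S beta - e_j||_inf <= lam,
# rewritten with beta = beta_plus - beta_minus so a standard LP solver applies.
import numpy as np
from scipy.optimize import linprog

def sparse_precision_column(S, j, lam):
    """Solve the column-wise LP above; S is a p x p sample covariance matrix."""
    p = S.shape[0]
    e_j = np.zeros(p)
    e_j[j] = 1.0
    c = np.ones(2 * p)                        # objective: sum of beta_plus + beta_minus
    A_ub = np.vstack([np.hstack([S, -S]),     #  S(beta_plus - beta_minus) <= e_j + lam
                      np.hstack([-S, S])])    # -S(beta_plus - beta_minus) <= -e_j + lam
    b_ub = np.concatenate([e_j + lam, -e_j + lam])
    res = linprog(c, A_ub=A_ub, b_ub=b_ub,
                  bounds=[(0, None)] * (2 * p), method="highs")
    return res.x[:p] - res.x[p:]

# Toy usage on a sample covariance from standard normal data (lam is illustrative).
rng = np.random.default_rng(1)
X = rng.standard_normal((200, 8))
S = np.cov(X, rowvar=False)
print(np.round(sparse_precision_column(S, j=0, lam=0.2), 2))
```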
Related papers
- Time series clustering based on prediction accuracy of global
forecasting models [0.0]
A novel method to perform model-based clustering of time series is proposed in this paper.
Unlike most techniques proposed in the literature, the method considers the predictive accuracy as the main element for constructing the clustering partition.
An extensive simulation study shows that our method outperforms several alternative techniques concerning both clustering effectiveness and predictive accuracy.
arXiv Detail & Related papers (2023-04-30T13:12:19Z) - Anomaly Detection using Ensemble Classification and Evidence Theory [62.997667081978825]
We present a novel approach for anomaly detection using ensemble classification and evidence theory.
A pool selection strategy is presented to build a solid ensemble classifier.
Uncertainty estimates are then used for anomaly detection.
arXiv Detail & Related papers (2022-12-23T00:50:41Z) - An Upper Bound for the Distribution Overlap Index and Its Applications [18.481370450591317]
This paper proposes an easy-to-compute upper bound for the overlap index between two probability distributions.
The proposed bound shows its value in one-class classification and domain shift analysis.
Our work shows significant promise toward broadening the applications of overlap-based metrics.
arXiv Detail & Related papers (2022-12-16T20:02:03Z) - A One-shot Framework for Distributed Clustered Learning in Heterogeneous
Environments [54.172993875654015]
The paper proposes a family of communication efficient methods for distributed learning in heterogeneous environments.
A one-shot approach, based on local computations at the users and a clustering-based aggregation step at the server, is shown to provide strong learning guarantees.
For strongly convex problems, it is shown that, as long as the number of data points per user is above a threshold, the proposed approach achieves order-optimal mean-squared error rates in terms of the sample size.
arXiv Detail & Related papers (2022-09-22T09:04:10Z) - Conformal prediction set for time-series [16.38369532102931]
Uncertainty quantification is essential to studying complex machine learning methods.
We develop Ensemble Regularized Adaptive Prediction Set (ERAPS) to construct prediction sets for time-series.
We show valid marginal and conditional coverage by ERAPS, which also tends to yield smaller prediction sets than competing methods.
arXiv Detail & Related papers (2022-06-15T23:48:53Z) - Strong Consistency for a Class of Adaptive Clustering Procedures [0.0]
We show that all clustering procedures in this class are strongly consistent under IID samples.
In the adaptive setting, our work provides a strong consistency result that is the first of its kind.
arXiv Detail & Related papers (2022-02-27T18:56:41Z) - Self-Certifying Classification by Linearized Deep Assignment [65.0100925582087]
We propose a novel class of deep predictors for classifying metric data on graphs within the PAC-Bayes risk certification paradigm.
Building on the recent PAC-Bayes literature and data-dependent priors, this approach enables learning posterior distributions on the hypothesis space.
arXiv Detail & Related papers (2022-01-26T19:59:14Z) - When in Doubt: Improving Classification Performance with Alternating
Normalization [57.39356691967766]
We introduce Classification with Alternating Normalization (CAN), a non-parametric post-processing step for classification.
CAN improves classification accuracy for challenging examples by re-adjusting their predicted class probability distribution.
We empirically demonstrate its effectiveness across a diverse set of classification tasks.
arXiv Detail & Related papers (2021-09-28T02:55:42Z) - You Never Cluster Alone [150.94921340034688]
We extend the mainstream contrastive learning paradigm to a cluster-level scheme, where all the data assigned to the same cluster contribute to a unified representation.
We define a set of categorical variables as clustering assignment confidence, which links the instance-level learning track with the cluster-level one.
By reparametrizing the assignment variables, the proposed twin contrastive clustering (TCC) model is trained end-to-end, requiring no alternating steps.
arXiv Detail & Related papers (2021-06-03T14:59:59Z) - Binary Classification from Multiple Unlabeled Datasets via Surrogate Set
Classification [94.55805516167369]
We propose a new approach for binary classification from $m$ unlabeled sets (U-sets) for $m \ge 2$.
Our key idea is to consider an auxiliary classification task called surrogate set classification (SSC).
arXiv Detail & Related papers (2021-02-01T07:36:38Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed content (including all information) and is not responsible for any consequences of its use.