Softmax-based Classification is k-means Clustering: Formal Proof,
Consequences for Adversarial Attacks, and Improvement through Centroid Based
Tailoring
- URL: http://arxiv.org/abs/2001.01987v1
- Date: Tue, 7 Jan 2020 11:47:45 GMT
- Title: Softmax-based Classification is k-means Clustering: Formal Proof,
Consequences for Adversarial Attacks, and Improvement through Centroid Based
Tailoring
- Authors: Sibylle Hess, Wouter Duivesteijn, Decebal Mocanu
- Abstract summary: We prove the connection between k-means clustering and the predictions of neural networks based on the softmax activation layer.
We propose Centroid Based Tailoring as an alternative to the softmax function in the last layer of a neural network.
- Score: 3.0724051098062097
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: We formally prove the connection between k-means clustering and the
predictions of neural networks based on the softmax activation layer. In
existing work, this connection has been analyzed empirically, but it has never
before been mathematically derived. The softmax function partitions the
transformed input space into cones, each of which encompasses a class. This is
equivalent to putting a number of centroids in this transformed space at equal
distance from the origin, and k-means clustering the data points by proximity
to these centroids. Softmax only cares in which cone a data point falls, and
not how far from the centroid it is within that cone. We formally prove that
networks with a small Lipschitz modulus (which corresponds to a low
susceptibility to adversarial attacks) map data points closer to the cluster
centroids, which results in a mapping to a k-means-friendly space. To leverage
this knowledge, we propose Centroid Based Tailoring as an alternative to the
softmax function in the last layer of a neural network. The resulting Gauss
network has similar predictive accuracy as traditional networks, but is less
susceptible to one-pixel attacks; while the main contribution of this paper is
theoretical in nature, the Gauss network contributes empirical auxiliary
benefits.
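As a concrete illustration of the claimed equivalence, here is a minimal sketch (not from the paper; NumPy, with illustrative names and dimensions) checking that, when the last layer has zero bias and class weight vectors of equal norm, the softmax argmax agrees with nearest-centroid (k-means-style) assignment:

```python
# Minimal sketch of the stated equivalence, assuming a bias-free last layer
# whose class weight vectors all have the same norm (i.e. centroids placed at
# equal distance from the origin). Names and dimensions are illustrative.
import numpy as np

rng = np.random.default_rng(0)
d, k, n = 16, 5, 1000                           # feature dim, classes, samples

W = rng.normal(size=(k, d))                     # class weight vectors / centroids
W /= np.linalg.norm(W, axis=1, keepdims=True)   # equal distance from the origin

Z = rng.normal(size=(n, d))                     # points in the transformed (penultimate) space

# Softmax prediction: argmax of the logits (softmax is monotone, so the
# exponentials and the normalization do not change the argmax).
softmax_pred = (Z @ W.T).argmax(axis=1)

# k-means-style prediction: assign each point to its nearest centroid.
dists = np.linalg.norm(Z[:, None, :] - W[None, :, :], axis=2)
kmeans_pred = dists.argmin(axis=1)

# ||z - w_c||^2 = ||z||^2 - 2 <z, w_c> + ||w_c||^2; with ||w_c|| constant,
# minimizing the distance is the same as maximizing the inner product.
assert (softmax_pred == kmeans_pred).all()
print("softmax argmax and nearest-centroid assignment agree on all points")
```

Note that the check only concerns which cone each point falls into; it says nothing about how far the point is from its centroid, which is exactly the information the abstract argues softmax discards.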
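The abstract does not spell out the Centroid Based Tailoring rule, so the following is only a hedged sketch of what a centroid-based output layer could look like: class scores decay with the distance to per-class centroids (a Gaussian/RBF-style rule), so predictions depend on proximity rather than on cone membership alone. The function names, the Gaussian form, and the bandwidth parameter are assumptions for illustration, not the paper's definition.

```python
# Hedged sketch of a centroid-based ("Gauss"-style) output layer; the Gaussian
# scoring rule and all names below are illustrative assumptions, not the
# paper's exact construction of Centroid Based Tailoring.
import numpy as np

def centroid_scores(Z, centroids, sigma=1.0):
    """Score each point against each class centroid with a Gaussian of the
    squared Euclidean distance, so the score reflects how close a point is
    to the centroid, not merely which cone it falls into."""
    d2 = ((Z[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2)
    return np.exp(-d2 / (2.0 * sigma ** 2))

def centroid_predict(Z, centroids, sigma=1.0):
    """Predict the class whose centroid is closest (highest Gaussian score)."""
    return centroid_scores(Z, centroids, sigma).argmax(axis=1)

# The centroids could, for example, be the per-class means of training points
# in the network's transformed space (an assumption, not the paper's recipe).
```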
Related papers
- Self-Supervised Graph Embedding Clustering [70.36328717683297]
The K-means one-step dimensionality reduction clustering method has made some progress in addressing the curse of dimensionality in clustering tasks.
We propose a unified framework that integrates manifold learning with K-means, resulting in the self-supervised graph embedding framework.
arXiv Detail & Related papers (2024-09-24T08:59:51Z)
- Fuzzy K-Means Clustering without Cluster Centroids [21.256564324236333]
Fuzzy K-Means clustering is a critical technique in unsupervised data analysis.
This paper proposes a novel Fuzzy K-Means clustering algorithm that entirely eliminates the reliance on cluster centroids.
arXiv Detail & Related papers (2024-04-07T12:25:03Z)
- Feature Selection using Sparse Adaptive Bottleneck Centroid-Encoder [1.2487990897680423]
We introduce a novel nonlinear model, Sparse Adaptive Bottleneck Centroid-Encoder (SABCE), for determining the features that discriminate between two or more classes.
The algorithm is applied to various real-world data sets, including high-dimensional biological, image, speech, and accelerometer sensor data.
arXiv Detail & Related papers (2023-06-07T21:37:21Z)
- Understanding Imbalanced Semantic Segmentation Through Neural Collapse [81.89121711426951]
We show that semantic segmentation naturally brings contextual correlation and imbalanced distribution among classes.
We introduce a regularizer on feature centers to encourage the network to learn features closer to the appealing structure.
Our method ranks 1st and sets a new record on the ScanNet200 test leaderboard.
arXiv Detail & Related papers (2023-01-03T13:51:51Z)
- On the Effective Number of Linear Regions in Shallow Univariate ReLU Networks: Convergence Guarantees and Implicit Bias [50.84569563188485]
We show that gradient flow converges in direction when labels are determined by the sign of a target network with $r$ neurons.
Our result may already hold for mild over-parameterization, where the width is $\tilde{\mathcal{O}}(r)$ and independent of the sample size.
arXiv Detail & Related papers (2022-05-18T16:57:10Z)
- Clustering by the Probability Distributions from Extreme Value Theory [32.496691290725764]
This paper generalizes k-means to model the distribution of clusters.
We use GPD to establish a probability model for each cluster.
We also introduce a naive baseline, dubbed Generalized Extreme Value (GEV) k-means.
Notably, GEV k-means can also estimate cluster structure and thus perform reasonably well over classical k-means.
arXiv Detail & Related papers (2022-02-20T10:52:43Z)
- A Modular Framework for Centrality and Clustering in Complex Networks [0.6423239719448168]
In this paper, we study two important such network analysis techniques, namely, centrality and clustering.
An information-flow based model is adopted for clustering, which itself builds upon an information theoretic measure for computing centrality.
Our clustering naturally inherits the flexibility to accommodate edge directionality, as well as different interpretations and interplay between edge weights and node degrees.
arXiv Detail & Related papers (2021-11-23T03:01:29Z)
- Density-Based Clustering with Kernel Diffusion [59.4179549482505]
A naive density corresponding to the indicator function of a unit $d$-dimensional Euclidean ball is commonly used in density-based clustering algorithms.
We propose a new kernel diffusion density function, which is adaptive to data of varying local distributional characteristics and smoothness.
arXiv Detail & Related papers (2021-10-11T09:00:33Z)
- Hyperdimensional Computing for Efficient Distributed Classification with Randomized Neural Networks [5.942847925681103]
We study distributed classification, which can be employed in situations where data cannot be stored at a central location or shared.
We propose a more efficient solution for distributed classification by making use of a lossy compression approach applied when sharing the local classifiers with other agents.
arXiv Detail & Related papers (2021-06-02T01:33:56Z)
- Generalized Leverage Score Sampling for Neural Networks [82.95180314408205]
Leverage score sampling is a powerful technique that originates from theoretical computer science.
In this work, we generalize the results in [Avron, Kapralov, Musco, Musco, Velingker and Zandieh 17] to a broader class of kernels.
arXiv Detail & Related papers (2020-09-21T14:46:01Z)
- MSE-Optimal Neural Network Initialization via Layer Fusion [68.72356718879428]
Deep neural networks achieve state-of-the-art performance for a range of classification and inference tasks.
The use of gradient-based optimization combined with nonconvexity renders learning susceptible to initialization.
We propose fusing neighboring layers of deeper networks that are trained with random variables.
arXiv Detail & Related papers (2020-01-28T18:25:15Z)
This list is automatically generated from the titles and abstracts of the papers in this site.