Softmax-based Classification is k-means Clustering: Formal Proof,
Consequences for Adversarial Attacks, and Improvement through Centroid Based
Tailoring
- URL: http://arxiv.org/abs/2001.01987v1
- Date: Tue, 7 Jan 2020 11:47:45 GMT
- Title: Softmax-based Classification is k-means Clustering: Formal Proof,
Consequences for Adversarial Attacks, and Improvement through Centroid Based
Tailoring
- Authors: Sibylle Hess, Wouter Duivesteijn, Decebal Mocanu
- Abstract summary: We prove the connection between k-means clustering and the predictions of neural networks based on the softmax activation layer.
We propose Centroid Based Tailoring as an alternative to the softmax function in the last layer of a neural network.
- Score: 3.0724051098062097
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: We formally prove the connection between k-means clustering and the
predictions of neural networks based on the softmax activation layer. In
existing work, this connection has been analyzed empirically, but it has never
before been mathematically derived. The softmax function partitions the
transformed input space into cones, each of which encompasses a class. This is
equivalent to putting a number of centroids in this transformed space at equal
distance from the origin, and k-means clustering the data points by proximity
to these centroids. Softmax only cares in which cone a data point falls, and
not how far from the centroid it is within that cone. We formally prove that
networks with a small Lipschitz modulus (which corresponds to a low
susceptibility to adversarial attacks) map data points closer to the cluster
centroids, which results in a mapping to a k-means-friendly space. To leverage
this knowledge, we propose Centroid Based Tailoring as an alternative to the
softmax function in the last layer of a neural network. The resulting Gauss
network has similar predictive accuracy as traditional networks, but is less
susceptible to one-pixel attacks; while the main contribution of this paper is
theoretical in nature, the Gauss network contributes empirical auxiliary
benefits.
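As a concrete illustration of the claimed equivalence, here is a minimal sketch (not from the paper; NumPy, with illustrative names and dimensions) checking that, when the last layer has zero bias and class weight vectors of equal norm, the softmax argmax agrees with nearest-centroid (k-means-style) assignment:

```python
# Minimal sketch of the stated equivalence, assuming a bias-free last layer
# whose class weight vectors all have the same norm (i.e. centroids placed at
# equal distance from the origin). Names and dimensions are illustrative.
import numpy as np

rng = np.random.default_rng(0)
d, k, n = 16, 5, 1000                           # feature dim, classes, samples

W = rng.normal(size=(k, d))                     # class weight vectors / centroids
W /= np.linalg.norm(W, axis=1, keepdims=True)   # equal distance from the origin

Z = rng.normal(size=(n, d))                     # points in the transformed (penultimate) space

# Softmax prediction: argmax of the logits (softmax is monotone, so the
# exponentials and the normalization do not change the argmax).
softmax_pred = (Z @ W.T).argmax(axis=1)

# k-means-style prediction: assign each point to its nearest centroid.
dists = np.linalg.norm(Z[:, None, :] - W[None, :, :], axis=2)
kmeans_pred = dists.argmin(axis=1)

# ||z - w_c||^2 = ||z||^2 - 2 <z, w_c> + ||w_c||^2; with ||w_c|| constant,
# minimizing the distance is the same as maximizing the inner product.
assert (softmax_pred == kmeans_pred).all()
print("softmax argmax and nearest-centroid assignment agree on all points")
```

Note that the check only concerns which cone each point falls into; it says nothing about how far the point is from its centroid, which is exactly the information the abstract argues softmax discards.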
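The abstract does not spell out the Centroid Based Tailoring rule, so the following is only a hedged sketch of what a centroid-based output layer could look like: class scores decay with the distance to per-class centroids (a Gaussian/RBF-style rule), so predictions depend on proximity rather than on cone membership alone. The function names, the Gaussian form, and the bandwidth parameter are assumptions for illustration, not the paper's definition.

```python
# Hedged sketch of a centroid-based ("Gauss"-style) output layer; the Gaussian
# scoring rule and all names below are illustrative assumptions, not the
# paper's exact construction of Centroid Based Tailoring.
import numpy as np

def centroid_scores(Z, centroids, sigma=1.0):
    """Score each point against each class centroid with a Gaussian of the
    squared Euclidean distance, so the score reflects how close a point is
    to the centroid, not merely which cone it falls into."""
    d2 = ((Z[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2)
    return np.exp(-d2 / (2.0 * sigma ** 2))

def centroid_predict(Z, centroids, sigma=1.0):
    """Predict the class whose centroid is closest (highest Gaussian score)."""
    return centroid_scores(Z, centroids, sigma).argmax(axis=1)

# The centroids could, for example, be the per-class means of training points
# in the network's transformed space (an assumption, not the paper's recipe).
```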
Related papers
- Self-Supervised Graph Embedding Clustering [70.36328717683297]
The K-means one-step dimensionality reduction clustering method has made some progress in addressing the curse of dimensionality in clustering tasks.
We propose a unified framework that integrates manifold learning with K-means, resulting in the self-supervised graph embedding framework.
arXiv Detail & Related papers (2024-09-24T08:59:51Z)
- Fuzzy K-Means Clustering without Cluster Centroids [21.256564324236333]
Fuzzy K-Means clustering is a critical technique in unsupervised data analysis.
This paper proposes a novel Fuzzy K-Means clustering algorithm that entirely eliminates the reliance on cluster centroids.
arXiv Detail & Related papers (2024-04-07T12:25:03Z)
- Feature Selection using Sparse Adaptive Bottleneck Centroid-Encoder [1.2487990897680423]
We introduce a novel nonlinear model, Sparse Adaptive Bottleneck Centroid-Encoder (SABCE), for determining the features that discriminate between two or more classes.
The algorithm is applied to various real-world data sets, including high-dimensional biological, image, speech, and accelerometer sensor data.
arXiv Detail & Related papers (2023-06-07T21:37:21Z)
- Understanding Imbalanced Semantic Segmentation Through Neural Collapse [81.89121711426951]
We show that semantic segmentation naturally brings contextual correlation and imbalanced distribution among classes.
We introduce a regularizer on feature centers to encourage the network to learn features closer to the appealing structure.
Our method ranks 1st and sets a new record on the ScanNet200 test leaderboard.
arXiv Detail & Related papers (2023-01-03T13:51:51Z)
- On the Effective Number of Linear Regions in Shallow Univariate ReLU Networks: Convergence Guarantees and Implicit Bias [50.84569563188485]
We show that gradient flow converges in direction when labels are determined by the sign of a target network with $r$ neurons.
Our result may already hold for mild over-parameterization, where the width is $\tilde{\mathcal{O}}(r)$ and independent of the sample size.
arXiv Detail & Related papers (2022-05-18T16:57:10Z)
- Clustering by the Probability Distributions from Extreme Value Theory [32.496691290725764]
This paper generalizes k-means to model the distribution of clusters.
We use GPD to establish a probability model for each cluster.
We also introduce a naive baseline, dubbed Generalized Extreme Value (GEV) k-means.
Notably, GEV k-means can also estimate cluster structure and thus perform reasonably well over classical k-means.
arXiv Detail & Related papers (2022-02-20T10:52:43Z)
- A Modular Framework for Centrality and Clustering in Complex Networks [0.6423239719448168]
In this paper, we study two important such network analysis techniques, namely, centrality and clustering.
An information-flow based model is adopted for clustering, which itself builds upon an information theoretic measure for computing centrality.
Our clustering naturally inherits the flexibility to accommodate edge directionality, as well as different interpretations and interplay between edge weights and node degrees.
arXiv Detail & Related papers (2021-11-23T03:01:29Z)
- Density-Based Clustering with Kernel Diffusion [59.4179549482505]
A naive density corresponding to the indicator function of a unit $d$-dimensional Euclidean ball is commonly used in density-based clustering algorithms.
We propose a new kernel diffusion density function, which is adaptive to data of varying local distributional characteristics and smoothness.
arXiv Detail & Related papers (2021-10-11T09:00:33Z)
- Hyperdimensional Computing for Efficient Distributed Classification with Randomized Neural Networks [5.942847925681103]
We study distributed classification, which can be employed in situations where data cannot be stored at a central location or shared.
We propose a more efficient solution for distributed classification by making use of a lossy compression approach applied when sharing the local classifiers with other agents.
arXiv Detail & Related papers (2021-06-02T01:33:56Z)
- Generalized Leverage Score Sampling for Neural Networks [82.95180314408205]
Leverage score sampling is a powerful technique that originates from theoretical computer science.
In this work, we generalize the results in [Avron, Kapralov, Musco, Musco, Velingker and Zandieh 17] to a broader class of kernels.
arXiv Detail & Related papers (2020-09-21T14:46:01Z)
- MSE-Optimal Neural Network Initialization via Layer Fusion [68.72356718879428]
Deep neural networks achieve state-of-the-art performance for a range of classification and inference tasks.
The use of gradient-based optimization combined with nonconvexity renders learning susceptible to initialization.
We propose fusing neighboring layers of deeper networks that are trained with random variables.
arXiv Detail & Related papers (2020-01-28T18:25:15Z)
This list is automatically generated from the titles and abstracts of the papers in this site.