Permutation invariant functions: statistical tests, density estimation, and computationally efficient embedding
- URL: http://arxiv.org/abs/2403.01671v3
- Date: Thu, 9 May 2024 00:56:22 GMT
- Title: Permutation invariant functions: statistical tests, density estimation, and computationally efficient embedding
- Authors: Wee Chaimanowong, Ying Zhu,
- Abstract summary: Permutation invariance is among the most common symmetry that can be exploited to simplify complex problems in machine learning (ML)
In this paper, we take a step back and examine these questions in several fundamental problems.
Our methods for (i) and (iv) are based on a sorting trick and (ii) is based on an averaging trick.
- Score: 1.4316259003164373
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Permutation invariance is among the most common symmetry that can be exploited to simplify complex problems in machine learning (ML). There has been a tremendous surge of research activities in building permutation invariant ML architectures. However, less attention is given to: (1) how to statistically test for permutation invariance of coordinates in a random vector where the dimension is allowed to grow with the sample size; (2) how to leverage permutation invariance in estimation problems and how does it help reduce dimensions. In this paper, we take a step back and examine these questions in several fundamental problems: (i) testing the assumption of permutation invariance of multivariate distributions; (ii) estimating permutation invariant densities; (iii) analyzing the metric entropy of permutation invariant function classes and compare them with their counterparts without imposing permutation invariance; (iv) deriving an embedding of permutation invariant reproducing kernel Hilbert spaces for efficient computation. In particular, our methods for (i) and (iv) are based on a sorting trick and (ii) is based on an averaging trick. These tricks substantially simplify the exploitation of permutation invariance.
Related papers
- Number of Clusters in a Dataset: A Regularized K-means Approach [0.6445605125467572]
Regularized k-means algorithm is used to find the correct number of distinct clusters in datasets.<n>In this paper, we derive rigorous bounds for $lambda$ assuming clusters are em ideal.<n> Experiments show that the k-means algorithm with additive regularizer often yields multiple solutions.
arXiv Detail & Related papers (2025-05-29T01:58:44Z) - A Canonicalization Perspective on Invariant and Equivariant Learning [54.44572887716977]
We introduce a canonicalization perspective that provides an essential and complete view of the design of frames.
We show that there exists an inherent connection between frames and canonical forms.
We design novel frames for eigenvectors that are strictly superior to existing methods.
arXiv Detail & Related papers (2024-05-28T17:22:15Z) - Ito Diffusion Approximation of Universal Ito Chains for Sampling, Optimization and Boosting [64.0722630873758]
We consider rather general and broad class of Markov chains, Ito chains, that look like Euler-Maryama discretization of some Differential Equation.
We prove the bound in $W_2$-distance between the laws of our Ito chain and differential equation.
arXiv Detail & Related papers (2023-10-09T18:38:56Z) - Geometry of Linear Neural Networks: Equivariance and Invariance under
Permutation Groups [0.0]
We investigate the subvariety of functions that are equivariant or invariant under the action of a permutation group.
We draw conclusions for the parameterization and the design of equivariant and invariant linear networks.
arXiv Detail & Related papers (2023-09-24T19:40:15Z) - Deep Neural Networks with Efficient Guaranteed Invariances [77.99182201815763]
We address the problem of improving the performance and in particular the sample complexity of deep neural networks.
Group-equivariant convolutions are a popular approach to obtain equivariant representations.
We propose a multi-stream architecture, where each stream is invariant to a different transformation.
arXiv Detail & Related papers (2023-03-02T20:44:45Z) - Kernel-based off-policy estimation without overlap: Instance optimality
beyond semiparametric efficiency [53.90687548731265]
We study optimal procedures for estimating a linear functional based on observational data.
For any convex and symmetric function class $mathcalF$, we derive a non-asymptotic local minimax bound on the mean-squared error.
arXiv Detail & Related papers (2023-01-16T02:57:37Z) - Equivariance with Learned Canonicalization Functions [77.32483958400282]
We show that learning a small neural network to perform canonicalization is better than using predefineds.
Our experiments show that learning the canonicalization function is competitive with existing techniques for learning equivariant functions across many tasks.
arXiv Detail & Related papers (2022-11-11T21:58:15Z) - Generalization Bounds for Stochastic Gradient Descent via Localized
$\varepsilon$-Covers [16.618918548497223]
We propose a new covering technique localized for the trajectories of SGD.
This localization provides an algorithm-specific clustering measured by the bounds number.
We derive these results in various contexts and improve the known state-of-the-art label rates.
arXiv Detail & Related papers (2022-09-19T12:11:07Z) - On Fitness Landscape Analysis of Permutation Problems: From Distance
Metrics to Mutation Operator Selection [0.0]
We explore the theory and expand upon the practice of fitness landscape analysis for optimization problems over the space of permutations.
We begin with a survey of the available distance metrics for permutations, and then use principal component analysis to classify these metrics.
Our implementations of the permutation metrics, permutation mutation operators, and associated evolutionary algorithm, are available in a pair of open source Java libraries.
arXiv Detail & Related papers (2022-08-23T20:46:49Z) - Equivariant Disentangled Transformation for Domain Generalization under
Combination Shift [91.38796390449504]
Combinations of domains and labels are not observed during training but appear in the test environment.
We provide a unique formulation of the combination shift problem based on the concepts of homomorphism, equivariance, and a refined definition of disentanglement.
arXiv Detail & Related papers (2022-08-03T12:31:31Z) - Multi-block-Single-probe Variance Reduced Estimator for Coupled
Compositional Optimization [49.58290066287418]
We propose a novel method named Multi-block-probe Variance Reduced (MSVR) to alleviate the complexity of compositional problems.
Our results improve upon prior ones in several aspects, including the order of sample complexities and dependence on strongity.
arXiv Detail & Related papers (2022-07-18T12:03:26Z) - Majorization-minimization for Sparse Nonnegative Matrix Factorization
with the $\eta$-divergence [2.3787352248749376]
It is well known that the norm of the other factor (the dictionary matrix) needs to be controlled in order to avoid an ill-posed formulation.
Standard practice consists in constraining the columns of the dictionary to have unit norm, which leads to a nontrivial optimization problem.
We derive block-descent majorization-minimization algorithms that result in simple multiplicative updates for either $ell_1$-regularization or the more "aggressive" log-regularization.
arXiv Detail & Related papers (2022-07-13T16:09:29Z) - Equivariance Discovery by Learned Parameter-Sharing [153.41877129746223]
We study how to discover interpretable equivariances from data.
Specifically, we formulate this discovery process as an optimization problem over a model's parameter-sharing schemes.
Also, we theoretically analyze the method for Gaussian data and provide a bound on the mean squared gap between the studied discovery scheme and the oracle scheme.
arXiv Detail & Related papers (2022-04-07T17:59:19Z) - E-detectors: a nonparametric framework for sequential change detection [86.15115654324488]
We develop a fundamentally new and general framework for sequential change detection.
Our procedures come with clean, nonasymptotic bounds on the average run length.
We show how to design their mixtures in order to achieve both statistical and computational efficiency.
arXiv Detail & Related papers (2022-03-07T17:25:02Z) - Improving the Sample-Complexity of Deep Classification Networks with
Invariant Integration [77.99182201815763]
Leveraging prior knowledge on intraclass variance due to transformations is a powerful method to improve the sample complexity of deep neural networks.
We propose a novel monomial selection algorithm based on pruning methods to allow an application to more complex problems.
We demonstrate the improved sample complexity on the Rotated-MNIST, SVHN and CIFAR-10 datasets.
arXiv Detail & Related papers (2022-02-08T16:16:11Z) - Sparse Quadratic Optimisation over the Stiefel Manifold with Application
to Permutation Synchronisation [71.27989298860481]
We address the non- optimisation problem of finding a matrix on the Stiefel manifold that maximises a quadratic objective function.
We propose a simple yet effective sparsity-promoting algorithm for finding the dominant eigenspace matrix.
arXiv Detail & Related papers (2021-09-30T19:17:35Z) - Optimal policy evaluation using kernel-based temporal difference methods [78.83926562536791]
We use kernel Hilbert spaces for estimating the value function of an infinite-horizon discounted Markov reward process.
We derive a non-asymptotic upper bound on the error with explicit dependence on the eigenvalues of the associated kernel operator.
We prove minimax lower bounds over sub-classes of MRPs.
arXiv Detail & Related papers (2021-09-24T14:48:20Z) - Nonparametric approximation of conditional expectation operators [0.3655021726150368]
We investigate the approximation of the $L2$-operator defined by $[Pf](x) := mathbbE[ f(Y) mid X = x ]$ under minimal assumptions.
We prove that $P$ can be arbitrarily well approximated in operator norm by Hilbert-Schmidt operators acting on a reproducing kernel space.
arXiv Detail & Related papers (2020-12-23T19:06:12Z) - Stochastic optimization with momentum: convergence, fluctuations, and
traps avoidance [0.0]
In this paper, a general optimization procedure is studied, unifying several variants of the gradient descent such as, among others, the heavy ball method, the Nesterov Accelerated Gradient (S-NAG), and the widely used Adam method.
The avoidance is studied as a noisy discretization of a non-autonomous ordinary differential equation.
arXiv Detail & Related papers (2020-12-07T19:14:49Z) - Regularizing Towards Permutation Invariance in Recurrent Models [26.36835670113303]
We show that RNNs can be regularized towards permutation invariance, and that this can result in compact models.
Existing solutions mostly suggest restricting the learning problem to hypothesis classes which are permutation invariant by design.
We show that our method outperforms other permutation invariant approaches on synthetic and real world datasets.
arXiv Detail & Related papers (2020-10-25T07:46:51Z) - Permutation Inference for Canonical Correlation Analysis [0.7646713951724012]
We show that a simple permutation test for canonical correlations leads to inflated error rates.
In the absence of nuisance variables, however, a simple permutation test for CCA also leads to excess error rates for all canonical correlations other than the first.
Here we show that transforming the residuals to a lower dimensional basis where exchangeability holds results in a valid permutation test.
arXiv Detail & Related papers (2020-02-24T02:47:01Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.