Trading off Consistency and Dimensionality of Convex Surrogates for the
Mode
- URL: http://arxiv.org/abs/2402.10818v1
- Date: Fri, 16 Feb 2024 16:42:09 GMT
- Title: Trading off Consistency and Dimensionality of Convex Surrogates for the
Mode
- Authors: Enrique Nueve, Bo Waggoner, Dhamma Kimpara, Jessie Finocchiaro
- Abstract summary: In multiclass classification over $n$ outcomes, the outcomes must be embedded into the reals with dimension at least $n-1$.
We investigate ways to trade off surrogate loss dimension, the number of problem instances, and restricting the region of consistency in the simplex.
We show that full-dimensional subsets of the simplex exist around each point mass distribution for which consistency holds, but also, with less than $n-1$ dimensions, there exist distributions for which a phenomenon called hallucination occurs.
- Score: 6.096888891865663
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: In multiclass classification over $n$ outcomes, the outcomes must be embedded
into the reals with dimension at least $n-1$ in order to design a consistent
surrogate loss that leads to the "correct" classification, regardless of the
data distribution. For large $n$, such as in information retrieval and
structured prediction tasks, optimizing a surrogate in $n-1$ dimensions is
often intractable. We investigate ways to trade off surrogate loss dimension,
the number of problem instances, and restricting the region of consistency in
the simplex for multiclass classification. Following past work, we examine an
intuitive embedding procedure that maps outcomes into the vertices of convex
polytopes in a low-dimensional surrogate space. We show that full-dimensional
subsets of the simplex exist around each point mass distribution for which
consistency holds, but also, with less than $n-1$ dimensions, there exist
distributions for which a phenomenon called hallucination occurs, which is when
the optimal report under the surrogate loss is an outcome with zero
probability. Looking towards application, we derive a result to check if
consistency holds under a given polytope embedding and low-noise assumption,
providing insight into when to use a particular embedding. We provide examples
of embedding $n = 2^{d}$ outcomes into the $d$-dimensional unit cube and $n =
d!$ outcomes into the $d$-dimensional permutahedron under low-noise
assumptions. Finally, we demonstrate that with multiple problem instances, we
can learn the mode with $\frac{n}{2}$ dimensions over the whole simplex.
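To make the polytope-embedding idea concrete, the following is a minimal illustrative sketch, not the paper's construction: it places the $n = 2^d$ outcomes at the vertices of the $d$-dimensional unit cube, takes the surrogate-optimal report to be the mean embedded point (the minimizer under a squared-loss surrogate, assumed here purely for illustration), and decodes by nearest vertex. The chosen distribution exhibits hallucination: the decoded outcome has zero probability.

```python
# Illustrative sketch only (assumptions: squared-loss surrogate, nearest-vertex
# decoding); it is not the paper's surrogate construction.
import itertools
import numpy as np

d = 3
# n = 2^d outcomes, each identified with a vertex of the unit cube {0,1}^d.
vertices = np.array(list(itertools.product([0, 1], repeat=d)), dtype=float)

def surrogate_optimal_report(p):
    """Minimizer of E_{Y~p} ||u - phi(Y)||^2 over u in R^d: the mean embedded point."""
    return p @ vertices

def decode(u):
    """Link a surrogate report back to an outcome via the nearest cube vertex."""
    return int(np.argmin(np.linalg.norm(vertices - u, axis=1)))

# Distribution supported uniformly on the three outcomes 011, 101, 110.
p = np.zeros(len(vertices))
for v in [(0, 1, 1), (1, 0, 1), (1, 1, 0)]:
    p[np.flatnonzero((vertices == v).all(axis=1))[0]] = 1 / 3

u_star = surrogate_optimal_report(p)  # (2/3, 2/3, 2/3)
y_hat = decode(u_star)                # nearest vertex is 111
print("decoded outcome:", vertices[y_hat], "probability:", p[y_hat])
# Prints probability 0.0: the report decodes to an outcome the distribution
# never produces -- the "hallucination" phenomenon described above.
```

The function and variable names above are hypothetical, chosen only to mirror the abstract's description; the paper's results characterize when such embeddings remain consistent, e.g., around point-mass distributions or under low-noise assumptions.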
Related papers
- On Reductions and Representations of Learning Problems in Euclidean Spaces [15.07787640047213]
Many practical prediction algorithms represent inputs in Euclidean space and replace the discrete 0/1 classification loss with a real-valued surrogate loss.
We develop a generalization of the Borsuk-Ulam Theorem that combines the classical topological approach with convexity considerations.
We also present natural classification tasks that can be represented in much smaller dimensions by leveraging randomness.
arXiv Detail & Related papers (2024-11-16T12:09:37Z) - Better Locally Private Sparse Estimation Given Multiple Samples Per User [2.9562742331218725]
We investigate user-level locally differentially private sparse linear regression.
We show that with $n$ users each contributing $m$ samples, the linear dependency of dimension $d$ can be eliminated.
We propose a framework that first selects candidate variables and then conducts estimation in the narrowed low-dimensional space.
arXiv Detail & Related papers (2024-08-08T08:47:20Z) - Learning with Norm Constrained, Over-parameterized, Two-layer Neural Networks [54.177130905659155]
Recent studies show that a reproducing kernel Hilbert space (RKHS) is not a suitable space to model functions by neural networks.
In this paper, we study a suitable function space for over-parameterized two-layer neural networks with bounded norms.
arXiv Detail & Related papers (2024-04-29T15:04:07Z) - Computational-Statistical Gaps in Gaussian Single-Index Models [77.1473134227844]
Single-Index Models are high-dimensional regression problems with planted structure.
We show that computationally efficient algorithms, both within the Statistical Query (SQ) and the Low-Degree Polynomial (LDP) framework, necessarily require $\Omega(d^{k^\star/2})$ samples.
arXiv Detail & Related papers (2024-03-08T18:50:19Z) - Effective Minkowski Dimension of Deep Nonparametric Regression: Function
Approximation and Statistical Theories [70.90012822736988]
Existing theories on deep nonparametric regression have shown that when the input data lie on a low-dimensional manifold, deep neural networks can adapt to intrinsic data structures.
This paper introduces a relaxed assumption that input data are concentrated around a subset of $\mathbb{R}^d$ denoted by $\mathcal{S}$, and the intrinsic dimension of $\mathcal{S}$ can be characterized by a new complexity notion -- effective Minkowski dimension.
arXiv Detail & Related papers (2023-06-26T17:13:31Z) - How many dimensions are required to find an adversarial example? [0.0]
We investigate how adversarial vulnerability depends on $\dim(V)$.
In particular, we show that the adversarial success of standard PGD attacks with $\ell_p$-norm constraints behaves like a monotonically increasing function of $\epsilon$.
arXiv Detail & Related papers (2023-03-24T17:36:15Z) - Pseudonorm Approachability and Applications to Regret Minimization [73.54127663296906]
We convert high-dimensional $\ell_\infty$-approachability problems to low-dimensional pseudonorm approachability problems.
We develop an algorithmic theory of pseudonorm approachability, analogous to previous work on approachability for $\ell_2$ and other norms.
arXiv Detail & Related papers (2023-02-03T03:19:14Z) - Best Policy Identification in Linear MDPs [70.57916977441262]
We investigate the problem of best policy identification in discounted linear Markov Decision Processes in the fixed-confidence setting under a generative model.
The lower bound, obtained as the solution of an intricate non-convex optimization program, can be used as the starting point to devise such algorithms.
arXiv Detail & Related papers (2022-08-11T04:12:50Z) - Randomized Dimensionality Reduction for Facility Location and
Single-Linkage Clustering [13.208510864854894]
Random dimensionality reduction is a versatile tool for speeding up algorithms for high-dimensional problems.
We study its application to two clustering problems: the facility location problem, and the single-linkage hierarchical clustering problem.
arXiv Detail & Related papers (2021-07-05T05:55:26Z) - Adversarial Classification: Necessary conditions and geometric flows [0.7614628596146599]
We study a version of adversarial classification where an adversary is empowered to corrupt data inputs up to some distance $\varepsilon$.
We derive a geometric evolution equation which can be used to track the change in classification boundaries as $\varepsilon$ varies.
arXiv Detail & Related papers (2020-11-21T14:14:12Z) - The Generalized Lasso with Nonlinear Observations and Generative Priors [63.541900026673055]
We make the assumption of sub-Gaussian measurements, which is satisfied by a wide range of measurement models.
We show that our result can be extended to the uniform recovery guarantee under the assumption of a so-called local embedding property.
arXiv Detail & Related papers (2020-06-22T16:43:35Z)