Consistent Structured Prediction with Max-Min Margin Markov Networks
- URL: http://arxiv.org/abs/2007.01012v2
- Date: Mon, 27 Jul 2020 18:24:07 GMT
- Title: Consistent Structured Prediction with Max-Min Margin Markov Networks
- Authors: Alex Nowak-Vila, Francis Bach, Alessandro Rudi
- Abstract summary: Max-margin methods for binary classification have been extended to the structured prediction setting under the name of max-margin Markov networks ($M^3N$).
Unfortunately, these methods are statistically inconsistent when the relationship between inputs and labels is far from deterministic.
We overcome such limitations by defining the learning problem in terms of a "max-min" margin formulation, naming the resulting method max-min margin Markov networks ($M^4N$).
Experiments on multi-class classification, ordinal regression, sequence prediction and ranking demonstrate the effectiveness of the proposed method.
- Score: 84.60515484036239
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Max-margin methods for binary classification such as the support vector
machine (SVM) have been extended to the structured prediction setting under the
name of max-margin Markov networks ($M^3N$), or more generally structural SVMs.
Unfortunately, these methods are statistically inconsistent when the
relationship between inputs and labels is far from deterministic. We overcome
such limitations by defining the learning problem in terms of a "max-min"
margin formulation, naming the resulting method max-min margin Markov networks
($M^4N$). We prove consistency and finite sample generalization bounds for
$M^4N$ and provide an explicit algorithm to compute the estimator. The
algorithm achieves a generalization error of $O(1/\sqrt{n})$ for a total cost
of $O(n)$ projection-oracle calls (which have at most the same cost as the
max-oracle from $M^3N$). Experiments on multi-class classification, ordinal
regression, sequence prediction and ranking demonstrate the effectiveness of
the proposed method.
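To make the "max-min" margin idea above more concrete, here is a minimal, self-contained sketch (written for this summary, not taken from the paper) contrasting the standard max-margin (M^3N-style) surrogate with one common way a max-min margin surrogate can be written for plain multi-class classification with a loss matrix L. The inner maximization over the probability simplex is solved with a small linear program via scipy.optimize.linprog rather than the paper's projection-oracle algorithm, and the exact surrogate below is a simplified reading of the abstract, not the authors' estimator.

```python
# Toy comparison of a max-margin (M^3N-style) surrogate and a max-min margin
# surrogate for multi-class classification with a loss matrix L.
# Illustrative sketch only: NOT the paper's projection-oracle algorithm.
import numpy as np
from scipy.optimize import linprog

def max_margin_surrogate(v, y, L):
    """M^3N-style structured hinge: max_{y'} ( L[y, y'] + v[y'] ) - v[y]."""
    return np.max(L[y] + v) - v[y]

def max_min_margin_surrogate(v, y, L):
    """Max-min margin surrogate (sketch):
        max_{q in simplex} [ min_z (L q)_z + <q, v> ] - v[y],
    solved as a small LP in (q, t): maximize t + <q, v> s.t. (L q)_z >= t for all z.
    """
    k = len(v)
    # linprog minimizes, so negate the objective; variables are x = [q_1..q_k, t].
    c = np.concatenate([-v, [-1.0]])
    # Constraints t - (L q)_z <= 0 for every z.
    A_ub = np.hstack([-L, np.ones((k, 1))])
    b_ub = np.zeros(k)
    # q must lie on the probability simplex.
    A_eq = np.concatenate([np.ones(k), [0.0]])[None, :]
    b_eq = np.array([1.0])
    bounds = [(0, None)] * k + [(None, None)]
    res = linprog(c, A_ub=A_ub, b_ub=b_ub, A_eq=A_eq, b_eq=b_eq, bounds=bounds)
    return -res.fun - v[y]

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    k = 4
    L = 1.0 - np.eye(k)          # 0-1 loss matrix
    v = rng.normal(size=k)       # score vector f(x) for one input
    y = 2                        # observed label
    print("max-margin    :", max_margin_surrogate(v, y, L))
    print("max-min margin:", max_min_margin_surrogate(v, y, L))
```

The sketch only shows how the two surrogates are evaluated on a single example; the abstract's consistency and O(1/\sqrt{n}) generalization claims concern the learning problem built on top of such a surrogate, which is not reproduced here.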
Related papers
- Inverting the Leverage Score Gradient: An Efficient Approximate Newton Method [10.742859956268655]
This paper aims to recover the intrinsic model parameters given the leverage score gradient.
We specifically scrutinize the inversion of the leverage score gradient, denoted as $g(x)$.
arXiv Detail & Related papers (2024-08-21T01:39:42Z) - MGDA Converges under Generalized Smoothness, Provably [27.87166415148172]
Multi-objective optimization (MOO) is receiving more attention in various fields such as multi-task learning.
Recent works provide some effective algorithms with theoretical analysis, but they are limited by standard $L$-smoothness or bounded-gradient assumptions.
We study a more general and realistic class of generalized $\ell$-smooth loss functions, where $\ell$ is a general non-decreasing function of the gradient norm.
arXiv Detail & Related papers (2024-05-29T18:36:59Z) - Sample-efficient Learning of Infinite-horizon Average-reward MDPs with General Function Approximation [53.17668583030862]
We study infinite-horizon average-reward Markov decision processes (AMDPs) in the context of general function approximation.
We propose a novel algorithmic framework named Local-fitted Optimization with OPtimism (LOOP).
We show that LOOP achieves a sublinear $\tilde{\mathcal{O}}(\mathrm{poly}(d, \mathrm{sp}(V^*))\sqrt{T\beta})$ regret, where $d$ and $\beta$ correspond to AGEC and the log-covering number of the hypothesis class, respectively.
arXiv Detail & Related papers (2024-04-19T06:24:22Z) - Universality of max-margin classifiers [10.797131009370219]
We study the role of featurization maps and the high-dimensional universality of the misclassification error for non-Gaussian features.
In particular, the overparametrization threshold and generalization error can be computed within a simpler model.
arXiv Detail & Related papers (2023-09-29T22:45:56Z) - Kernel Support Vector Machine Classifiers with the $\ell_0$-Norm Hinge
Loss [3.007949058551534]
Support Vector Machine (SVM) has been one of the most successful machine learning techniques for binary classification problems.
This paper concentrates on the kernel SVM with the $\ell_0$-norm hinge loss (referred to as $\ell_0$-KSVM), which is a composite function of the hinge loss and the $\ell_0$-norm.
Experiments on synthetic and real datasets show that $\ell_0$-KSVM can achieve accuracy comparable to the standard KSVM.
arXiv Detail & Related papers (2023-06-24T14:52:44Z) - Differentially-Private Hierarchical Clustering with Provable
Approximation Guarantees [79.59010418610625]
We study differentially private approximation algorithms for hierarchical clustering.
We show strong lower bounds for the problem: any $\epsilon$-DP algorithm must exhibit $O(|V|^2/\epsilon)$-additive error for an input dataset.
We propose a private $1+o(1)$ approximation algorithm which also recovers the blocks exactly.
arXiv Detail & Related papers (2023-01-31T19:14:30Z) - Best Policy Identification in Linear MDPs [70.57916977441262]
We investigate the problem of best policy identification in discounted linear Markov Decision Processes in the fixed confidence setting under a generative model.
The lower bound, obtained as the solution of an intricate non-convex optimization program, can be used as the starting point to devise such algorithms.
arXiv Detail & Related papers (2022-08-11T04:12:50Z) - Optimal and instance-dependent guarantees for Markovian linear stochastic approximation [47.912511426974376]
We show a non-asymptotic bound of the order $t_{\mathrm{mix}} \tfrac{d}{n}$ on the squared error of the last iterate of a standard scheme.
We derive corollaries of these results for policy evaluation with Markov noise.
arXiv Detail & Related papers (2021-12-23T18:47:50Z) - Clustering Mixture Models in Almost-Linear Time via List-Decodable Mean
Estimation [58.24280149662003]
We study the problem of list-decodable mean estimation, where an adversary can corrupt a majority of the dataset.
We develop new algorithms for list-decodable mean estimation, achieving nearly-optimal statistical guarantees.
arXiv Detail & Related papers (2021-06-16T03:34:14Z) - Byzantine-Resilient Non-Convex Stochastic Gradient Descent [61.6382287971982]
We study adversary-resilient distributed optimization, in which machines can independently compute gradients and cooperate to optimize a shared objective.
Our algorithm is based on a new concentration technique, and we analyze its sample complexity.
It is very practical: it improves upon the performance of all prior methods even when no Byzantine machines are present.
arXiv Detail & Related papers (2020-12-28T17:19:32Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of its content (including all information) and is not responsible for any consequences.