Matching Map Recovery with an Unknown Number of Outliers
- URL: http://arxiv.org/abs/2210.13354v1
- Date: Mon, 24 Oct 2022 15:59:10 GMT
- Title: Matching Map Recovery with an Unknown Number of Outliers
- Authors: Arshak Minasyan, Tigran Galstyan, Sona Hunanyan, Arnak Dalalyan
- Abstract summary: We consider the problem of finding the matching map between two sets of $d$-dimensional noisy feature-vectors.
We show that, in the high-dimensional setting, if the signal-to-noise ratio is larger than $5(dlog(4nm/alpha))1/4$, the true matching map can be recovered with probability $1-alpha$.
- Score: 0.0
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: We consider the problem of finding the matching map between two sets of
$d$-dimensional noisy feature-vectors. The distinctive feature of our setting
is that we do not assume that all the vectors of the first set have their
corresponding vector in the second set. If $n$ and $m$ are the sizes of these
two sets, we assume that the matching map that should be recovered is defined
on a subset of unknown cardinality $k^*\le \min(n,m)$. We show that, in the
high-dimensional setting, if the signal-to-noise ratio is larger than
$5(d\log(4nm/\alpha))^{1/4}$, then the true matching map can be recovered with
probability $1-\alpha$. Interestingly, this threshold does not depend on $k^*$
and is the same as the one obtained in prior work in the case of $k =
\min(n,m)$. The procedure for which the aforementioned property is proved is
obtained by a data-driven selection among candidate mappings
$\{\hat\pi_k:k\in[\min(n,m)]\}$. Each $\hat\pi_k$ minimizes the sum of squares
of distances between two sets of size $k$. The resulting optimization problem
can be formulated as a minimum-cost flow problem, and thus solved efficiently.
Finally, we report the results of numerical experiments on both synthetic and
real-world data that illustrate our theoretical results and provide further
insight into the properties of the algorithms studied in this work.
Related papers
- The Computational Complexity of Finding Stationary Points in Non-Convex
Optimization [60.53870185393238]
Finding approximate stationary points, i.e., points where the gradient is approximately zero, of non-order but smooth objective functions is a computational problem.
We show that finding approximate KKT points in constrained optimization is reducible to finding approximate stationary points in unconstrained optimization but the converse is impossible.
arXiv Detail & Related papers (2023-10-13T14:52:46Z) - Near-Optimal Bounds for Learning Gaussian Halfspaces with Random
Classification Noise [50.64137465792738]
We show that any efficient SQ algorithm for the problem requires sample complexity at least $Omega(d1/2/(maxp, epsilon)2)$.
Our lower bound suggests that this quadratic dependence on $1/epsilon$ is inherent for efficient algorithms.
arXiv Detail & Related papers (2023-07-13T18:59:28Z) - List-Decodable Sparse Mean Estimation via Difference-of-Pairs Filtering [42.526664955704746]
We develop a novel, conceptually simpler technique for list-decodable sparse mean estimation.
In particular, for distributions with "certifiably bounded" $t-th moments in $k$-sparse directions, our algorithm achieves error of $(1/alpha)O (1/t)$ with sample complexity $m = (klog(n))O(t)/alpha(mnt)$.
For the special case of Gaussian inliers, our algorithm achieves the optimal error guarantee of $Theta (sqrtlog
arXiv Detail & Related papers (2022-06-10T17:38:18Z) - Approximate Function Evaluation via Multi-Armed Bandits [51.146684847667125]
We study the problem of estimating the value of a known smooth function $f$ at an unknown point $boldsymbolmu in mathbbRn$, where each component $mu_i$ can be sampled via a noisy oracle.
We design an instance-adaptive algorithm that learns to sample according to the importance of each coordinate, and with probability at least $1-delta$ returns an $epsilon$ accurate estimate of $f(boldsymbolmu)$.
arXiv Detail & Related papers (2022-03-18T18:50:52Z) - Random Graph Matching in Geometric Models: the Case of Complete Graphs [21.689343447798677]
This paper studies the problem of matching two complete graphs with edge weights correlated through latent geometries.
We derive an approximate maximum likelihood estimator, which provably achieves, with high probability, perfect recovery of $pi*$.
As a side discovery, we show that the celebrated spectral algorithm of [Ume88] emerges as a further approximation to the maximum likelihood in the geometric model.
arXiv Detail & Related papers (2022-02-22T04:14:45Z) - Computational Complexity of Normalizing Constants for the Product of
Determinantal Point Processes [12.640283469603357]
We study the computational complexity of computing the normalizing constant.
We show that $sum_Sdet(bf A_S,S)p$ exactly for every (fixed) positive even integer $p$ is UP-hard and Mod$_3$P-hard.
arXiv Detail & Related papers (2021-11-28T14:08:25Z) - An Algorithm for Learning Smaller Representations of Models With Scarce
Data [0.0]
We present a greedy algorithm for solving binary classification problems in situations where the dataset is too small or not fully representative.
It relies on a trained model with loose accuracy constraints, an iterative hyperparameter pruning procedure, and a function used to generate new data.
arXiv Detail & Related papers (2020-10-15T19:17:51Z) - On Efficient Low Distortion Ultrametric Embedding [18.227854382422112]
A widely-used method to preserve the underlying hierarchical structure of the data is to find an embedding of the data into a tree or an ultrametric.
In this paper, we provide a new algorithm which takes as input a set of points isometric in $mathbbRd2 (for universal constant $rho>1$) to output an ultrametric $Delta.
We show that the output of our algorithm is comparable to the output of the linkage algorithms while achieving a much faster running time.
arXiv Detail & Related papers (2020-08-15T11:06:45Z) - Streaming Complexity of SVMs [110.63976030971106]
We study the space complexity of solving the bias-regularized SVM problem in the streaming model.
We show that for both problems, for dimensions of $frac1lambdaepsilon$, one can obtain streaming algorithms with spacely smaller than $frac1lambdaepsilon$.
arXiv Detail & Related papers (2020-07-07T17:10:00Z) - Maximizing Determinants under Matroid Constraints [69.25768526213689]
We study the problem of finding a basis $S$ of $M$ such that $det(sum_i in Sv_i v_i v_itop)$ is maximized.
This problem appears in a diverse set of areas such as experimental design, fair allocation of goods, network design, and machine learning.
arXiv Detail & Related papers (2020-04-16T19:16:38Z) - Agnostic Q-learning with Function Approximation in Deterministic
Systems: Tight Bounds on Approximation Error and Sample Complexity [94.37110094442136]
We study the problem of agnostic $Q$-learning with function approximation in deterministic systems.
We show that if $delta = Oleft(rho/sqrtdim_Eright)$, then one can find the optimal policy using $Oleft(dim_Eright)$.
arXiv Detail & Related papers (2020-02-17T18:41:49Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.