Calibrated Similarity for Reliable Geometric Analysis of Embedding Spaces
- URL: http://arxiv.org/abs/2601.16907v1
- Date: Fri, 23 Jan 2026 17:14:44 GMT
- Title: Calibrated Similarity for Reliable Geometric Analysis of Embedding Spaces
- Authors: Nicolas Tacheny
- Abstract summary: We construct an isotonic transformation that achieves near-perfect calibration while preserving rank correlation and local stability. Our contribution is not to replace cosine similarity, but to restore interpretability of its absolute values through monotone calibration.
- Score: 0.0
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: While raw cosine similarity in pretrained embedding spaces exhibits strong rank correlation with human judgments, anisotropy induces systematic miscalibration of absolute values: scores concentrate in a narrow high-similarity band regardless of actual semantic relatedness, limiting interpretability as a quantitative measure. Prior work addresses this by modifying the embedding space (whitening, contrastive fine-tuning), but such transformations alter geometric structure and require recomputing all embeddings. Using isotonic regression trained on human similarity judgments, we construct a monotonic transformation that achieves near-perfect calibration while preserving rank correlation and local stability (98% across seven perturbation types). Our contribution is not to replace cosine similarity, but to restore interpretability of its absolute values through monotone calibration, without altering its ranking properties. We characterize isotonic calibration as an order-preserving reparameterization and prove that all order-based constructions (angular ordering, nearest neighbors, threshold graphs, and quantile-based decisions) are invariant under this transformation.
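The calibration step described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: it is a plain pool-adjacent-violators fit written for illustration, the function names `fit_isotonic` and `calibrate` are hypothetical, and the cosine scores and human ratings are invented toy values chosen to mimic scores crowded into a narrow high band.

```python
from bisect import bisect_right


def fit_isotonic(scores, targets):
    """Least-squares non-decreasing fit of targets on scores (pool-adjacent-violators).

    Returns knot positions xs and fitted values ys. Assumes distinct scores.
    """
    blocks = []  # each block: [target_sum, count, min_score, max_score]
    for x, y in sorted(zip(scores, targets)):
        blocks.append([y, 1, x, x])
        # Merge backwards while the previous block mean exceeds the current one.
        while (len(blocks) > 1
               and blocks[-2][0] / blocks[-2][1] > blocks[-1][0] / blocks[-1][1]):
            s, c, _, hi = blocks.pop()
            blocks[-1][0] += s
            blocks[-1][1] += c
            blocks[-1][3] = hi
    xs, ys = [], []
    for s, c, lo, hi in blocks:
        mean = s / c
        xs.append(lo)
        ys.append(mean)
        if hi > lo:  # pooled block: constant value across [lo, hi]
            xs.append(hi)
            ys.append(mean)
    return xs, ys


def calibrate(x, xs, ys):
    """Map a raw score through the fitted step function, interpolating between knots."""
    if x <= xs[0]:
        return ys[0]
    if x >= xs[-1]:
        return ys[-1]
    i = bisect_right(xs, x)  # xs[i-1] <= x < xs[i]
    x0, x1, y0, y1 = xs[i - 1], xs[i], ys[i - 1], ys[i]
    return y0 + (y1 - y0) * (x - x0) / (x1 - x0)


# Toy data (invented): raw cosine scores and human ratings rescaled to [0, 1].
cosine = [0.71, 0.74, 0.78, 0.80, 0.83, 0.86, 0.90, 0.95]
human = [0.05, 0.10, 0.30, 0.25, 0.55, 0.60, 0.85, 1.00]

xs, ys = fit_isotonic(cosine, human)
calibrated = [calibrate(c, xs, ys) for c in cosine]

for raw, cal in zip(cosine, calibrated):
    print(f"raw={raw:.2f} -> calibrated={cal:.3f}")
```

Because the fitted map is non-decreasing, sorting pairs by calibrated score never reverses the order given by raw cosine similarity. In this sketch the map is only weakly monotone: the two pairs whose ratings violate the ordering (0.78 and 0.80) are pooled to the same calibrated value, so ties can appear where the raw scores were distinct.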
Related papers
- On the Convergence of Stochastic Gradient Descent with Perturbed Forward-Backward Passes [15.63629978994481]
We present the first comprehensive theoretical analysis of this gradient cascade setting. We identify conditions under which perturbations do not deteriorate the gradient convergence order.
arXiv Detail & Related papers (2026-02-24T07:47:15Z) - Stability and Generalization of Push-Sum Based Decentralized Optimization over Directed Graphs [55.77845440440496]
Push-based decentralized communication enables optimization over communication networks, where information exchange may be asymmetric. We develop a unified uniform-stability framework for the Stochastic Gradient Push (SGP) algorithm. A key technical ingredient is an imbalance-aware generalization bound through two quantities.
arXiv Detail & Related papers (2026-02-24T05:32:03Z) - Beyond Cosine Similarity [5.076419064097734]
Cosine similarity, the standard metric for measuring semantic similarity in vector spaces, is mathematically grounded in the Cauchy-Schwarz inequality. We advance this theoretical underpinning by deriving a tighter upper bound for the dot product than the classical Cauchy-Schwarz bound. Our work establishes recos as a mathematically principled and empirically superior alternative, offering enhanced accuracy for semantic analysis in complex embedding spaces.
arXiv Detail & Related papers (2026-02-05T03:46:21Z) - Scale-Consistent State-Space Dynamics via Fractal of Stationary Transformations [9.983526161001997]
Recent deep learning models increasingly rely on depth without structural guarantees on the validity of intermediate representations. We address this limitation by formulating a structural requirement for state-space model's scale-consistent latent dynamics. We empirically verify the predicted scale-consistent behavior, showing that adaptive efficiency emerges from the aligned latent geometry.
arXiv Detail & Related papers (2026-01-27T12:44:20Z) - Riemannian Zeroth-Order Gradient Estimation with Structure-Preserving Metrics for Geodesically Incomplete Manifolds [57.179679246370114]
We construct metrics that are geodesically complete while ensuring that every stationary point under the new metric remains stationary under the original one. An $\epsilon$-stationary point under the constructed metric $g'$ also corresponds to an $\epsilon$-stationary point under the original metric $g$. Experiments on a practical mesh optimization task demonstrate that our framework maintains stable convergence even in the absence of geodesic completeness.
arXiv Detail & Related papers (2026-01-12T22:08:03Z) - Solving a Nonlinear Eigenvalue Equation in Quantum Information Theory: A Hybrid Approach to Entanglement Quantification [0.0]
We present a hybrid analytical and numerical framework for evaluating the geometric measure of entanglement. We make the coupled nonlinear eigenstructure explicit by proving the equal multiplier stationarity. The resulting hybrid solver reproduces the exact optimum for standard three-qubit benchmarks.
arXiv Detail & Related papers (2025-11-14T13:36:47Z) - Benign Overfitting and the Geometry of the Ridge Regression Solution in Binary Classification [75.01389991485098]
We show that ridge regression has qualitatively different behavior depending on the scale of the cluster mean vector. In regimes where the scale is very large, the conditions that allow for benign overfitting turn out to be the same as those for the regression task.
arXiv Detail & Related papers (2025-03-11T01:45:42Z) - Multivariate root-n-consistent smoothing parameter free matching estimators and estimators of inverse density weighted expectations [51.000851088730684]
We develop novel modifications of nearest-neighbor and matching estimators which converge at the parametric $\sqrt{n}$-rate. We stress that our estimators do not involve nonparametric function estimators and in particular do not rely on sample-size dependent smoothing parameters.
arXiv Detail & Related papers (2024-07-11T13:28:34Z) - On the Uniform Convergence of Subdifferentials in Stochastic Optimization and Learning [1.5229257192293195]
We investigate the uniform convergence of subdifferential mappings from empirical risk to population risk in nonsmooth, non-valued to deterministic optimization. These guarantees offer new insight into the geometry of problems arising in robust statistics and related applications.
arXiv Detail & Related papers (2024-05-16T17:49:46Z) - Spectrum-Aware Debiasing: A Modern Inference Framework with Applications to Principal Components Regression [1.342834401139078]
We introduce Spectrum-Aware Debiasing, a novel method for high-dimensional regression.
Our approach applies to problems with structured, heavy tails, and low-rank structures.
We demonstrate our method through simulated and real data experiments.
arXiv Detail & Related papers (2023-09-14T15:58:30Z) - Last-Iterate Convergence of Adaptive Riemannian Gradient Descent for Equilibrium Computation [52.73824786627612]
This paper establishes new convergence results for geodesic strongly monotone games. Our key result shows that RGD attains last-iterate linear convergence in a geometry-agnostic fashion. Overall, this paper presents the first geometry-agnostic last-iterate convergence analysis for games beyond the Euclidean settings.
arXiv Detail & Related papers (2023-06-29T01:20:44Z) - On the Convergence of Stochastic Extragradient for Bilinear Games with Restarted Iteration Averaging [96.13485146617322]
We present an analysis of the Stochastic ExtraGradient (SEG) method with constant step size, and present variations of the method that yield favorable convergence.
We prove that when augmented with averaging, SEG provably converges to the Nash equilibrium, and such a rate is provably accelerated by incorporating a scheduled restarting procedure.
arXiv Detail & Related papers (2021-06-30T17:51:36Z)