Related papers: REWA: A General Theory of Witness-Based Similarity

URL: http://arxiv.org/abs/2511.19998v2
Date: Fri, 28 Nov 2025 07:09:03 GMT
Title: REWA: A General Theory of Witness-Based Similarity
Authors: Nikit Phadke
Abstract summary: We present a universal framework for similarity-preserving encodings that subsumes all discrete, continuous, algebraic, and learned similarity methods. This unification reveals that Bloom filters, Locality Sensitive Hashing (LSH), Count-Min sketches, Random Fourier Features, and Transformer attention kernels are instances of the same underlying mechanism.
Score: 0.0
License: http://creativecommons.org/licenses/by/4.0/
Abstract: We present a universal framework for similarity-preserving encodings that subsumes all discrete, continuous, algebraic, and learned similarity methods under a single theoretical umbrella. By formulating similarity as functional witness projection over monoids, we prove that $O\!\left(\frac{1}{\Delta^{2}}\log N\right)$ encoding complexity with ranking preservation holds for arbitrary algebraic structures. This unification reveals that Bloom filters, Locality Sensitive Hashing (LSH), Count-Min sketches, Random Fourier Features, and Transformer attention kernels are instances of the same underlying mechanism. We provide complete proofs with explicit constants under 4-wise independent hashing, handle heavy-tailed witnesses via normalization and clipping, and prove $O(\log N)$ complexity for all major similarity methods from 1970-2024. We give explicit constructions for Boolean, Natural, Real, Tropical, and Product monoids, prove tight concentration bounds, and demonstrate compositional properties enabling multi-primitive similarity systems.
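
To make the shape of the stated bound concrete, here is a minimal Python sketch of a Boolean, Bloom-filter-style witness encoding whose width follows $m = \lceil c \ln N / \Delta^{2} \rceil$. It is an illustration under assumptions, not the paper's construction: the constant `c`, the helper names (`encoding_dim`, `bit_index`, `encode`, `overlap`), the witness strings, and the use of SHA-256 in place of a 4-wise independent hash family are all hypothetical choices made for this example.

```python
import hashlib
import math


def encoding_dim(n_items: int, gap: float, c: float = 1.0) -> int:
    """Sketch width m = ceil(c * ln(N) / gap^2); c is a placeholder constant."""
    if n_items < 2 or not 0.0 < gap <= 1.0:
        raise ValueError("need n_items >= 2 and 0 < gap <= 1")
    return math.ceil(c * math.log(n_items) / gap ** 2)


def bit_index(witness: str, m: int) -> int:
    """Map a witness to one of m bit positions (SHA-256 is an assumed stand-in hash)."""
    digest = hashlib.sha256(witness.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % m


def encode(witnesses: set, m: int) -> int:
    """OR the bit of every witness into one m-bit integer (Boolean case)."""
    code = 0
    for w in witnesses:
        code |= 1 << bit_index(w, m)
    return code


def overlap(code_a: int, code_b: int) -> int:
    """Similarity score: number of bit positions set in both encodings."""
    return bin(code_a & code_b).count("1")


# N = 10^6 items and a similarity gap of 0.1, with the placeholder c = 1.
m = encoding_dim(1_000_000, 0.1)

query = encode({f"w{i}" for i in range(10)}, m)
near = encode({f"w{i}" for i in range(8)} | {"x0", "x1"}, m)            # shares 8 witnesses
far = encode({f"w{i}" for i in range(2)} | {f"y{i}" for i in range(8)}, m)  # shares 2 witnesses

print(m, overlap(query, near), overlap(query, far))
```

With these inputs the width works out to 1382 bits, and the item sharing more witnesses with the query should score higher, which is the ranking-preservation behaviour the abstract refers to. Real guarantees depend on the paper's explicit constants and hashing assumptions, which this sketch does not reproduce.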

Related papers

Bridging Functional and Representational Similarity via Usable Information [3.9189279162842854]
We present a unified framework for quantifying the similarity between representations through the lens of usable information. First, addressing functional similarity, we establish a formal link between stitching performance and conditional mutual information. Second, concerning representational similarity, we prove that reconstruction-based metrics and standard tools act as estimators of usable information under specific constraints.
arXiv Detail & Related papers (2026-01-29T11:30:55Z)
Generalization and Completeness of Stochastic Local Search Algorithms [0.0]
We generalize Stochastic Local Search (SLS) algorithms into a unique formal model. This model has two key components: a common structure designed to be as large as possible and a parametric structure intended to be as small as possible. We use our model to prove the Turing-completeness of SLS algorithms in general.
arXiv Detail & Related papers (2026-01-20T18:17:45Z)
Similarity Field Theory: A Mathematical Framework for Intelligence [0.0]
This paper introduces Similarity Field Theory, a mathematical framework that formalizes the principles governing similarity values among entities and their evolution. At a high level, this framework reframes intelligence and interpretability as geometric problems on similarity fields. We prove two theorems: (i) asymmetry blocks mutual inclusion; and (ii) stability requires either an anchor coordinate or eventual confinement within a level set.
arXiv Detail & Related papers (2025-09-21T22:34:00Z)
Loss-Complexity Landscape and Model Structure Functions [53.92822954974537]
We develop a framework for dualizing the Kolmogorov structure function $h_x(\alpha)$. We establish a mathematical analogy between information-theoretic constructs and statistical mechanics. We explicitly prove the Legendre-Fenchel duality between the structure function and free energy.
arXiv Detail & Related papers (2025-07-17T21:31:45Z)
Generalized and Unified Equivalences between Hardness and Pseudoentropy [6.683376336762599]
We prove a unified pseudoentropy characterization that generalizes and strengthens previous results for both uniform and non-uniform models of computation. Our characterization holds for a general family of entropy notions that encompasses the common notions of Shannon entropy and min entropy as special cases.
arXiv Detail & Related papers (2025-07-08T13:27:03Z)
Structural Entropy Guided Probabilistic Coding [52.01765333755793]
We propose a novel structural entropy-guided probabilistic coding model, named SEPC. We incorporate the relationship between latent variables into the optimization by proposing a structural entropy regularization loss. Experimental results across 12 natural language understanding tasks, including both classification and regression tasks, demonstrate the superior performance of SEPC.
arXiv Detail & Related papers (2024-12-12T00:37:53Z)
Interaction Asymmetry: A General Principle for Learning Composable Abstractions [27.749478197803256]
We show that interaction asymmetry enables both disentanglement and compositional generalization. We propose an implementation of these criteria using a flexible Transformer-based VAE, with a novel regularizer on the attention weights of the decoder.
arXiv Detail & Related papers (2024-11-12T13:33:26Z)
Cluster-Aware Similarity Diffusion for Instance Retrieval [64.40171728912702]
Diffusion-based re-ranking is a common method used for retrieving instances by performing similarity propagation in a nearest neighbor graph. We propose a novel Cluster-Aware Similarity (CAS) diffusion for instance retrieval.
arXiv Detail & Related papers (2024-06-04T14:19:50Z)
Prototype-based Aleatoric Uncertainty Quantification for Cross-modal Retrieval [139.21955930418815]
Cross-modal Retrieval methods build similarity relations between vision and language modalities by jointly learning a common representation space. However, the predictions are often unreliable due to aleatoric uncertainty, which is induced by low-quality data, e.g., corrupt images, fast-paced videos, and non-detailed texts. We propose a novel Prototype-based Aleatoric Uncertainty Quantification (PAU) framework to provide trustworthy predictions by quantifying the uncertainty arising from the inherent data ambiguity.
arXiv Detail & Related papers (2023-09-29T09:41:19Z)
Probing multipartite entanglement through persistent homology [6.107978190324034]
We propose a study of multipartite entanglement through persistent homology. Persistent homology is a tool used in topological data analysis. We show that persistence barcodes provide more fine-grained information than its topological summary.
arXiv Detail & Related papers (2023-07-14T17:24:33Z)
DIFFormer: Scalable (Graph) Transformers Induced by Energy Constrained Diffusion [66.21290235237808]
We introduce an energy constrained diffusion model which encodes a batch of instances from a dataset into evolutionary states. We provide rigorous theory that implies closed-form optimal estimates for the pairwise diffusion strength among arbitrary instance pairs. Experiments highlight the wide applicability of our model as a general-purpose encoder backbone with superior performance in various tasks.
arXiv Detail & Related papers (2023-01-23T15:18:54Z)
Unsupervised Learning of Equivariant Structure from Sequences [30.974508897223124]
We present an unsupervised framework to learn the symmetry from the time sequence of length at least three. We will demonstrate that, with our framework, the hidden disentangled structure of the dataset naturally emerges as a by-product.
arXiv Detail & Related papers (2022-10-12T07:29:18Z)
Generating Compressed Combinatory Proof Structures -- An Approach to Automated First-Order Theorem Proving [0.0]
We introduce here this "combinator term as proof structure" approach to automated first-order proving. The approach builds on a term view of proof structures rooted in condensed detachment and the connection method.
arXiv Detail & Related papers (2022-09-26T11:23:17Z)
Perfect Spectral Clustering with Discrete Covariates [68.8204255655161]
We propose a spectral algorithm that achieves perfect clustering with high probability on a class of large, sparse networks. Our method is the first to offer a guarantee of consistent latent structure recovery using spectral clustering.
arXiv Detail & Related papers (2022-05-17T01:41:06Z)
Self-Supervised Learning Disentangled Group Representation as Feature [82.07737719232972]
We show that existing Self-Supervised Learning (SSL) only disentangles simple augmentation features such as rotation and colorization. We propose an iterative SSL algorithm: Iterative Partition-based Invariant Risk Minimization (IP-IRM). We prove that IP-IRM converges to a fully disentangled representation and show its effectiveness on various benchmarks.
arXiv Detail & Related papers (2021-10-28T16:12:33Z)
Instance Similarity Learning for Unsupervised Feature Representation [83.31011038813459]
We propose an instance similarity learning (ISL) method for unsupervised feature representation. We employ the Generative Adversarial Networks (GAN) to mine the underlying feature manifold. Experiments on image classification demonstrate the superiority of our method compared with the state-of-the-art methods.
arXiv Detail & Related papers (2021-08-05T16:42:06Z)
CIMON: Towards High-quality Hash Codes [63.37321228830102]
We propose a new method named Comprehensive sImilarity Mining and cOnsistency learNing (CIMON). First, we use global refinement and similarity statistical distribution to obtain reliable and smooth guidance. Second, both semantic and contrastive consistency learning are introduced to derive both disturb-invariant and discriminative hash codes.
arXiv Detail & Related papers (2020-10-15T14:47:14Z)
