Large-Scale Distributed Learning via Private On-Device
Locality-Sensitive Hashing
- URL: http://arxiv.org/abs/2306.02563v1
- Date: Mon, 5 Jun 2023 03:33:26 GMT
- Title: Large-Scale Distributed Learning via Private On-Device
Locality-Sensitive Hashing
- Authors: Tahseen Rabbani, Marco Bornstein, Furong Huang
- Abstract summary: We develop one of the first private, personalized, and memory-efficient on-device LSH frameworks.
Our framework enables privacy and personalization by allowing each device to generate hash tables, without the help of a central host.
We prove several statistical and sensitivity properties of our hash functions, and experimentally demonstrate that our framework is competitive in training large-scale recommender networks.
- Score: 11.885388917784804
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Locality-sensitive hashing (LSH) based frameworks have been used efficiently
to select weight vectors in a dense hidden layer with high cosine similarity to
an input, enabling dynamic pruning. While this type of scheme has been shown to
improve computational training efficiency, existing algorithms require repeated
randomized projection of the full layer weight, which is impractical for
computational- and memory-constrained devices. In a distributed setting,
deferring LSH analysis to a centralized host is (i) slow if the device cluster
is large and (ii) requires access to input data which is forbidden in a
federated context. Using a new family of hash functions, we develop one of the
first private, personalized, and memory-efficient on-device LSH frameworks. Our
framework enables privacy and personalization by allowing each device to
generate hash tables, without the help of a central host, using device-specific
hashing hyper-parameters (e.g. number of hash tables or hash length). Hash
tables are generated with a compressed set of the full weights, and can be
serially generated and discarded if the process is memory-intensive. This
allows devices to avoid maintaining (i) the full-sized model and (ii) large
numbers of hash tables in local memory for LSH analysis. We prove several
statistical and sensitivity properties of our hash functions, and
experimentally demonstrate that our framework is competitive in training
large-scale recommender networks compared to other LSH frameworks which assume
unrestricted on-device capacity.
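As a concrete illustration of the scheme the abstract describes, here is a minimal sketch using standard signed random projections (SimHash) rather than the authors' own hash family: each weight vector of a dense layer is hashed into buckets, and only the neurons whose buckets collide with the input's hash are activated (dynamic pruning). All function names and hyper-parameter choices below are illustrative assumptions, not the paper's implementation.

```python
import numpy as np

def build_hash_tables(W, num_tables=4, hash_length=8, seed=0):
    """Hash each weight vector (row of W) with signed random projections.
    Under SimHash, two vectors collide with probability increasing in
    their cosine similarity, so bucket-mates of a query are likely similar."""
    rng = np.random.default_rng(seed)
    tables = []
    for _ in range(num_tables):
        # Random projection matrix for this table (hash_length bits).
        P = rng.standard_normal((hash_length, W.shape[1]))
        # Sign bits of the projections form each neuron's hash code.
        codes = (W @ P.T > 0).astype(np.uint8)
        buckets = {}
        for i, bits in enumerate(map(tuple, codes)):
            buckets.setdefault(bits, []).append(i)
        tables.append((P, buckets))
    return tables

def select_active_neurons(x, tables):
    """Union of bucket matches across tables: indices of weight vectors
    likely to have high cosine similarity with input x."""
    active = set()
    for P, buckets in tables:
        bits = tuple((P @ x > 0).astype(np.uint8))
        active.update(buckets.get(bits, []))
    return sorted(active)
```

Increasing `num_tables` raises the recall of similar weight vectors, while `hash_length` controls bucket selectivity; in the on-device setting the abstract describes, each device could pick these hyper-parameters itself and generate then discard tables serially to bound memory.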
Related papers
- Multi-Probe Zero Collision Hash (MPZCH): Mitigating Embedding Collisions and Enhancing Model Freshness in Large-Scale Recommenders [47.3074050788206]
Multi-Probe Zero Collision Hash (MPZCH) is a novel indexing mechanism based on linear probing.
MPZCH achieves zero collisions for user embeddings and significantly improves item embedding freshness and quality.
The solution has been released within the open-source TorchRec library for the broader community.
arXiv Detail & Related papers (2026-02-19T03:42:57Z) - SOCKET: SOft Collision Kernel EsTimator for Sparse Attention [25.278711498381494]
Exploiting sparsity during long-context inference is central to scaling large language models.
Locality-Sensitive Hashing (LSH) is a sparsification primitive.
SOCKET is a SOft Collision Kernel EsTimator that replaces hard bucket matches with probabilistic, similarity-aware aggregation.
arXiv Detail & Related papers (2026-02-06T00:41:44Z) - Spotlight Attention: Towards Efficient LLM Generation via Non-linear Hashing-based KV Cache Retrieval [67.21678698740267]
We introduce Spotlight Attention, a novel method that employs non-linear hashing functions to optimize the embedding distribution of queries and keys.
We also develop a lightweight, stable training framework using a Bradley-Terry ranking-based loss.
arXiv Detail & Related papers (2025-08-27T10:11:27Z) - HexaMorphHash (HMH): Homomorphic Hashing for Secure and Efficient Cryptographic Operations in Data Integrity Verification [0.0]
This paper introduces an innovative approach using a lattice-based homomorphic hash, HexaMorphHash.
Our contributions present a viable solution for frequent update dissemination in expansive distributed systems, safeguarding both data integrity and system performance.
arXiv Detail & Related papers (2025-07-01T18:53:23Z) - Gotta Hash 'Em All! Speeding Up Hash Functions for Zero-Knowledge Proof Applications [9.853088551679969]
HashEmAll is a novel collection of FPGA-based realizations of ZK-friendly hash functions.
Our evaluation shows that latency-optimized HashEmAll designs outperform CPU implementations by at least $10\times$.
This highlights the suitability of HashEmAll for real-world ZKP applications involving large-scale data authentication.
arXiv Detail & Related papers (2025-01-30T22:09:05Z) - A Flexible Plug-and-Play Module for Generating Variable-Length [61.095479786194836]
Nested Hash Layer (NHL) is a plug-and-play module designed for existing deep supervised hashing models.
NHL simultaneously generates hash codes of varying lengths in a nested manner.
NHL achieves superior retrieval performance across various deep hashing models.
arXiv Detail & Related papers (2024-12-12T04:13:09Z) - Sparse-Inductive Generative Adversarial Hashing for Nearest Neighbor
Search [8.020530603813416]
We propose a novel unsupervised hashing method, termed Sparsity-Induced Generative Adversarial Hashing (SiGAH).
SiGAH encodes large-scale high-dimensional features into binary codes, which solves the two problems through a generative adversarial training framework.
Experimental results on four benchmarks, i.e. Tiny100K, GIST1M, Deep1M, and MNIST, have shown that the proposed SiGAH has superior performance over state-of-the-art approaches.
arXiv Detail & Related papers (2023-06-12T08:07:23Z) - Binary Representation via Jointly Personalized Sparse Hashing [22.296464665032588]
We propose an effective unsupervised method, namely Jointly Personalized Sparse Hashing (JPSH) for binary representation learning.
Different personalized subspaces are constructed to reflect category-specific attributes for different clusters.
To simultaneously preserve semantic and pairwise similarities in our JPSH, we incorporate the PSH and manifold-based hash learning into the seamless formulation.
arXiv Detail & Related papers (2022-08-31T14:18:37Z) - An Adaptive Device-Edge Co-Inference Framework Based on Soft
Actor-Critic [72.35307086274912]
High-dimensional parameter models and large-scale mathematical calculations restrict execution efficiency, especially for Internet of Things (IoT) devices.
We propose a new Deep Reinforcement Learning (DRL) method, Soft Actor-Critic for discrete (SAC-d), which generates the exit point and compressing bits by soft policy iterations.
Based on the latency- and accuracy-aware reward design, such a computation can well adapt to complex environments like dynamic wireless channels and arbitrary processing, and is capable of supporting the 5G URL
arXiv Detail & Related papers (2022-01-09T09:31:50Z) - ASH: A Modern Framework for Parallel Spatial Hashing in 3D Perception [91.24236600199542]
ASH is a modern and high-performance framework for parallel spatial hashing on GPU.
ASH achieves higher performance, supports richer functionality, and requires fewer lines of code.
ASH and its example applications are open sourced in Open3D.
arXiv Detail & Related papers (2021-10-01T16:25:40Z) - Secure Neuroimaging Analysis using Federated Learning with Homomorphic
Encryption [14.269757725951882]
Federated learning (FL) enables distributed computation of machine learning models over disparate, remote data sources.
Recent membership attacks show that private or sensitive personal data can sometimes be leaked or inferred when model parameters or summary statistics are shared with a central site.
We propose a framework for secure FL using fully-homomorphic encryption (FHE).
arXiv Detail & Related papers (2021-08-07T12:15:52Z) - CIMON: Towards High-quality Hash Codes [63.37321228830102]
We propose a new method named Comprehensive sImilarity Mining and cOnsistency learNing (CIMON).
First, we use global refinement and similarity statistical distribution to obtain reliable and smooth guidance. Second, both semantic and contrastive consistency learning are introduced to derive both disturb-invariant and discriminative hash codes.
arXiv Detail & Related papers (2020-10-15T14:47:14Z) - Deep Hashing with Hash-Consistent Large Margin Proxy Embeddings [65.36757931982469]
Image hash codes are produced by binarizing embeddings of convolutional neural networks (CNN) trained for either classification or retrieval.
The use of a fixed set of proxies (weights of the CNN classification layer) is proposed to eliminate this ambiguity.
The resulting hash-consistent large margin (HCLM) proxies are shown to encourage saturation of hashing units, thus guaranteeing a small binarization error.
arXiv Detail & Related papers (2020-07-27T23:47:43Z) - Generative Semantic Hashing Enhanced via Boltzmann Machines [61.688380278649056]
Existing generative-hashing methods mostly assume a factorized form for the posterior distribution.
We propose to employ the distribution of a Boltzmann machine as the variational posterior.
We show that by effectively modeling correlations among different bits within a hash code, our model can achieve significant performance gains.
arXiv Detail & Related papers (2020-06-16T01:23:39Z) - Hashing-based Non-Maximum Suppression for Crowded Object Detection [63.761451382081844]
We propose an algorithm, named hashing-based non-maximum suppression (HNMS) to efficiently suppress the non-maximum boxes for object detection.
For two-stage detectors, we replace NMS in region proposal network with HNMS, and observe significant speed-up with comparable accuracy.
Experiments are conducted on CARPK, SKU-110K, CrowdHuman datasets to demonstrate the efficiency and effectiveness of HNMS.
arXiv Detail & Related papers (2020-05-22T23:45:59Z) - Reinforcing Short-Length Hashing [61.75883795807109]
Existing methods have poor performance in retrieval using an extremely short-length hash code.
In this study, we propose a novel reinforcing short-length hashing (RSLH) method.
In this proposed RSLH, mutual reconstruction between the hash representation and semantic labels is performed to preserve the semantic information.
Experiments on three large-scale image benchmarks demonstrate the superior performance of RSLH under various short-length hashing scenarios.
arXiv Detail & Related papers (2020-04-24T02:23:52Z)