Advancing Semantic Caching for LLMs with Domain-Specific Embeddings and Synthetic Data
- URL: http://arxiv.org/abs/2504.02268v1
- Date: Thu, 03 Apr 2025 04:27:02 GMT
- Authors: Waris Gill, Justin Cechmanek, Tyler Hutcherson, Srijith Rajamohan, Jen Agarwal, Muhammad Ali Gulzar, Manvinder Singh, Benoit Dion
- Abstract summary: This report investigates enhancing semantic caching effectiveness by employing specialized, fine-tuned embedding models. We propose leveraging smaller, domain-specific embedding models, fine-tuned with targeted real-world and synthetically generated datasets.
- Score: 3.877325424485755
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: This report investigates enhancing semantic caching effectiveness by employing specialized, fine-tuned embedding models. Semantic caching relies on embedding similarity rather than exact key matching, presenting unique challenges in balancing precision, query latency, and computational efficiency. We propose leveraging smaller, domain-specific embedding models, fine-tuned with targeted real-world and synthetically generated datasets. Our empirical evaluations demonstrate that compact embedding models fine-tuned for just one epoch on specialized datasets significantly surpass both state-of-the-art open-source and proprietary alternatives in precision and recall. Moreover, we introduce a novel synthetic data generation pipeline for the semantic cache that mitigates the challenge of limited domain-specific annotated data, further boosting embedding performance. Our approach effectively balances computational overhead and accuracy, establishing a viable and efficient strategy for practical semantic caching implementations.
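The abstract's core idea, a cache lookup keyed on embedding similarity rather than exact string equality, can be illustrated with a minimal sketch. Everything here is an assumption for illustration and is not taken from the paper: `toy_embed` is a hypothetical bag-of-words stand-in for a real (fine-tuned) embedding model, `SemanticCache` is a hypothetical class name, and the 0.8 threshold is an arbitrary choice.

```python
import math
import re
from collections import Counter


def toy_embed(text):
    """Hypothetical stand-in for an embedding model: a bag-of-words count vector."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


class SemanticCache:
    """Linear-scan cache that returns a stored response for similar-enough queries."""

    def __init__(self, embed=toy_embed, threshold=0.8):
        self.embed = embed          # assumed: any callable mapping text to a vector
        self.threshold = threshold  # assumed: minimum similarity that counts as a hit
        self.entries = []           # list of (embedding, response) pairs

    def put(self, query, response):
        self.entries.append((self.embed(query), response))

    def get(self, query):
        q = self.embed(query)
        best_score, best_response = 0.0, None
        for emb, response in self.entries:
            score = cosine(q, emb)
            if score > best_score:
                best_score, best_response = score, response
        # Return the closest entry only if it clears the threshold.
        return best_response if best_score >= self.threshold else None


cache = SemanticCache(threshold=0.8)
cache.put("how do i reset my password", "Use the 'Forgot password' link.")
hit = cache.get("How do I reset my password?")   # same tokens, different casing
miss = cache.get("what is the refund policy")    # no token overlap
```

The threshold is where the precision and recall trade-off mentioned in the abstract shows up: raising it reduces false hits but misses more paraphrases, and a better embedding model shifts that trade-off for a given domain.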
Related papers
- Online federated learning framework for classification [7.613977984287604]
We develop a novel online federated learning framework for classification. We handle streaming data from multiple clients while ensuring data privacy and computational efficiency. Our approach delivers high classification accuracy, significant computational efficiency gains, and substantial savings in data storage requirements compared to existing methods.
arXiv Detail & Related papers (2025-03-19T13:50:19Z)
- Enhancing Generalization via Sharpness-Aware Trajectory Matching for Dataset Condensation [37.77454972709646]
We introduce Sharpness-Aware Trajectory Matching (SATM), which enhances the generalization capability of learned synthetic datasets. Our approach is mathematically well-supported and straightforward to implement, with controllable computational overhead.
arXiv Detail & Related papers (2025-02-03T22:30:06Z)
- Efficient Ternary Weight Embedding Model: Bridging Scalability and Performance [15.877771709013743]
In this work, we propose a novel finetuning framework for ternary-weight embedding models.
To apply ternarization to pre-trained embedding models, we introduce self-taught knowledge distillation to finalize the ternary weights of the linear layers.
With extensive experiments on public text and vision datasets, we demonstrate that the ternarized model uses little memory without sacrificing effectiveness.
arXiv Detail & Related papers (2024-11-23T03:44:56Z)
- Hierarchical Features Matter: A Deep Exploration of Progressive Parameterization Method for Dataset Distillation [44.03611131165989]
We propose a novel generative parameterization method dubbed Hierarchical generative Distillation (H-PD). The proposed H-PD achieves a significant performance improvement under various settings with equivalent time consumption. It even surpasses current generative distillation using diffusion models under extreme compression ratios IPC=1 and IPC=10.
arXiv Detail & Related papers (2024-06-09T09:15:54Z)
- Boosting Inference Efficiency: Unleashing the Power of Parameter-Shared Pre-trained Language Models [109.06052781040916]
We introduce a technique to enhance the inference efficiency of parameter-shared language models.
We also propose a simple pre-training technique that leads to fully or partially shared models.
Results demonstrate the effectiveness of our methods on both autoregressive and autoencoding PLMs.
arXiv Detail & Related papers (2023-10-19T15:13:58Z)
- Self-Supervised Dataset Distillation for Transfer Learning [77.4714995131992]
We propose a novel problem of distilling an unlabeled dataset into a set of small synthetic samples for efficient self-supervised learning (SSL).
We first prove that the gradient of synthetic samples with respect to an SSL objective in naive bilevel optimization is biased due to randomness originating from data augmentations or masking.
We empirically validate the effectiveness of our method on various applications involving transfer learning.
arXiv Detail & Related papers (2023-10-10T10:48:52Z)
- Improved Distribution Matching for Dataset Condensation [91.55972945798531]
We propose a novel dataset condensation method based on distribution matching.
Our simple yet effective method outperforms most previous optimization-oriented methods with far fewer computational resources.
arXiv Detail & Related papers (2023-07-19T04:07:33Z)
- Effective Few-Shot Named Entity Linking by Meta-Learning [34.70028855572534]
We propose a novel weak supervision strategy to generate non-trivial synthetic entity-mention pairs.
We also design a meta-learning mechanism to assign different weights to each synthetic entity-mention pair automatically.
Experiments on real-world datasets show that the proposed method can extensively improve the state-of-the-art few-shot entity linking model.
arXiv Detail & Related papers (2022-07-12T03:23:02Z)
- HyperImpute: Generalized Iterative Imputation with Automatic Model Selection [77.86861638371926]
We propose a generalized iterative imputation framework for adaptively and automatically configuring column-wise models.
We provide a concrete implementation with out-of-the-box learners, simulators, and interfaces.
arXiv Detail & Related papers (2022-06-15T19:10:35Z)
- CAFE: Learning to Condense Dataset by Aligning Features [72.99394941348757]
We propose a novel scheme to Condense dataset by Aligning FEatures (CAFE).
At the heart of our approach is an effective strategy to align features from the real and synthetic data across various scales.
We validate the proposed CAFE across various datasets, and demonstrate that it generally outperforms the state of the art.
arXiv Detail & Related papers (2022-03-03T05:58:49Z)
- Decomposed Adversarial Learned Inference [118.27187231452852]
We propose a novel approach, Decomposed Adversarial Learned Inference (DALI).
DALI explicitly matches prior and conditional distributions in both data and code spaces.
We validate the effectiveness of DALI on the MNIST, CIFAR-10, and CelebA datasets.
arXiv Detail & Related papers (2020-04-21T20:00:35Z)
This list is automatically generated from the titles and abstracts of the papers in this site.