LSM Trees in Adversarial Environments
- URL: http://arxiv.org/abs/2502.08832v2
- Date: Fri, 14 Feb 2025 23:25:25 GMT
- Title: LSM Trees in Adversarial Environments
- Authors: Hayder Tirmazi
- Abstract summary: We focus on adversarial workloads that lead to a sharp degradation in read performance. Our evaluation shows up to an $800\%$ increase in the read latency of lookups for popular LSM stores. We implement adversary resilience into two popular LSM stores, LevelDB and RocksDB.
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: The Log Structured Merge (LSM) Tree is a popular choice for key-value stores that focus on optimized write throughput while maintaining performant, production-ready read latencies. To optimize read performance, LSM stores rely on a probabilistic data structure called the Bloom Filter (BF). In this paper, we focus on adversarial workloads that lead to a sharp degradation in read performance by impacting the accuracy of BFs used within the LSM store. Our evaluation shows up to an $800\%$ increase in the read latency of lookups for popular LSM stores. We define adversarial models and security definitions for LSM stores. We implement adversary resilience into two popular LSM stores, LevelDB and RocksDB. We use our implementations to demonstrate how performance degradation under adversarial workloads can be mitigated.
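The sketch below is not the paper's implementation; it is a minimal Python illustration (with hypothetical names such as `craft_false_positives`) of the underlying mechanism, under the assumption that the filter's hash functions are public: an adversary can rebuild the Bloom filter offline and mine non-member keys that are guaranteed false positives, so every lookup for them falls through to the SSTables on disk, which is the mechanism behind the read-latency blow-up described above. The keyed variant shows one generic hardening idea, salting the hashes with a secret, and is not necessarily how the paper implements adversary resilience in LevelDB or RocksDB.

```python
import hashlib
import hmac
import os


class BloomFilter:
    """Toy Bloom filter; with an empty key its bit positions are fully predictable."""

    def __init__(self, m_bits: int = 4096, k_hashes: int = 4, key: bytes = b""):
        self.m = m_bits
        self.k = k_hashes
        self.key = key                  # empty key -> hashes are public knowledge
        self.bits = bytearray(m_bits)

    def _positions(self, item: bytes):
        # Derive k bit positions from a (possibly keyed) digest of the item.
        digest = (hmac.new(self.key, item, hashlib.sha256).digest()
                  if self.key else hashlib.sha256(item).digest())
        for i in range(self.k):
            h = int.from_bytes(hashlib.sha256(digest + bytes([i])).digest()[:8], "big")
            yield h % self.m

    def add(self, item: bytes):
        for p in self._positions(item):
            self.bits[p] = 1

    def might_contain(self, item: bytes) -> bool:
        return all(self.bits[p] for p in self._positions(item))


def craft_false_positives(public_bf: BloomFilter, inserted: list, n: int):
    """Adversary: rebuild the filter offline and mine guaranteed false positives.

    Every returned key is absent from the store but tests positive, so the LSM
    store performs a wasted SSTable read for each lookup.
    """
    shadow = BloomFilter(public_bf.m, public_bf.k)   # adversary's offline copy
    for key in inserted:
        shadow.add(key)
    present = set(inserted)
    found, trial = [], 0
    while len(found) < n:
        candidate = b"attack-" + str(trial).encode()
        trial += 1
        if candidate not in present and shadow.might_contain(candidate):
            found.append(candidate)
    return found


if __name__ == "__main__":
    stored = [f"user:{i}".encode() for i in range(500)]

    # Unkeyed filter: the adversary can mine lookups that always pass the BF.
    plain = BloomFilter()
    for item in stored:
        plain.add(item)
    attack_keys = craft_false_positives(plain, stored, n=50)
    hits = sum(plain.might_contain(k) for k in attack_keys)
    print(f"unkeyed filter: {hits}/50 crafted lookups are false positives")

    # Keyed filter: bit positions depend on a secret, so the same crafted keys
    # behave like random negative lookups again.
    keyed = BloomFilter(key=os.urandom(16))
    for item in stored:
        keyed.add(item)
    hits_keyed = sum(keyed.might_contain(k) for k in attack_keys)
    print(f"keyed filter:   {hits_keyed}/50 of the same lookups still pass")
```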
Related papers
- Dynamic Low-Rank Sparse Adaptation for Large Language Models [54.1231638555233]
Low-rank Sparse Adaptation (LoSA) is a novel method that seamlessly integrates low-rank adaptation into LLM sparsity.
LoSA dynamically sparsifies the LoRA outcomes based on the corresponding sparse weights during fine-tuning.
LoSA can efficiently boost the efficacy of sparse LLMs within a few hours, without introducing any additional inferential burden.
arXiv Detail & Related papers (2025-02-20T18:37:32Z) - Adaptive Pruning for Large Language Models with Structural Importance Awareness [66.2690963378878]
Large language models (LLMs) have significantly improved language understanding and generation capabilities.
LLMs are difficult to deploy on resource-constrained edge devices due to their high computational and storage resource demands.
We propose structurally-aware adaptive pruning (SAAP) to significantly reduce the computational and memory costs while maintaining model performance.
arXiv Detail & Related papers (2024-12-19T18:08:04Z) - The Early Bird Catches the Leak: Unveiling Timing Side Channels in LLM Serving Systems [26.528288876732617]
A set of new timing side channels can be exploited to infer confidential system prompts and those issued by other users. These vulnerabilities echo security challenges observed in traditional computing systems. We propose a token-by-token search algorithm to efficiently recover shared prompt prefixes in the caches.
arXiv Detail & Related papers (2024-09-30T06:55:00Z) - Efficiency Unleashed: Inference Acceleration for LLM-based Recommender Systems with Speculative Decoding [61.45448947483328]
We introduce Lossless Acceleration via Speculative Decoding for LLM-based Recommender Systems (LASER)
LASER features a Customized Retrieval Pool to enhance retrieval efficiency and Relaxed Verification to improve the acceptance rate of draft tokens.
LASER achieves a 3-5x speedup on public datasets and saves about 67% of computational resources during the online A/B test.
arXiv Detail & Related papers (2024-08-11T02:31:13Z) - LearnedKV: Integrating LSM and Learned Index for Superior Performance on SSD [0.6774462529828165]
We introduce LearnedKV, a novel tiered key-value store that seamlessly integrates a Log-Structured Merge (LSM) tree with a Learned Index.
Our results show that LearnedKV outperforms state-of-the-art solutions by up to 1.32x in read requests and 1.31x in write performance.
arXiv Detail & Related papers (2024-06-27T05:08:09Z) - MoRA: High-Rank Updating for Parameter-Efficient Fine-Tuning [105.11844150736536]
Low-rank adaptation is a popular parameter-efficient fine-tuning method for large language models.
We propose a new method called MoRA, which employs a square matrix to achieve high-rank updating while maintaining the same number of trainable parameters.
Our method outperforms LoRA on memory-intensive tasks and achieves comparable performance on other tasks.
arXiv Detail & Related papers (2024-05-20T15:48:32Z) - Improve Temporal Awareness of LLMs for Sequential Recommendation [61.723928508200196]
Large language models (LLMs) have demonstrated impressive zero-shot abilities in solving a wide range of general-purpose tasks.
LLMs fall short in recognizing and utilizing temporal information, rendering poor performance in tasks that require an understanding of sequential data.
We propose three prompting strategies to exploit temporal information within historical interactions for LLM-based sequential recommendation.
arXiv Detail & Related papers (2024-05-05T00:21:26Z) - MemLLM: Finetuning LLMs to Use An Explicit Read-Write Memory [49.96019697955383]
We introduce MemLLM, a novel method of enhancing large language models (LLMs) by integrating a structured and explicit read-and-write memory module. Our experiments indicate that MemLLM enhances the LLM's performance and interpretability, in language modeling in general and knowledge-intensive tasks in particular.
arXiv Detail & Related papers (2024-04-17T18:13:16Z) - Optimizing LLM Queries in Relational Workloads [58.254894049950366]
We show how to optimize Large Language Models (LLMs) inference for analytical workloads that invoke LLMs within relational queries.
We implement these optimizations in Apache Spark, with vLLM as the model serving backend.
We achieve up to 4.4x improvement in end-to-end latency on a benchmark of diverse LLM-based queries on real datasets.
arXiv Detail & Related papers (2024-03-09T07:01:44Z) - No-Skim: Towards Efficiency Robustness Evaluation on Skimming-based Language Models [27.469321590884903]
We propose No-Skim to help the owners of skimming-based LLMs understand and measure the robustness of their acceleration schemes.
Specifically, our framework searches minimal and unnoticeable perturbations at character-level and token-level to generate adversarial inputs that sufficiently increase the remaining token ratio.
In the worst case, the perturbation found by No-Skim substantially increases the running cost of the LLM by over 145% on average.
arXiv Detail & Related papers (2023-12-15T02:42:05Z) - Are Large Language Models Really Robust to Word-Level Perturbations? [68.60618778027694]
We propose a novel rational evaluation approach that leverages pre-trained reward models as diagnostic tools.
Longer conversations demonstrate a language model's comprehensive grasp of, and proficiency in understanding, the questions it is asked.
Our results demonstrate that LLMs frequently exhibit vulnerability to word-level perturbations that are commonplace in daily language usage.
arXiv Detail & Related papers (2023-09-20T09:23:46Z) - Learning to Optimize LSM-trees: Towards A Reinforcement Learning based Key-Value Store for Dynamic Workloads [16.898360021759487]
We present RusKey, a key-value store with the following new features.
RusKey is a first attempt to orchestrate LSM-tree structures online.
It introduces a new LSM-tree design, named FLSM-tree, for efficient transitions between different compaction policies.
arXiv Detail & Related papers (2023-08-14T09:00:58Z) - LoRAPrune: Structured Pruning Meets Low-Rank Parameter-Efficient Fine-Tuning [56.88751562302793]
Low-rank adaptation (LoRA) has emerged as a parameter-efficient way to fine-tune large language models (LLMs).
LoRAPrune is a new framework that delivers an accurate structured pruned model in a highly memory-efficient manner.
LoRAPrune achieves a reduction in perplexity by 4.81 on WikiText2 and 3.46 on PTB, while also decreasing memory usage by 52.6%.
arXiv Detail & Related papers (2023-05-28T15:15:48Z) - Salient Span Masking for Temporal Understanding [15.75700993677129]
Salient Span Masking (SSM) has shown itself to be an effective strategy to improve closed-book question answering performance.
We investigate SSM from the perspective of temporal tasks, where learning a good representation of various temporal expressions is important.
arXiv Detail & Related papers (2023-03-22T18:49:43Z)