Related papers: Managing Large Enclaves in a Data Center

Related papers

Put Teacher in Student's Shoes: Cross-Distillation for Ultra-compact Model Compression Framework [48.66685912952879]
We introduce the Edge ultra-lIte BERT framework (EI-BERT) with a novel cross-distillation method. We achieve a remarkably compact BERT-based model of only 1.91 MB - the smallest to date for Natural Language Understanding (NLU) tasks.
arXiv Detail & Related papers (2025-07-07T03:38:09Z)
MEM1: Learning to Synergize Memory and Reasoning for Efficient Long-Horizon Agents [84.62985963113245]
We introduce MEM1, an end-to-end reinforcement learning framework that enables agents to operate with constant memory across long multi-turn tasks. At each turn, MEM1 updates a compact shared internal state that jointly supports memory consolidation and reasoning. We show that MEM1-7B improves performance by 3.5x while reducing memory usage by 3.7x compared to Qwen2.5-14B-Instruct on a 16-objective multi-hop QA task.
arXiv Detail & Related papers (2025-06-18T19:44:46Z)
MOM: Memory-Efficient Offloaded Mini-Sequence Inference for Long Context Language Models [72.61076288351201]
We propose Memory-efficient Offloaded Mini-sequence Inference (MOM). MOM partitions critical layers into smaller "mini-sequences" and integrates seamlessly with KV cache offloading. On Meta-Llama-3.2-8B, MOM extends the maximum context length from 155k to 455k tokens on a single A100 80GB GPU.
arXiv Detail & Related papers (2025-04-16T23:15:09Z)
Mobility-aware Seamless Service Migration and Resource Allocation in Multi-edge IoV Systems [22.33677210691788]
Mobile Edge Computing (MEC) offers low-latency and high-bandwidth support for Internet-of-Vehicles (IoV) applications. It is hard to maintain uninterrupted and high-quality services without proper service migration among MEC servers. Existing solutions commonly rely on prior knowledge and rarely consider efficient resource allocation during the service migration process.
arXiv Detail & Related papers (2025-03-11T07:03:25Z)
Reinforcement Learning for Long-Horizon Interactive LLM Agents [56.9860859585028]
Interactive digital agents (IDAs) leverage APIs of stateful digital environments to perform tasks in response to user requests. We present a reinforcement learning (RL) approach that trains IDAs directly in their target environments. We derive LOOP, a data- and memory-efficient variant of proximal policy optimization.
arXiv Detail & Related papers (2025-02-03T18:35:42Z)
A performance analysis of VM-based Trusted Execution Environments for Confidential Federated Learning [0.0]
Federated Learning (FL) is a distributed machine learning approach that has emerged as an effective way to address recent privacy concerns. FL introduces the need for additional security measures, as FL alone is still subject to vulnerabilities such as model and data poisoning and inference attacks. Confidential Computing (CC) is a paradigm that, by leveraging hardware-based trusted execution environments (TEEs), protects the confidentiality and integrity of ML models and data.
arXiv Detail & Related papers (2025-01-20T15:58:48Z)
Secure Resource Allocation via Constrained Deep Reinforcement Learning [49.15061461220109]
We present SARMTO, a framework that balances resource allocation, task offloading, security, and performance. SARMTO consistently outperforms five baseline approaches, achieving up to a 40% reduction in system costs. These enhancements highlight SARMTO's potential to revolutionize resource management in intricate distributed computing environments.
arXiv Detail & Related papers (2025-01-20T15:52:43Z)
Retrofitting XoM for Stripped Binaries without Embedded Data Relocation [10.947944442975697]
We present PXoM, a practical technique to seamlessly retrofit XoM into stripped binaries on the x86-64 platform. We leverage Intel's hardware feature, Memory Protection Keys, to offer an efficient fine-grained permission control. PXoM leaves adversaries with little wiggle room to harvest all of the required gadgets, suggesting PXoM is practical for real-world deployment.
arXiv Detail & Related papers (2024-12-03T03:08:27Z)
Enabling Efficient Serverless Inference Serving for LLM (Large Language Model) in the Cloud [0.0]
Review report discusses the cold start latency in serverless inference and existing solutions. System designed to address the cold start problem in serverless inference for large language models.
arXiv Detail & Related papers (2024-11-23T22:19:37Z)
Devlore: Extending Arm CCA to Integrated Devices A Journey Beyond Memory to Interrupt Isolation [10.221747752230131]
Arm Confidential Computing Architecture executes sensitive computation in an abstraction called realm. CCA does not allow integrated devices on the platform to access realm. We present Devlore which allows realm to directly access integrated peripherals.
arXiv Detail & Related papers (2024-08-11T17:33:48Z)
vTensor: Flexible Virtual Tensor Management for Efficient LLM Serving [53.972175896814505]
Large Language Models (LLMs) are widely used across various domains, processing millions of daily requests.
arXiv Detail & Related papers (2024-07-22T14:37:58Z)
SecScale: A Scalable and Secure Trusted Execution Environment for Servers [0.36868085124383626]
Intel plans to deprecate its most trustworthy enclave, SGX, on its 11th and 12th generation processors. We propose SecScale that uses some new ideas centered around speculative execution. We show that we are 10% faster than the nearest competing alternative.
arXiv Detail & Related papers (2024-07-18T15:14:36Z)
TME-Box: Scalable In-Process Isolation through Intel TME-MK Memory Encryption [11.543384661361232]
Cloud computing relies on in-process isolation to optimize performance by running workloads within a single process. Existing in-process isolation mechanisms are not suitable for modern cloud requirements. This paper presents TME-Box, a novel isolation technique that enables fine-grained and scalable sandboxing on commodity x86 machines.
arXiv Detail & Related papers (2024-07-15T14:09:00Z)
TensorTEE: Unifying Heterogeneous TEE Granularity for Efficient Secure Collaborative Tensor Computing [13.983627699836376]
Existing heterogeneous TEE designs are inefficient for collaborative computing due to fine and different memory granularities between CPU and NPU. We propose a unified tensor-granularity heterogeneous TEE for efficient secure collaborative computing. The results show that the TEE improves the performance of Large Language Model (LLM) training workloads by 4.0x compared to existing work.
arXiv Detail & Related papers (2024-07-12T00:35:18Z)
Get More with LESS: Synthesizing Recurrence with KV Cache Compression for Efficient LLM Inference [78.65321721142624]
We focus on a memory bottleneck imposed by the key-value (KV) cache. Existing KV cache methods approach this problem by pruning or evicting large swaths of relatively less important KV pairs. We propose LESS, a simple integration of a constant-sized cache with eviction-based cache methods.
arXiv Detail & Related papers (2024-02-14T18:54:56Z)
HasTEE+ : Confidential Cloud Computing and Analytics with Haskell [50.994023665559496]
Confidential computing enables the protection of confidential code and data in a co-tenanted cloud deployment using specialized hardware isolation units called Trusted Execution Environments (TEEs). TEEs offer low-level C/C++-based toolchains that are susceptible to inherent memory safety vulnerabilities and lack language constructs to monitor explicit and implicit information-flow leaks. We address the above with HasTEE+, a domain-specific language (DSL) embedded in Haskell that enables programming TEEs in a high-level language with strong type-safety.
arXiv Detail & Related papers (2024-01-17T00:56:23Z)
SpotServe: Serving Generative Large Language Models on Preemptible Instances [64.18638174004151]
SpotServe is the first distributed large language models serving system on preemptible instances. We show that SpotServe can reduce the P99 tail latency by 2.4 - 9.1x compared with the best existing LLM serving systems. We also show that SpotServe can leverage the price advantage of preemptive instances, saving 54% monetary cost compared with only using on-demand instances.
arXiv Detail & Related papers (2023-11-27T06:31:17Z)
MemGPT: Towards LLMs as Operating Systems [50.02623936965231]
Large language models (LLMs) have revolutionized AI, but are constrained by limited context windows. We propose virtual context management, a technique drawing inspiration from hierarchical memory systems in traditional operating systems. We release MemGPT code and data for our experiments at https://memgpt.ai.
arXiv Detail & Related papers (2023-10-12T17:51:32Z)
Full Parameter Fine-tuning for Large Language Models with Limited Resources [55.794732214059806]
Large Language Models (LLMs) have revolutionized Natural Language Processing (NLP) but demand massive GPU resources for training. We propose a new optimizer, LOw-Memory Optimization (LOMO), which fuses the gradient computation and the parameter update in one step to reduce memory usage.
arXiv Detail & Related papers (2023-06-16T11:37:15Z)
Recurrent Dynamic Embedding for Video Object Segmentation [54.52527157232795]
We propose a Recurrent Dynamic Embedding (RDE) to build a memory bank of constant size. We propose an unbiased guidance loss during the training stage, which makes SAM more robust in long videos. We also design a novel self-correction strategy so that the network can repair the embeddings of masks with different qualities in the memory bank.
arXiv Detail & Related papers (2022-05-08T02:24:43Z)
Reinforcement Learning Framework for Server Placement and Workload Allocation in Multi-Access Edge Computing [9.598394554018164]
This paper addresses the problem of minimizing both the network delay and the number of edge servers to provide a MEC design with minimum cost. We propose a novel RL framework with an efficient representation and modeling of the state space, action space, and penalty function in the design of the underlying Markov Decision Process (MDP) for solving our problem.
arXiv Detail & Related papers (2022-02-21T03:04:50Z)
Asynchronous Parallel Incremental Block-Coordinate Descent for Decentralized Machine Learning [55.198301429316125]
Machine learning (ML) is a key technique for big-data-driven modelling and analysis of massive Internet of Things (IoT) based intelligent and ubiquitous computing. For fast-increasing applications and data amounts, distributed learning is a promising emerging paradigm since it is often impractical or inefficient to share/aggregate data. This paper studies the problem of training an ML model over decentralized systems, where data are distributed over many user devices.
arXiv Detail & Related papers (2022-02-07T15:04:15Z)
Carousel Memory: Rethinking the Design of Episodic Memory for Continual Learning [19.260402028696916]
Continual Learning (CL) aims to learn from a continuous stream of tasks without forgetting knowledge learned from the previous tasks. Previous studies exploit episodic memory (EM), which stores a subset of the past observed samples while learning from new non-i.i.d. data. We propose to exploit the abundant storage to preserve past experiences and alleviate the forgetting by allowing CL to efficiently migrate samples between memory and storage.
arXiv Detail & Related papers (2021-10-14T11:27:45Z)
Online Service Migration in Edge Computing with Incomplete Information: A Deep Recurrent Actor-Critic Method [18.891775769665102]
Multi-access Edge Computing (MEC) is an emerging computing paradigm that extends cloud computing to the network edge. Service migration needs to decide where to migrate user services for maintaining high Quality-of-Service (QoS). We propose a new learning-driven method, namely Deep Recurrent Actor-Critic based service Migration (DRACM), which is user-centric and can make effective online migration decisions.
arXiv Detail & Related papers (2020-12-16T00:16:24Z)

This list is automatically generated from the titles and abstracts of the papers on this site.