Related papers: Meta-Learning for Speeding Up Large Model Inference in Decentralized Environments


URL: http://arxiv.org/abs/2508.09194v1
Date: Fri, 08 Aug 2025 09:53:53 GMT
Title: Meta-Learning for Speeding Up Large Model Inference in Decentralized Environments
Authors: Yipeng Du, Zihao Wang, Ahmad Farhan, Claudio Angione, Harry Yang, Fielding Johnston, James P. Buban, Patrick Colangelo, Yue Zhao, Yuzhe Yang,
Abstract summary: We introduce a meta-learning-based framework for inference acceleration in decentralized AI systems. Unlike traditional methods, our framework systematically identifies the best acceleration strategies based on the specific characteristics of each task. Our results highlight the potential of inference acceleration in decentralized AI systems.
Score: 23.07041967514726
License: http://creativecommons.org/licenses/by/4.0/
Abstract: The deployment of large-scale models, such as large language models (LLMs), incurs substantial costs due to their computational demands. To mitigate these costs and address challenges related to scalability and data security, there is a growing shift towards decentralized systems for model deployment, where choosing efficient inference acceleration schemes becomes crucial to manage computational resources effectively and enhance system responsiveness. In this work, we address the challenge of selecting optimal acceleration methods in decentralized systems by introducing a meta-learning-based framework. This framework automates the selection process by learning from historical performance data of various acceleration techniques across different tasks. Unlike traditional methods that rely on random selection or expert intuition, our approach systematically identifies the best acceleration strategies based on the specific characteristics of each task. We demonstrate that our meta-learning framework not only streamlines the decision-making process but also consistently outperforms conventional methods in terms of efficiency and performance. Our results highlight the potential of inference acceleration in decentralized AI systems, offering a path towards more democratic and economically feasible artificial intelligence solutions.
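The abstract describes the framework only at a high level: it learns from historical performance data of acceleration techniques across tasks and picks a strategy from each task's characteristics. As a rough illustration of that idea, and not the authors' implementation, the sketch below uses a k-nearest-neighbor meta-learner. The feature choice (model size, batch size, prompt length), the method labels, the scores, and the function name `select_method` are all hypothetical.

```python
import math
from collections import defaultdict

# Hypothetical historical records: (task features, acceleration method, score).
# Features: (model size in B params, batch size, avg prompt length).
# Higher score = better measured efficiency. All values are invented for illustration.
HISTORY = [
    ((7.0, 1.0, 512.0), "speculative_decoding", 0.82),
    ((7.0, 1.0, 512.0), "quantization_int8", 0.74),
    ((70.0, 32.0, 2048.0), "speculative_decoding", 0.55),
    ((70.0, 32.0, 2048.0), "quantization_int8", 0.79),
    ((13.0, 4.0, 1024.0), "speculative_decoding", 0.70),
    ((13.0, 4.0, 1024.0), "quantization_int8", 0.72),
]


def select_method(task, history, k=2):
    """Recommend the method with the best mean score over the k most similar past tasks."""
    dims = len(task)
    # Per-feature scale so that large-valued features (e.g. prompt length) do not dominate
    scale = [max(abs(f[d]) for f, _, _ in history) or 1.0 for d in range(dims)]

    def dist(a, b):
        return math.sqrt(sum(((x - y) / s) ** 2 for x, y, s in zip(a, b, scale)))

    # Distinct past tasks, closest first
    past_tasks = sorted({f for f, _, _ in history}, key=lambda f: dist(f, task))
    neighbors = set(past_tasks[:k])

    # Collect each method's scores on the neighboring tasks only
    scores = defaultdict(list)
    for f, method, score in history:
        if f in neighbors:
            scores[method].append(score)
    return max(scores, key=lambda m: sum(scores[m]) / len(scores[m]))


if __name__ == "__main__":
    print(select_method((7.0, 1.0, 600.0), HISTORY, k=1))
    print(select_method((65.0, 30.0, 2000.0), HISTORY, k=1))
```

A nearest-neighbor rule is the simplest meta-learner that "learns from historical performance data"; a learned regressor from task features to per-method scores would be a natural replacement once enough history is available.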

Related papers

Learning Memory-Enhanced Improvement Heuristics for Flexible Job Shop Scheduling [39.98859285173431]
The flexible job-shop scheduling problem (FJSP) has attracted significant attention due to its complexity and strong alignment with real-world production scenarios. Current deep reinforcement learning (DRL)-based approaches to FJSP predominantly employ constructive methods. This paper proposes a Memory-enhanced Improvement Search framework with heterogeneous graph representation--MIStar.
arXiv Detail & Related papers (2026-03-03T10:43:01Z)
A Review of Online Diffusion Policy RL Algorithms for Scalable Robotic Control [21.22244612145334]
Diffusion policies have emerged as a powerful approach for robotic control. Online Diffusion Policy Reinforcement Learning (Online DPRL) algorithms for scalable robotic control systems are studied.
arXiv Detail & Related papers (2026-01-05T05:19:23Z)
Bridging VLMs and Embodied Intelligence with Deliberate Practice Policy Optimization [72.20212909644017]
Deliberate Practice Policy Optimization (DPPO) is a metacognitive "Metaloop" training framework. DPPO alternates between supervised fine-tuning (competence expansion) and reinforcement learning (skill refinement). Empirically, training a vision-language embodied model with DPPO, referred to as Pelican-VL 1.0, yields a 20.3% performance improvement over the base model. We are open-sourcing both the models and code, providing the first systematic framework that alleviates the data and resource bottleneck.
arXiv Detail & Related papers (2025-11-20T17:58:04Z)
MARS: Optimizing Dual-System Deep Research via Multi-Agent Reinforcement Learning [82.14973479594367]
Large Language Models (LLMs) for complex reasoning tasks require innovative approaches that bridge intuitive and deliberate cognitive processes. This paper introduces a Multi-Agent System for Deep ReSearch (MARS) enabling seamless integration of System 1's fast, intuitive thinking with System 2's deliberate reasoning.
arXiv Detail & Related papers (2025-10-06T15:42:55Z)
Accelerating Privacy-Preserving Federated Learning in Large-Scale LEO Satellite Systems [57.692181589325116]
Large-scale low-Earth-orbit (LEO) satellite systems are increasingly valued for their ability to enable rapid and wide-area data exchange. Due to privacy concerns and regulatory constraints, raw data collected at remote clients cannot be centrally aggregated. Federated learning offers a privacy-preserving alternative by training local models on distributed devices and exchanging only model parameters. We propose a discrete temporal graph-based on-demand scheduling framework that dynamically allocates communication resources to accelerate federated learning.
arXiv Detail & Related papers (2025-09-05T03:33:42Z)
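The summary's description of federated learning (local training on each client, exchanging only model parameters) can be made concrete with a minimal FedAvg-style sketch. This illustrates generic federated averaging only, not the paper's graph-based scheduling framework; the one-parameter linear model, the learning rate, and the helper names `local_train` and `fedavg_round` are assumptions chosen for brevity.

```python
def local_train(w, data, lr=0.01, steps=20):
    """Gradient descent on mean squared error for y ~ w * x, using only this client's data."""
    for _ in range(steps):
        grad = sum(2 * (w * x - y) * x for x, y in data) / len(data)
        w -= lr * grad
    return w


def fedavg_round(w_global, clients):
    """One round: each client trains locally from the global weight; the server
    averages the returned weights, weighted by client sample count.
    Raw (x, y) pairs never leave the clients; only weights are exchanged."""
    total = sum(len(d) for d in clients)
    updates = [(local_train(w_global, d), len(d)) for d in clients]
    return sum(w * n for w, n in updates) / total


if __name__ == "__main__":
    # Two hypothetical clients whose private data both follow y = 3x
    clients = [
        [(1.0, 3.0), (2.0, 6.0)],
        [(0.5, 1.5), (1.5, 4.5), (3.0, 9.0)],
    ]
    w = 0.0
    for _ in range(10):
        w = fedavg_round(w, clients)
    print(w)
```

Weighting by sample count makes the server update match what centralized training on the pooled data would favor, which is the usual FedAvg convention.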
Onboard Optimization and Learning: A Survey [10.511932152633253]
Onboard learning is a transformative approach in edge AI, enabling real-time data processing, decision-making, and adaptive model training directly on resource-constrained devices. However, onboard learning faces challenges such as limited computational resources, high inference costs, and security vulnerabilities. This survey explores techniques that optimize model efficiency, accelerate inference, and support collaborative learning across distributed devices.
arXiv Detail & Related papers (2025-05-07T07:47:14Z)
RLER-TTE: An Efficient and Effective Framework for En Route Travel Time Estimation with Reinforcement Learning [5.4674463400564886]
En Route Travel Time Estimation (ER-TTE) aims to learn driving patterns from traveled routes to achieve rapid and accurate real-time predictions. Existing methods ignore the complexity and dynamism of real-world traffic systems, resulting in significant gaps in efficiency and accuracy in real-time scenarios. This paper proposes a novel framework that redefines the path implementation of ER-TTE to achieve highly efficient and effective predictions.
arXiv Detail & Related papers (2025-01-26T11:49:34Z)
A Survey on Inference Optimization Techniques for Mixture of Experts Models [50.40325411764262]
Large-scale Mixture of Experts (MoE) models offer enhanced model capacity and computational efficiency through conditional computation. However, deploying and running inference on these models presents significant challenges in computational resources, latency, and energy efficiency. This survey analyzes optimization techniques for MoE models across the entire system stack.
arXiv Detail & Related papers (2024-12-18T14:11:15Z)
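"Conditional computation" in MoE models means that each input activates only a few experts. The sketch below shows one common form, top-k softmax gating; the survey summary does not specify a routing scheme, so this choice, the linear gate, and the function name `top_k_moe` are assumptions. It uses plain Python lists rather than a tensor library to stay self-contained.

```python
import math


def top_k_moe(x, gate_weights, experts, k=2):
    """Sparse MoE forward pass: only the k highest-scoring experts are evaluated.

    x            -- input vector (list of floats)
    gate_weights -- one row per expert; gate logit = dot(row, x)
    experts      -- callables mapping a vector to a vector of the same length
    Returns (output vector, indices of the selected experts).
    """
    logits = [sum(g * xi for g, xi in zip(row, x)) for row in gate_weights]
    top = sorted(range(len(experts)), key=lambda i: logits[i], reverse=True)[:k]

    # Softmax renormalized over the selected experts only (max-shifted for stability)
    m = max(logits[i] for i in top)
    exp = {i: math.exp(logits[i] - m) for i in top}
    z = sum(exp.values())

    out = [0.0] * len(x)
    for i in top:
        y = experts[i](x)  # conditional computation: unselected experts never run
        out = [o + (exp[i] / z) * yi for o, yi in zip(out, y)]
    return out, top


if __name__ == "__main__":
    # Four toy experts that scale the input by 1, 2, 3, 4
    experts = [lambda v, c=c: [c * vi for vi in v] for c in (1, 2, 3, 4)]
    gate = [[0.1, 0.0], [2.0, 0.0], [1.0, 0.0], [-1.0, 0.0]]
    print(top_k_moe([1.0, 0.0], gate, experts, k=2))
```

The cost per token scales with k rather than with the total number of experts, which is the capacity/compute trade-off the summary refers to; the inference challenges it mentions (memory for all experts, load imbalance across devices) are not modeled here.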
Meta-Learning for Speeding Up Large Model Inference in Decentralized Environments [17.309238729647287]
We introduce a meta-learning-based framework for inference acceleration in decentralized AI systems. Unlike traditional methods, our framework systematically identifies the best acceleration strategies based on the specific characteristics of each task. Our results highlight the potential of meta-learning to revolutionize inference acceleration in decentralized AI systems.
arXiv Detail & Related papers (2024-10-28T04:29:16Z)
Local Methods with Adaptivity via Scaling [38.99428012275441]
This paper aims to merge the local training technique with the adaptive approach to develop efficient distributed learning methods. We consider the classical Local SGD method and enhance it with a scaling feature. In addition to theoretical analysis, we validate the performance of our methods in practice by training a neural network.
arXiv Detail & Related papers (2024-06-02T19:50:05Z)
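Local SGD has each worker take several gradient steps on its own data between communication rounds, after which the server averages the iterates; "scaling" divides each gradient coordinate by a diagonal scaling term. The summary does not give the paper's adaptive scaling rule, so the sketch below assumes a fixed, shared diagonal scaling vector (a Jacobi-style curvature estimate) and deterministic quadratic objectives. The helper names `make_quadratic_grad` and `local_scaled_sgd` are hypothetical.

```python
def make_quadratic_grad(a, b):
    """Gradient of f(x) = 0.5 * sum_j a_j * (x_j - b_j)^2 (one worker's local objective)."""
    return lambda x: [aj * (xj - bj) for aj, xj, bj in zip(a, x, b)]


def local_scaled_sgd(x0, workers, scaling, lr=0.5, local_steps=5, rounds=20):
    """Local SGD with diagonal scaling.

    Each round, every worker starts from the shared iterate and takes
    `local_steps` scaled steps x <- x - lr * g / d (d = scaling vector);
    the server then averages the workers' iterates (the communication step).
    """
    x = list(x0)
    for _ in range(rounds):
        finals = []
        for grad_fn in workers:
            xm = list(x)
            for _ in range(local_steps):
                g = grad_fn(xm)
                xm = [xi - lr * gi / di for xi, gi, di in zip(xm, g, scaling)]
            finals.append(xm)
        x = [sum(coords) / len(finals) for coords in zip(*finals)]
    return x


if __name__ == "__main__":
    # Ill-conditioned curvature (100 vs 1); workers disagree on the minimizer
    a = [100.0, 1.0]
    workers = [make_quadratic_grad(a, [1.0, -2.0]), make_quadratic_grad(a, [3.0, 4.0])]
    # The global minimizer of the average objective is the mean of the b's: [2.0, 1.0]
    print(local_scaled_sgd([0.0, 0.0], workers, scaling=a))
```

With the scaling matched to the curvature, both coordinates contract at the same rate despite the 100:1 conditioning, whereas plain Local SGD would need a step size below 2/100 and would converge slowly along the flat coordinate. An adaptive variant would update the scaling vector from observed gradients instead of fixing it.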
Switchable Decision: Dynamic Neural Generation Networks [98.61113699324429]
We propose a switchable decision to accelerate inference by dynamically assigning resources for each data instance. Our method benefits from less cost during inference while keeping the same accuracy.
arXiv Detail & Related papers (2024-05-07T17:44:54Z)
Machine Learning Insides OptVerse AI Solver: Design Principles and Applications [74.67495900436728]
We present a comprehensive study on the integration of machine learning (ML) techniques into Huawei Cloud's OptVerse AI solver. We showcase our methods for generating complex SAT and MILP instances utilizing generative models that mirror multifaceted structures of real-world problems. We detail the incorporation of state-of-the-art parameter tuning algorithms which markedly elevate solver performance.
arXiv Detail & Related papers (2024-01-11T15:02:15Z)

This list is automatically generated from the titles and abstracts of the papers on this site.