Related papers: Model Compression in Practice: Lessons Learned from Practitioners Creating On-device Machine Learning Experiences

Model Compression in Practice: Lessons Learned from Practitioners Creating On-device Machine Learning Experiences

URL: http://arxiv.org/abs/2310.04621v2
Date: Wed, 3 Apr 2024 21:29:05 GMT
Title: Model Compression in Practice: Lessons Learned from Practitioners Creating On-device Machine Learning Experiences
Authors: Fred Hohman, Mary Beth Kery, Donghao Ren, Dominik Moritz,
Abstract summary: We interviewed 30 experts at Apple that specialize in producing efficient models. Our findings offer pragmatic considerations missing from prior work. We distill design recommendations for tooling to help ease the difficulty of this work.
Score: 19.174587549247985
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: On-device machine learning (ML) promises to improve the privacy, responsiveness, and proliferation of new, intelligent user experiences by moving ML computation onto everyday personal devices. However, today's large ML models must be drastically compressed to run efficiently on-device, a hurtle that requires deep, yet currently niche expertise. To engage the broader human-centered ML community in on-device ML experiences, we present the results from an interview study with 30 experts at Apple that specialize in producing efficient models. We compile tacit knowledge that experts have developed through practical experience with model compression across different hardware platforms. Our findings offer pragmatic considerations missing from prior work, covering the design process, trade-offs, and technical strategies that go into creating efficient models. Finally, we distill design recommendations for tooling to help ease the difficulty of this work and bring on-device ML into to more widespread practice.

Related papers

asanAI: In-Browser, No-Code, Offline-First Machine Learning Toolkit [0.0]
asanAI is an offline-first, open-source, no-code machine learning toolkit designed for users of all skill levels. It allows individuals to design, debug, train, and test ML models directly in a web browser. The toolkit runs on any device with a modern web browser, including smartphones, and ensures user privacy through local computations.
arXiv Detail & Related papers (2025-01-07T12:47:52Z)
Darkit: A User-Friendly Software Toolkit for Spiking Large Language Model [50.37090759139591]
Large language models (LLMs) have been widely applied in various practical applications, typically comprising billions of parameters. The human brain, employing bio-plausible spiking mechanisms, can accomplish the same tasks while significantly reducing energy consumption. We are releasing a software toolkit named DarwinKit (Darkit) to accelerate the adoption of brain-inspired large language models.
arXiv Detail & Related papers (2024-12-20T07:50:08Z)
FactorLLM: Factorizing Knowledge via Mixture of Experts for Large Language Models [50.331708897857574]
We introduce FactorLLM, a novel approach that decomposes well-trained dense FFNs into sparse sub-networks without requiring any further modifications. FactorLLM achieves comparable performance to the source model securing up to 85% model performance while obtaining over a 30% increase in inference speed.
arXiv Detail & Related papers (2024-08-15T16:45:16Z)
MoExtend: Tuning New Experts for Modality and Task Extension [61.29100693866109]
MoExtend is an effective framework designed to streamline the modality adaptation and extension of Mixture-of-Experts (MoE) models. MoExtend seamlessly integrates new experts into pre-trained MoE models, endowing them with novel knowledge without the need to tune pretrained models.
arXiv Detail & Related papers (2024-08-07T02:28:37Z)
Not All Experts are Equal: Efficient Expert Pruning and Skipping for Mixture-of-Experts Large Language Models [90.14693869269519]
MoE LLMs can achieve higher performance with fewer parameters, but it is still hard to deploy them due to their immense parameter sizes. This paper mainly aims to enhance the deployment efficiency of MoE LLMs by introducing plug-and-play expert-level sparsification techniques.
arXiv Detail & Related papers (2024-02-22T18:56:07Z)
Retrieval-based Knowledge Transfer: An Effective Approach for Extreme Large Language Model Compression [64.07696663255155]
Large-scale pre-trained language models (LLMs) have demonstrated exceptional performance in various natural language processing (NLP) tasks. However, the massive size of these models poses huge challenges for their deployment in real-world applications. We introduce a novel compression paradigm called Retrieval-based Knowledge Transfer (RetriKT) which effectively transfers the knowledge of LLMs to extremely small-scale models.
arXiv Detail & Related papers (2023-10-24T07:58:20Z)
MLCopilot: Unleashing the Power of Large Language Models in Solving Machine Learning Tasks [31.733088105662876]
We aim to bridge the gap between machine intelligence and human knowledge by introducing a novel framework. We showcase the possibility of extending the capability of LLMs to comprehend structured inputs and perform thorough reasoning for solving novel ML tasks.
arXiv Detail & Related papers (2023-04-28T17:03:57Z)
Learnware: Small Models Do Big [69.88234743773113]
The prevailing big model paradigm, which has achieved impressive results in natural language processing and computer vision applications, has not yet addressed those issues, whereas becoming a serious source of carbon emissions. This article offers an overview of the learnware paradigm, which attempts to enable users not need to build machine learning models from scratch, with the hope of reusing small models to do things even beyond their original purposes.
arXiv Detail & Related papers (2022-10-07T15:55:52Z)
SeLoC-ML: Semantic Low-Code Engineering for Machine Learning Applications in Industrial IoT [9.477629856092218]
This paper presents a framework called Semantic Low-Code Engineering for ML Applications (SeLoC-ML) SeLoC-ML enables non-experts to model, discover, reuse, and matchmake ML models and devices at scale. Developers can benefit from semantic application templates, called recipes, to fast prototype end-user applications.
arXiv Detail & Related papers (2022-07-18T13:06:21Z)
Panoramic Learning with A Standardized Machine Learning Formalism [116.34627789412102]
This paper presents a standardized equation of the learning objective, that offers a unifying understanding of diverse ML algorithms. It also provides guidance for mechanic design of new ML solutions, and serves as a promising vehicle towards panoramic learning with all experiences.
arXiv Detail & Related papers (2021-08-17T17:44:38Z)
Improving Semiconductor Device Modeling for Electronic Design Automation by Machine Learning Techniques [6.170514965470266]
We propose a self-augmentation strategy for improving ML-based device modeling using variational autoencoder-based techniques. To demonstrate the effectiveness of our approach, we apply it to a deep neural network-based prediction task for the Ohmic resistance value in Gallium Nitride devices.
arXiv Detail & Related papers (2021-05-25T00:52:44Z)
Learning by Design: Structuring and Documenting the Human Choices in Machine Learning Development [6.903929927172917]
We present a method consisting of eight design questions that outline the deliberation and normative choices going into creating a machine learning model. Our method affords several benefits, such as supporting critical assessment through methodological transparency. We believe that our method can help ML practitioners structure and justify their choices and assumptions when developing ML models.
arXiv Detail & Related papers (2021-05-03T08:47:45Z)

This list is automatically generated from the titles and abstracts of the papers in this site.