Porting Large Language Models to Mobile Devices for Question Answering
- URL: http://arxiv.org/abs/2404.15851v1
- Date: Wed, 24 Apr 2024 12:59:54 GMT
- Title: Porting Large Language Models to Mobile Devices for Question Answering
- Authors: Hannes Fassold
- Abstract summary: We describe how we managed to port state-of-the-art Large Language Models to mobile devices.
We employ the llama.cpp framework, a flexible and self-contained C++ framework for LLM inference.
Experimental results show that LLM inference runs at interactive speed on a Galaxy S21 smartphone.
- License: http://creativecommons.org/licenses/by-nc-sa/4.0/
- Abstract: Deploying Large Language Models (LLMs) on mobile devices makes all the capabilities of natural language processing available on the device. An important use case of LLMs is question answering, which can provide accurate and contextually relevant answers to a wide array of user queries. We describe how we managed to port state-of-the-art LLMs to mobile devices, enabling them to operate natively on the device. We employ the llama.cpp framework, a flexible and self-contained C++ framework for LLM inference. We selected a 6-bit quantized version of the Orca-Mini-3B model with 3 billion parameters and present the correct prompt format for this model. Experimental results show that LLM inference runs at interactive speed on a Galaxy S21 smartphone and that the model delivers high-quality answers to user queries on subjects such as politics, geography, and history.
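The abstract mentions a model-specific prompt format but does not reproduce it. As an illustration, the sketch below builds the system/user/response template commonly documented for Orca-Mini model releases; the exact wording, default system message, and spacing are assumptions, not taken from the paper. The resulting string would then be handed to an inference backend such as llama.cpp.

```python
# Minimal sketch of an Orca-style prompt template, as commonly documented
# for Orca-Mini releases. Wording and spacing are assumptions, not taken
# from the paper itself.

DEFAULT_SYSTEM = (
    "You are an AI assistant that follows instructions well. "
    "Help as much as you can."
)

def build_orca_mini_prompt(user_query: str, system: str = DEFAULT_SYSTEM) -> str:
    """Assemble the '### System / ### User / ### Response' template."""
    return (
        f"### System:\n{system}\n\n"
        f"### User:\n{user_query}\n\n"
        f"### Response:\n"
    )

prompt = build_orca_mini_prompt("What is the capital of Austria?")
print(prompt)
```

In a llama.cpp-based setup, a 6-bit quantization would typically correspond to a `Q6_K` GGUF file, and a prompt like this could be passed to the framework's example CLI via its `-p` flag; the specific deployment pipeline used in the paper is not detailed in this listing.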
Related papers
- SlimLM: An Efficient Small Language Model for On-Device Document Assistance
We present SlimLM, a series of SLMs optimized for document assistance tasks on mobile devices.
SlimLM is pre-trained on SlimPajama-627B and fine-tuned on DocAssist.
We evaluate SlimLM against existing SLMs, showing comparable or superior performance.
arXiv Detail & Related papers (2024-11-15T04:44:34Z)
- Large Language Model Performance Benchmarking on Mobile Platforms: A Thorough Evaluation
Large language models (LLMs) increasingly integrate into every aspect of our work and daily lives.
There are growing concerns about user privacy, which push the trend toward local deployment of these models.
Since local deployment is a rapidly emerging application, we examine LLM performance on commercial off-the-shelf mobile devices.
arXiv Detail & Related papers (2024-10-04T17:14:59Z)
- Multi-LLM QA with Embodied Exploration
We investigate the use of Multi-Embodied LLM Explorers (MELE) for question-answering in an unknown environment.
Multiple LLM-based agents independently explore and then answer queries about a household environment.
We analyze different aggregation methods to generate a single, final answer for each query.
arXiv Detail & Related papers (2024-06-16T12:46:40Z)
- QuickLLaMA: Query-aware Inference Acceleration for Large Language Models
We introduce Query-aware Inference for Large Language Models (Q-LLM).
Q-LLM is designed to process extensive sequences akin to human cognition.
It can accurately capture pertinent information within a fixed window size and provide precise answers to queries.
arXiv Detail & Related papers (2024-06-11T17:55:03Z)
- Crafting Interpretable Embeddings by Asking LLMs Questions
Large language models (LLMs) have rapidly improved text embeddings for a growing array of natural-language processing tasks.
We introduce question-answering embeddings (QA-Emb), embeddings where each feature represents an answer to a yes/no question asked to an LLM.
We use QA-Emb to flexibly generate interpretable models for predicting fMRI voxel responses to language stimuli.
arXiv Detail & Related papers (2024-05-26T22:30:29Z)
- Large Language Models for Expansion of Spoken Language Understanding Systems to New Languages
Spoken Language Understanding (SLU) models are a core component of voice assistants (VA), such as Alexa, Bixby, and Google Assistant.
In this paper, we introduce a pipeline designed to extend SLU systems to new languages, utilizing Large Language Models (LLMs).
Our approach improved on the MultiATIS++ benchmark, a primary multi-language SLU dataset, in the cloud scenario using an mBERT model.
arXiv Detail & Related papers (2024-04-03T09:13:26Z)
- LLMs for Robotic Object Disambiguation
Our study reveals the LLM's aptitude for solving complex decision making challenges.
A pivotal focus of our research is the object disambiguation capability of LLMs.
We have developed a few-shot prompt engineering system to improve the LLM's ability to pose disambiguating queries.
arXiv Detail & Related papers (2024-01-07T04:46:23Z)
- LLM-Pruner: On the Structural Pruning of Large Language Models
Large language models (LLMs) have shown remarkable capabilities in language understanding and generation.
We tackle the compression of LLMs within the bound of two constraints: being task-agnostic and minimizing the reliance on the original training dataset.
Our method, named LLM-Pruner, adopts structural pruning that selectively removes non-critical coupled structures.
arXiv Detail & Related papers (2023-05-19T12:10:53Z)
- Check Your Facts and Try Again: Improving Large Language Models with External Knowledge and Automated Feedback
Large language models (LLMs) are able to generate human-like, fluent responses for many downstream tasks.
This paper proposes an LLM-Augmenter system, which augments a black-box LLM with a set of plug-and-play modules.
arXiv Detail & Related papers (2023-02-24T18:48:43Z)
- Prompting Is Programming: A Query Language for Large Language Models
We present the novel idea of Language Model Programming (LMP).
LMP generalizes language model prompting from pure text prompts to an intuitive combination of text prompting and scripting.
We show that LMQL, an implementation of LMP, can capture a wide range of state-of-the-art prompting methods in an intuitive way.
arXiv Detail & Related papers (2022-12-12T18:09:09Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences of its use.