A General-Purpose Device for Interaction with LLMs
- URL: http://arxiv.org/abs/2408.10230v1
- Date: Fri, 2 Aug 2024 23:43:29 GMT
- Title: A General-Purpose Device for Interaction with LLMs
- Authors: Jiajun Xu, Qun Wang, Yuhang Cao, Baitao Zeng, Sicheng Liu,
- Abstract summary: This paper investigates integrating large language models (LLMs) with advanced hardware.
We focus on developing a general-purpose device designed for enhanced interaction with LLMs.
- Score: 3.052172365469752
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: This paper investigates integrating large language models (LLMs) with advanced hardware, focusing on developing a general-purpose device designed for enhanced interaction with LLMs. Initially, we analyze the current landscape, where virtual assistants and LLMs are reshaping human-technology interactions, highlighting pivotal advancements and setting the stage for a new era of intelligent hardware. Despite substantial progress in LLM technology, a significant gap exists in hardware development, particularly concerning scalability, efficiency, affordability, and multimodal capabilities. This disparity presents both challenges and opportunities, underscoring the need for hardware that is not only powerful but also versatile and capable of managing the sophisticated demands of modern computation. Our proposed device addresses these needs by emphasizing scalability, multimodal data processing, enhanced user interaction, and privacy considerations, offering a comprehensive platform for LLM integration in various applications.
Related papers
- Mini-InternVL: A Flexible-Transfer Pocket Multimodal Model with 5% Parameters and 90% Performance [78.48606021719206]
Mini-InternVL is a series of MLLMs with parameters ranging from 1B to 4B, which achieves 90% of the performance with only 5% of the parameters.
We develop a unified adaptation framework for Mini-InternVL, which enables our models to transfer and outperform specialized models in downstream tasks.
arXiv Detail & Related papers (2024-10-21T17:58:20Z) - RA-BLIP: Multimodal Adaptive Retrieval-Augmented Bootstrapping Language-Image Pre-training [55.54020926284334]
Multimodal Large Language Models (MLLMs) have recently received substantial interest, which shows their emerging potential as general-purpose models for various vision-language tasks.
Retrieval augmentation techniques have proven to be effective plugins for both LLMs and MLLMs.
In this study, we propose multimodal adaptive Retrieval-Augmented Bootstrapping Language-Image Pre-training (RA-BLIP), a novel retrieval-augmented framework for various MLLMs.
arXiv Detail & Related papers (2024-10-18T03:45:19Z) - A Survey: Collaborative Hardware and Software Design in the Era of Large Language Models [16.250856588632637]
The rapid development of large language models (LLMs) has significantly transformed the field of artificial intelligence.
These models are increasingly integrated into diverse applications, impacting both research and industry.
This paper surveys hardware and software co-design approaches specifically tailored to address the unique characteristics and constraints of large language models.
arXiv Detail & Related papers (2024-10-08T21:46:52Z) - On-Device Language Models: A Comprehensive Review [26.759861320845467]
Review examines the challenges of deploying computationally expensive large language models on resource-constrained devices.
Paper investigates on-device language models, their efficient architectures, as well as state-of-the-art compression techniques.
Case studies of on-device language models from major mobile manufacturers demonstrate real-world applications and potential benefits.
arXiv Detail & Related papers (2024-08-26T03:33:36Z) - A Comprehensive Review of Multimodal Large Language Models: Performance and Challenges Across Different Tasks [74.52259252807191]
Multimodal Large Language Models (MLLMs) address the complexities of real-world applications far beyond the capabilities of single-modality systems.
This paper systematically sorts out the applications of MLLM in multimodal tasks such as natural language, vision, and audio.
arXiv Detail & Related papers (2024-08-02T15:14:53Z) - MG-Verilog: Multi-grained Dataset Towards Enhanced LLM-assisted Verilog Generation [16.836658183451764]
Large Language Models (LLMs) have recently shown promise in streamlining hardware design processes by encapsulating vast amounts of domain-specific data.
Existing publicly available hardware datasets are often limited in size, complexity, or detail.
We propose a Multi-Grained-Verilog (MG-Verilog) dataset, which encompasses descriptions at various levels of detail and corresponding code samples.
arXiv Detail & Related papers (2024-07-02T03:21:24Z) - MobileAIBench: Benchmarking LLMs and LMMs for On-Device Use Cases [81.70591346986582]
We introduce MobileAIBench, a benchmarking framework for evaluating Large Language Models (LLMs) and Large Multimodal Models (LMMs) on mobile devices.
MobileAIBench assesses models across different sizes, quantization levels, and tasks, measuring latency and resource consumption on real devices.
arXiv Detail & Related papers (2024-06-12T22:58:12Z) - Demystifying Platform Requirements for Diverse LLM Inference Use Cases [7.233203254714951]
We present an analytical tool, GenZ, to study the relationship between large language models inference performance and various platform design parameters.
We quantify the platform requirements to support SOTA LLMs models like LLaMA and GPT-4 under diverse serving settings.
Ultimately, this work sheds light on the platform design considerations for unlocking the full potential of large language models across a spectrum of applications.
arXiv Detail & Related papers (2024-06-03T18:00:50Z) - LEGENT: Open Platform for Embodied Agents [60.71847900126832]
We introduce LEGENT, an open, scalable platform for developing embodied agents using Large Language Models (LLMs) and Large Multimodal Models (LMMs)
LEGENT offers a rich, interactive 3D environment with communicable and actionable agents, paired with a user-friendly interface.
In experiments, an embryonic vision-language-action model trained on LEGENT-generated data surpasses GPT-4V in embodied tasks.
arXiv Detail & Related papers (2024-04-28T16:50:12Z) - CREMA: Generalizable and Efficient Video-Language Reasoning via Multimodal Modular Fusion [58.15403987979496]
CREMA is a generalizable, highly efficient, and modular modality-fusion framework for video reasoning.
We propose a novel progressive multimodal fusion design supported by a lightweight fusion module and modality-sequential training strategy.
We validate our method on 7 video-language reasoning tasks assisted by diverse modalities, including VideoQA and Video-Audio/3D/Touch/Thermal QA.
arXiv Detail & Related papers (2024-02-08T18:27:22Z) - LLMs as On-demand Customizable Service [8.440060524215378]
We introduce a concept of hierarchical, distributed Large Language Models (LLMs)
By introducing a "layered" approach, the proposed architecture enables on-demand accessibility to LLMs as a customizable service.
We envision that the concept of hierarchical LLM will empower extensive, crowd-sourced user bases to harness the capabilities of LLMs.
arXiv Detail & Related papers (2024-01-29T21:24:10Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.