Optimizing Small Language Models for In-Vehicle Function-Calling
- URL: http://arxiv.org/abs/2501.02342v1
- Date: Sat, 04 Jan 2025 17:32:56 GMT
- Title: Optimizing Small Language Models for In-Vehicle Function-Calling
- Authors: Yahya Sowti Khiabani, Farris Atif, Chieh Hsu, Sven Stahlmann, Tobias Michels, Sebastian Kramer, Benedikt Heidrich, M. Saquib Sarfraz, Julian Merten, Faezeh Tafazzoli
- Abstract summary: We propose a holistic approach for deploying Small Language Models (SLMs) as function-calling agents within vehicles as edge devices.
By leveraging SLMs, we simplify vehicle control mechanisms and enhance the user experience.
- Score: 4.148443557388842
- Abstract: We propose a holistic approach for deploying Small Language Models (SLMs) as function-calling agents within vehicles as edge devices, offering a more flexible and robust alternative to traditional rule-based systems. By leveraging SLMs, we simplify vehicle control mechanisms and enhance the user experience. Given the in-vehicle hardware constraints, we apply state-of-the-art model compression techniques, including structured pruning, healing, and quantization, ensuring that the model fits within the resource limitations while maintaining acceptable performance. Our work focuses on optimizing a representative SLM, Microsoft's Phi-3 mini, and outlines best practices for enabling embedded models, including compression, task-specific fine-tuning, and vehicle integration. We demonstrate that, despite a significant reduction in model size, which removes up to 2 billion parameters from the original model, our approach preserves the model's ability to handle complex in-vehicle tasks accurately and efficiently. Furthermore, by executing the model in a lightweight runtime environment, we achieve a generation speed of 11 tokens per second, making real-time, on-device inference feasible without hardware acceleration. Our results demonstrate the potential of SLMs to transform vehicle control systems, enabling more intuitive interactions between users and their vehicles for an enhanced driving experience.
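The abstract describes the pipeline only at a high level, so the following is a minimal, hypothetical sketch of the end result: a 4-bit-quantized Phi-3 mini acting as an in-vehicle function-calling agent. The Hugging Face checkpoint ID is the public one; the tool schema, prompt format, and `call_function` helper are illustrative assumptions, and the paper's structured pruning, healing, and task-specific fine-tuning steps are not reproduced here.
```python
# Hypothetical sketch only: a quantized SLM emitting a JSON function call for a
# vehicle command. The tool schema and prompt format below are assumptions for
# illustration, not the authors' fine-tuning setup.
import json
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig

MODEL_ID = "microsoft/Phi-3-mini-4k-instruct"  # representative SLM from the paper

# 4-bit weight quantization stands in for the paper's compression pipeline.
bnb_config = BitsAndBytesConfig(load_in_4bit=True, bnb_4bit_compute_dtype=torch.float16)
tokenizer = AutoTokenizer.from_pretrained(MODEL_ID)
model = AutoModelForCausalLM.from_pretrained(
    MODEL_ID, quantization_config=bnb_config, device_map="auto"
)

# Hypothetical vehicle-control tool the model is asked to target.
TOOLS = [{
    "name": "set_cabin_temperature",
    "parameters": {"zone": "driver|passenger|rear", "celsius": "number"},
}]

def call_function(user_utterance: str) -> dict:
    """Ask the SLM to translate a spoken command into a single JSON function call."""
    prompt = (
        "You control vehicle functions. Reply with one JSON object matching one "
        f"of these tools: {json.dumps(TOOLS)}\nUser: {user_utterance}\nJSON:"
    )
    inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
    output = model.generate(**inputs, max_new_tokens=64, do_sample=False)
    completion = tokenizer.decode(
        output[0][inputs["input_ids"].shape[1]:], skip_special_tokens=True
    )
    return json.loads(completion.strip())

# Example: call_function("make it 22 degrees on the driver side") might return
# {"name": "set_cabin_temperature", "zone": "driver", "celsius": 22}.
```
In the paper the compressed model runs in a lightweight runtime at roughly 11 tokens per second without hardware acceleration; the bitsandbytes path above assumes a GPU and is only a stand-in for that deployment.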
Related papers
- Quantization-Aware Imitation-Learning for Resource-Efficient Robotic Control [11.365124223329582]
We propose a new quantization framework for IL-based policy models that fine-tunes parameters to enhance robustness against low-bit precision errors.
Our evaluations of 4-bit weight quantization for robot manipulation on a real edge GPU demonstrate that our framework achieves up to 2.5x speedup and 2.5x energy savings.
These results highlight the practical potential of deploying IL-based policy models on resource-constrained devices.
arXiv Detail & Related papers (2024-12-02T01:33:49Z)
- OWLed: Outlier-weighed Layerwise Pruning for Efficient Autonomous Driving Framework [3.8320050452121692]
We introduce OWLed, the Outlier-Weighed Layerwise Pruning for Efficient Autonomous Driving Framework.
Our method assigns non-uniform sparsity ratios to different layers based on the distribution of outlier features.
To ensure the compressed model adapts well to autonomous driving tasks, we incorporate driving environment data into both the calibration and pruning processes (a rough layerwise-sparsity sketch appears after this list).
arXiv Detail & Related papers (2024-11-12T10:55:30Z)
- Mini-InternVL: A Flexible-Transfer Pocket Multimodal Model with 5% Parameters and 90% Performance [78.48606021719206]
Mini-InternVL is a series of MLLMs with parameters ranging from 1B to 4B, which achieves 90% of the performance with only 5% of the parameters.
We develop a unified adaptation framework for Mini-InternVL, which enables our models to transfer and outperform specialized models in downstream tasks.
arXiv Detail & Related papers (2024-10-21T17:58:20Z)
- Structuring a Training Strategy to Robustify Perception Models with Realistic Image Augmentations [1.5723316845301678]
This report introduces a novel methodology for training with augmentations to enhance model robustness and performance in such conditions.
We present a comprehensive framework that includes identifying weak spots in Machine Learning models, selecting suitable augmentations, and devising effective training strategies.
Experimental results demonstrate improvements in model performance, as measured by commonly used metrics such as mean Average Precision (mAP) and mean Intersection over Union (mIoU) on open-source object detection and semantic segmentation models and datasets.
arXiv Detail & Related papers (2024-08-30T14:15:48Z)
- MetaFollower: Adaptable Personalized Autonomous Car Following [63.90050686330677]
We propose an adaptable personalized car-following framework - MetaFollower.
We first utilize Model-Agnostic Meta-Learning (MAML) to extract common driving knowledge from various CF events.
We additionally combine Long Short-Term Memory (LSTM) and Intelligent Driver Model (IDM) to reflect temporal heterogeneity with high interpretability.
arXiv Detail & Related papers (2024-06-23T15:30:40Z)
- EditFollower: Tunable Car Following Models for Customizable Adaptive Cruise Control Systems [28.263763430300504]
We propose a data-driven car-following model that allows for adjusting driving discourtesy levels.
Our model provides valuable insights for the development of ACC systems that take into account drivers' social preferences.
arXiv Detail & Related papers (2024-06-23T15:04:07Z)
- Trajeglish: Traffic Modeling as Next-Token Prediction [67.28197954427638]
A longstanding challenge for self-driving development is simulating dynamic driving scenarios seeded from recorded driving logs.
We apply tools from discrete sequence modeling to model how vehicles, pedestrians and cyclists interact in driving scenarios.
Our model tops the Sim Agents Benchmark, surpassing prior work along the realism meta metric by 3.3% and along the interaction metric by 9.9%.
arXiv Detail & Related papers (2023-12-07T18:53:27Z)
- MACP: Efficient Model Adaptation for Cooperative Perception [23.308578463976804]
We propose a new framework termed MACP, which equips a single-agent pre-trained model with cooperation capabilities.
We demonstrate in experiments that the proposed framework can effectively utilize cooperative observations and outperform other state-of-the-art approaches.
arXiv Detail & Related papers (2023-10-25T14:24:42Z)
- AutoMix: Automatically Mixing Language Models [62.51238143437967]
Large language models (LLMs) are now available from cloud API providers in various sizes and configurations.
We present Automix, an approach that strategically routes queries to larger LMs, based on the approximate correctness of outputs from a smaller LM.
arXiv Detail & Related papers (2023-10-19T17:57:39Z)
- eP-ALM: Efficient Perceptual Augmentation of Language Models [70.47962271121389]
We propose to direct effort toward efficient adaptation of existing models and to augment Language Models with perception.
Existing approaches for adapting pretrained models for vision-language tasks still rely on several key components that hinder their efficiency.
We show that by freezing more than 99% of total parameters, training only one linear projection layer, and prepending only one trainable token, our approach (dubbed eP-ALM) significantly outperforms other baselines on VQA and Captioning.
arXiv Detail & Related papers (2023-03-20T19:20:34Z)
- Real-time Neural-MPC: Deep Learning Model Predictive Control for Quadrotors and Agile Robotic Platforms [59.03426963238452]
We present Real-time Neural MPC, a framework to efficiently integrate large, complex neural network architectures as dynamics models within a model-predictive control pipeline.
We show the feasibility of our framework on real-world problems by reducing the positional tracking error by up to 82% when compared to state-of-the-art MPC approaches without neural network dynamics.
arXiv Detail & Related papers (2022-03-15T09:38:15Z)
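The OWLed entry above centers on assigning non-uniform, outlier-aware sparsity ratios per layer. The sketch below is a rough, generic illustration of that idea, not the paper's algorithm: the magnitude-based outlier criterion, the `lam` scaling knob, and the `prune_by_magnitude` helper are all assumptions.
```python
# Rough illustration of outlier-aware, non-uniform layerwise sparsity.
# The outlier criterion (|w| > k * std) and the shift-by-deviation rule are
# assumptions for illustration; they are not OWLed's actual method.
import torch
import torch.nn as nn

def outlier_ratio(weight: torch.Tensor, k: float = 3.0) -> float:
    """Fraction of weights whose magnitude exceeds k standard deviations."""
    w = weight.detach().abs()
    return (w > k * w.std()).float().mean().item()

def assign_layer_sparsity(model: nn.Module,
                          target_sparsity: float = 0.5,
                          lam: float = 8.0) -> dict:
    """Give outlier-rich layers a lower sparsity target so they are pruned less.

    lam is a hypothetical knob controlling how strongly a layer's outlier ratio
    shifts its sparsity away from the uniform target.
    """
    layers = {n: m for n, m in model.named_modules() if isinstance(m, nn.Linear)}
    ratios = {n: outlier_ratio(m.weight) for n, m in layers.items()}
    mean_ratio = sum(ratios.values()) / max(len(ratios), 1)
    return {
        n: float(min(max(target_sparsity - lam * (r - mean_ratio), 0.0), 0.95))
        for n, r in ratios.items()
    }

def prune_by_magnitude(layer: nn.Linear, sparsity: float) -> None:
    """Zero out the smallest-magnitude weights to reach the requested sparsity."""
    w = layer.weight.data
    k = int(sparsity * w.numel())
    if k > 0:
        threshold = w.abs().flatten().kthvalue(k).values
        w.mul_((w.abs() > threshold).float())
```
Under this scheme, layers whose weights contain many large-magnitude outliers receive a lower sparsity target and are pruned less aggressively than outlier-poor layers.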
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the information presented and is not responsible for any consequences arising from its use.