Quantization-Aware Imitation-Learning for Resource-Efficient Robotic Control
- URL: http://arxiv.org/abs/2412.01034v1
- Date: Mon, 02 Dec 2024 01:33:49 GMT
- Title: Quantization-Aware Imitation-Learning for Resource-Efficient Robotic Control
- Authors: Seongmin Park, Hyungmin Kim, Wonseok Jeon, Juyoung Yang, Byeongwook Jeon, Yoonseon Oh, Jungwook Choi
- Abstract summary: We propose a new quantization framework for IL-based policy models that fine-tunes parameters to enhance robustness against low-bit precision errors.
Our evaluations of robot manipulation with 4-bit weight quantization on a real edge GPU demonstrate that our framework achieves up to a 2.5x speedup and 2.5x energy savings.
These results highlight the practical potential of deploying IL-based policy models on resource-constrained devices.
- Score: 11.365124223329582
- Abstract: Deep neural network (DNN)-based policy models, such as vision-language-action (VLA) models, are transformative in automating complex decision-making across applications by interpreting multi-modal data. However, scaling these models greatly increases computational costs, which presents challenges in fields like robot manipulation and autonomous driving that require quick, accurate responses. To address the need for deployment on resource-limited hardware, we propose a new quantization framework for imitation learning (IL)-based policy models that fine-tunes parameters during training to enhance robustness against low-bit precision errors, thereby maintaining efficiency and reliability under constrained conditions. Our evaluations on representative robot manipulation tasks with 4-bit weight quantization on a real edge GPU demonstrate that our framework achieves up to a 2.5x speedup and 2.5x energy savings while preserving accuracy. For 4-bit weight- and activation-quantized self-driving models, the framework achieves up to a 3.7x speedup and 3.1x energy savings on a low-end GPU. These results highlight the practical potential of deploying IL-based policy models on resource-constrained devices.
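The paper does not include an implementation, but the general recipe it builds on, quantization-aware fine-tuning, can be sketched as below: weights are fake-quantized to 4 bits in the forward pass while a straight-through estimator lets gradients update the full-precision copy, so the behavior-cloning loss is minimized under the same rounding errors seen at deployment. The network shape, loss, and training loop here are illustrative assumptions, not the authors' exact framework.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

def fake_quantize(w: torch.Tensor, num_bits: int = 4) -> torch.Tensor:
    """Symmetric per-tensor fake quantization with a straight-through estimator."""
    qmax = 2 ** (num_bits - 1) - 1                      # 7 for signed 4-bit
    scale = w.abs().max().clamp(min=1e-8) / qmax        # map max |w| onto qmax
    w_q = torch.clamp(torch.round(w / scale), -qmax - 1, qmax) * scale
    return w + (w_q - w).detach()                       # forward: w_q, backward: identity

class QATPolicy(nn.Module):
    """Toy behavior-cloning policy whose linear weights are fake-quantized."""
    def __init__(self, obs_dim: int, act_dim: int, hidden: int = 256, num_bits: int = 4):
        super().__init__()
        self.fc1 = nn.Linear(obs_dim, hidden)
        self.fc2 = nn.Linear(hidden, act_dim)
        self.num_bits = num_bits

    def forward(self, obs: torch.Tensor) -> torch.Tensor:
        h = F.relu(F.linear(obs, fake_quantize(self.fc1.weight, self.num_bits), self.fc1.bias))
        return F.linear(h, fake_quantize(self.fc2.weight, self.num_bits), self.fc2.bias)

# Fine-tuning loop on (observation, expert action) pairs from demonstrations.
policy = QATPolicy(obs_dim=32, act_dim=7)
opt = torch.optim.Adam(policy.parameters(), lr=1e-4)
for _ in range(100):
    obs, expert_act = torch.randn(64, 32), torch.randn(64, 7)   # placeholder batch
    loss = F.mse_loss(policy(obs), expert_act)
    opt.zero_grad()
    loss.backward()
    opt.step()
```

At deployment, the trained weights would be stored as true 4-bit integers and executed with low-bit kernels on the edge GPU; the fake-quantized training above is what makes the policy tolerant of that rounding.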
Related papers
- Optimizing Small Language Models for In-Vehicle Function-Calling [4.148443557388842]
We propose a holistic approach for deploying Small Language Models (SLMs) as function-calling agents on in-vehicle edge devices.
By leveraging SLMs, we simplify vehicle control mechanisms and enhance the user experience.
arXiv Detail & Related papers (2025-01-04T17:32:56Z)
- DeeR-VLA: Dynamic Inference of Multimodal Large Language Models for Efficient Robot Execution [114.61347672265076]
Development of MLLMs for real-world robots is challenging due to the typically limited computation and memory capacities available on robotic platforms.
We propose a Dynamic Early-Exit Framework for Robotic Vision-Language-Action Model (DeeR) that automatically adjusts the size of the activated MLLM.
DeeR demonstrates significant reductions in the LLM's computational cost, by 5.2-6.5x, and in its GPU memory usage, by 2-6x, without compromising performance.
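DeeR's concrete exit policy is not spelled out in this summary; the sketch below only illustrates the generic dynamic early-exit pattern it extends to MLLMs: each block feeds an intermediate prediction head, and inference stops at the first head whose confidence clears a threshold, so easy inputs activate a smaller model. Dimensions, heads, and the confidence criterion are hypothetical.

```python
import torch
import torch.nn as nn

class EarlyExitBackbone(nn.Module):
    """Generic early-exit network: each block has its own prediction head, and
    inference stops at the first head whose confidence clears a threshold."""
    def __init__(self, dim: int = 128, num_classes: int = 10, num_blocks: int = 6):
        super().__init__()
        self.blocks = nn.ModuleList(
            nn.Sequential(nn.Linear(dim, dim), nn.ReLU()) for _ in range(num_blocks)
        )
        self.heads = nn.ModuleList(nn.Linear(dim, num_classes) for _ in range(num_blocks))

    @torch.no_grad()
    def infer(self, x: torch.Tensor, threshold: float = 0.9):
        for depth, (block, head) in enumerate(zip(self.blocks, self.heads), start=1):
            x = block(x)
            probs = head(x).softmax(dim=-1)
            conf, pred = probs.max(dim=-1)
            if conf.item() >= threshold:      # confident enough: skip remaining blocks
                return pred, depth
        return pred, depth                    # fell through: used the full network

model = EarlyExitBackbone()
prediction, blocks_used = model.infer(torch.randn(1, 128))
print(f"exited after {blocks_used} blocks")
```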
arXiv Detail & Related papers (2024-11-04T18:26:08Z)
- One-Step Diffusion Policy: Fast Visuomotor Policies via Diffusion Distillation [80.71541671907426]
One-Step Diffusion Policy (OneDP) is a novel approach that distills knowledge from pre-trained diffusion policies into a single-step action generator.
OneDP significantly accelerates response times for robotic control tasks.
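As a rough illustration of the distillation idea (not OneDP's actual objective, which this summary does not specify), a single-step student policy can be trained to match actions produced by a frozen, multi-step diffusion-policy teacher, trading a small amount of fidelity for a large reduction in inference latency:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

# Hypothetical shapes; the real teacher would be a pre-trained diffusion policy
# whose sampler runs many denoising steps per action.
obs_dim, act_dim = 64, 7
student = nn.Sequential(nn.Linear(obs_dim, 256), nn.ReLU(), nn.Linear(256, act_dim))
opt = torch.optim.Adam(student.parameters(), lr=1e-4)

teacher_w = torch.randn(obs_dim, act_dim)      # placeholder for the frozen teacher

@torch.no_grad()
def teacher_sample(obs: torch.Tensor) -> torch.Tensor:
    """Stand-in for the slow multi-step diffusion sampler of the frozen teacher."""
    return torch.tanh(obs @ teacher_w)

for _ in range(100):
    obs = torch.randn(32, obs_dim)
    target = teacher_sample(obs)               # expensive teacher, training time only
    loss = F.mse_loss(student(obs), target)    # one-step student mimics the teacher
    opt.zero_grad()
    loss.backward()
    opt.step()
```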
arXiv Detail & Related papers (2024-10-28T17:54:31Z)
- Efficient Motion Prediction: A Lightweight & Accurate Trajectory Prediction Model With Fast Training and Inference Speed [56.27022390372502]
We propose a new, efficient motion prediction model that achieves highly competitive benchmark results while training in only a few hours on a single GPU.
Its low inference latency makes it particularly suitable for deployment in autonomous applications with limited computing resources.
arXiv Detail & Related papers (2024-09-24T14:58:27Z)
- Tiny Reinforcement Learning for Quadruped Locomotion using Decision Transformers [0.9217021281095907]
Resource-constrained robotic platforms are useful for tasks that require low-cost hardware alternatives.
We propose a method for making imitation learning deployable onto resource-constrained robotic platforms.
We show that our method achieves natural-looking gaits for Bittle, a resource-constrained quadruped robot.
arXiv Detail & Related papers (2024-02-20T18:10:39Z)
- When Parameter-efficient Tuning Meets General-purpose Vision-language Models [65.19127815275307]
PETAL revolutionizes the training process by requiring the training of only 0.5% of the total parameters, achieved through a unique mode approximation technique.
Our experiments reveal that PETAL not only outperforms current state-of-the-art methods in most scenarios but also surpasses full fine-tuning models in effectiveness.
arXiv Detail & Related papers (2023-12-16T17:13:08Z)
- Quantized Distillation: Optimizing Driver Activity Recognition Models for Resource-Constrained Environments [34.80538284957094]
This paper introduces a lightweight framework for resource-efficient driver activity recognition.
The framework enhances 3D MobileNet, a neural architecture optimized for speed in video classification.
It achieves a threefold reduction in model size and a 1.4-fold improvement in inference time.
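The summary does not give the training objective; a common way to combine quantization with distillation, sketched below under that assumption, is to train the compact (eventually quantized) student against the full-precision teacher's softened logits in addition to the hard labels:

```python
import torch
import torch.nn.functional as F

def distillation_loss(student_logits, teacher_logits, labels, T=4.0, alpha=0.7):
    """Soft-target KL divergence (temperature T) blended with hard-label cross-entropy."""
    soft = F.kl_div(
        F.log_softmax(student_logits / T, dim=-1),
        F.softmax(teacher_logits / T, dim=-1),
        reduction="batchmean",
    ) * (T * T)                                  # rescale as in standard distillation
    hard = F.cross_entropy(student_logits, labels)
    return alpha * soft + (1 - alpha) * hard

# Toy usage with random logits for a 10-class activity-recognition problem.
s = torch.randn(8, 10, requires_grad=True)       # compact/quantized student logits (placeholder)
t = torch.randn(8, 10)                           # full-precision teacher logits (placeholder)
y = torch.randint(0, 10, (8,))
distillation_loss(s, t, y).backward()
```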
arXiv Detail & Related papers (2023-11-10T10:07:07Z)
- Dynamic Early Exiting Predictive Coding Neural Networks [3.542013483233133]
As demand grows for smaller yet more accurate devices, deep learning models have become too heavy to deploy on them.
We propose a shallow bidirectional network based on predictive coding theory and dynamic early exiting for halting further computations.
We achieve comparable accuracy to VGG-16 in image classification on CIFAR-10 with fewer parameters and less computational complexity.
arXiv Detail & Related papers (2023-09-05T08:00:01Z)
- Real-time Neural-MPC: Deep Learning Model Predictive Control for Quadrotors and Agile Robotic Platforms [59.03426963238452]
We present Real-time Neural MPC, a framework to efficiently integrate large, complex neural network architectures as dynamics models within a model-predictive control pipeline.
We show the feasibility of our framework on real-world problems by reducing the positional tracking error by up to 82% when compared to state-of-the-art MPC approaches without neural network dynamics.
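The paper's contribution is making large networks tractable inside a real-time MPC solver; the snippet below shows only the underlying pattern of model-predictive control with a learned dynamics model, using simple random-shooting optimization. State/action dimensions, horizon, and cost function are illustrative assumptions.

```python
import torch
import torch.nn as nn

state_dim, act_dim, horizon, num_samples = 12, 4, 20, 256

# Placeholder learned dynamics x_{t+1} = f(x_t, u_t); in the paper this would be
# a large pre-trained network evaluated inside the real-time MPC loop.
dynamics = nn.Sequential(nn.Linear(state_dim + act_dim, 128), nn.ReLU(),
                         nn.Linear(128, state_dim))

def cost(state: torch.Tensor, goal: torch.Tensor) -> torch.Tensor:
    return ((state - goal) ** 2).sum(dim=-1)     # simple quadratic tracking cost

@torch.no_grad()
def mpc_action(state: torch.Tensor, goal: torch.Tensor) -> torch.Tensor:
    """Random-shooting MPC: roll out sampled action sequences through the learned
    dynamics model and execute the first action of the cheapest sequence."""
    actions = torch.randn(num_samples, horizon, act_dim)     # candidate sequences
    states = state.expand(num_samples, state_dim).clone()
    total = torch.zeros(num_samples)
    for t in range(horizon):
        states = dynamics(torch.cat([states, actions[:, t]], dim=-1))
        total += cost(states, goal)
    best = total.argmin()
    return actions[best, 0]                                  # receding horizon: first action only

u0 = mpc_action(torch.zeros(state_dim), torch.ones(state_dim))
```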
arXiv Detail & Related papers (2022-03-15T09:38:15Z)
- FPGA-optimized Hardware acceleration for Spiking Neural Networks [69.49429223251178]
This work presents the development of a hardware accelerator for an SNN, with off-line training, applied to an image recognition task.
The design targets a Xilinx Artix-7 FPGA, using in total around the 40% of the available hardware resources.
Compared to its full-precision software counterpart, it reduces classification time by three orders of magnitude with only a small 4.5% impact on accuracy.
arXiv Detail & Related papers (2022-01-18T13:59:22Z)
- A High-Performance Adaptive Quantization Approach for Edge CNN Applications [0.225596179391365]
Recent convolutional neural network (CNN) development continues to advance the state-of-the-art model accuracy for various applications.
The enhanced accuracy comes at the cost of substantial memory bandwidth and storage requirements.
In this paper, we introduce an adaptive high-performance quantization method to resolve the issue of biased activations.
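As a hedged illustration of why biased activations are a problem (this is a generic affine quantizer, not the paper's adaptive method): symmetric quantization wastes half of its levels on, e.g., post-ReLU activations, whereas an asymmetric scheme with a zero-point adapts its range to the observed minimum and maximum.

```python
import torch

def asymmetric_quantize(x: torch.Tensor, num_bits: int = 8) -> torch.Tensor:
    """Affine (asymmetric) quantization: maps [min, max] onto [0, 2^b - 1]."""
    qmin, qmax = 0, 2 ** num_bits - 1
    x_min, x_max = x.min(), x.max()
    scale = (x_max - x_min).clamp(min=1e-8) / (qmax - qmin)
    zero_point = torch.round(qmin - x_min / scale).clamp(qmin, qmax)
    q = torch.clamp(torch.round(x / scale) + zero_point, qmin, qmax)
    return (q - zero_point) * scale             # dequantized value, for error measurement

# Post-ReLU activations are biased toward non-negative values, so the affine
# scheme spends all 2^b levels on the occupied range.
act = torch.relu(torch.randn(10_000))
err = (act - asymmetric_quantize(act)).abs().mean()
print(f"mean abs quantization error: {err:.5f}")
```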
arXiv Detail & Related papers (2021-07-18T07:49:18Z)