Knowledge Grafting: A Mechanism for Optimizing AI Model Deployment in Resource-Constrained Environments
- URL: http://arxiv.org/abs/2507.19261v1
- Date: Fri, 25 Jul 2025 13:37:45 GMT
- Title: Knowledge Grafting: A Mechanism for Optimizing AI Model Deployment in Resource-Constrained Environments
- Authors: Osama Almurshed, Ashish Kaushal, Asmail Muftah, Nitin Auluck, Omer Rana
- Abstract summary: We introduce knowledge grafting to optimize AI models for resource-constrained environments. The approach achieves an 88.54% reduction in model size. It can be extended across various edge computing scenarios.
- License: http://creativecommons.org/licenses/by-nc-nd/4.0/
- Abstract: The increasing adoption of Artificial Intelligence (AI) has led to larger, more complex models with numerous parameters that require substantial computing power, resources often unavailable in many real-world application scenarios. Our paper addresses this challenge by introducing knowledge grafting, a novel mechanism that optimizes AI models for resource-constrained environments by transferring selected features (the scion) from a large donor model to a smaller rootstock model. The approach achieves an 88.54% reduction in model size (from 64.39 MB to 7.38 MB) while improving the model's generalization capability. Our new rootstock model achieves 89.97% validation accuracy (vs. the donor's 87.47%), maintains lower validation loss (0.2976 vs. 0.5068), and performs exceptionally well on unseen test data with 90.45% accuracy. It addresses the typical size-vs-performance trade-off and enables deployment of AI frameworks on resource-constrained devices with enhanced performance. We have tested our approach on an agricultural weed detection scenario; however, it can be extended across various edge computing scenarios, potentially accelerating AI adoption in areas with limited hardware/software support, mirroring the way horticultural grafting enables productive cultivation in challenging agri-based environments.
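The abstract does not spell out how scion features are selected or attached. As a rough illustration only, the sketch below picks "scion" rows from a large donor weight matrix using a simple importance score (L2 norm, an assumed criterion not stated by the authors) and uses them to initialize a much smaller rootstock layer. All names, shapes, and the selection rule are hypothetical, not the paper's implementation.

```python
import numpy as np

def graft_features(donor_weights, k):
    """Return the k donor feature rows with the largest L2 norm
    (an assumed importance criterion) as the 'scion'."""
    norms = np.linalg.norm(donor_weights, axis=1)
    top = np.argsort(norms)[-k:]      # indices of the k strongest features
    return donor_weights[top]

rng = np.random.default_rng(0)
donor = rng.normal(size=(512, 64))    # large donor layer: 512 features
scion = graft_features(donor, k=32)   # keep only the 32 strongest features

# 'Rootstock': a small model whose layer is initialized from the scion,
# then fine-tuned on the target task (fine-tuning omitted here).
rootstock_layer = scion.copy()
print(rootstock_layer.shape)          # (32, 64): 16x fewer features
```

The reported 88.54% size reduction suggests the real method prunes far more aggressively than this toy ratio; the sketch is only meant to make the donor/scion/rootstock vocabulary concrete.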
Related papers
- OmniEAR: Benchmarking Agent Reasoning in Embodied Tasks [52.87238755666243]
We present OmniEAR, a framework for evaluating how language models reason about physical interactions, tool usage, and multi-agent coordination in embodied tasks. We model continuous physical properties and complex spatial relationships across 1,500 scenarios spanning household and industrial domains. Our systematic evaluation reveals severe performance degradation when models must reason from constraints.
arXiv Detail & Related papers (2025-08-07T17:54:15Z) - White-Basilisk: A Hybrid Model for Code Vulnerability Detection [50.49233187721795]
We introduce White-Basilisk, a novel approach to vulnerability detection that demonstrates superior performance. White-Basilisk achieves its results in vulnerability detection tasks with a parameter count of only 200M. This research establishes new benchmarks in code security and provides empirical evidence that compact, efficiently designed models can outperform larger counterparts in specialized tasks.
arXiv Detail & Related papers (2025-07-11T12:39:25Z) - Thinking Longer, Not Larger: Enhancing Software Engineering Agents via Scaling Test-Time Compute [61.00662702026523]
We propose a unified Test-Time Compute scaling framework that leverages increased inference-time compute instead of larger models. Our framework incorporates two complementary strategies: internal TTC and external TTC. We demonstrate that our 32B model achieves a 46% issue resolution rate, surpassing significantly larger models such as DeepSeek R1 671B and OpenAI o1.
arXiv Detail & Related papers (2025-03-31T07:31:32Z) - EfficientLLaVA: Generalizable Auto-Pruning for Large Vision-language Models [64.18350535770357]
We propose an automatic pruning method for large vision-language models to enhance the efficiency of multimodal reasoning. Our approach only leverages a small number of samples to search for the desired pruning policy. We conduct extensive experiments on the ScienceQA, Vizwiz, MM-vet, and LLaVA-Bench datasets for the task of visual question answering.
arXiv Detail & Related papers (2025-03-19T16:07:04Z) - Entropy Adaptive Decoding: Dynamic Model Switching for Efficient Inference [0.0]
We present Entropy Adaptive Decoding (EAD), a novel approach for efficient language model inference. EAD switches between different-sized models based on prediction uncertainty. We show remarkable efficiency gains across different model families.
arXiv Detail & Related papers (2025-02-05T22:15:21Z) - Edge-AI for Agriculture: Lightweight Vision Models for Disease Detection in Resource-Limited Settings [0.0]
The proposed system integrates advanced object detection, classification, and segmentation models, optimized for deployment on edge devices. The study evaluates the performance of various state-of-the-art models, focusing on their accuracy, computational efficiency, and generalization capabilities.
arXiv Detail & Related papers (2024-12-23T06:48:50Z) - Dual-Model Distillation for Efficient Action Classification with Hybrid Edge-Cloud Solution [1.8029479474051309]
We design a hybrid edge-cloud solution that leverages the efficiency of smaller models for local processing while deferring to larger, more accurate cloud-based models when necessary.
Specifically, we propose a novel unsupervised data generation method, Dual-Model Distillation (DMD), to train a lightweight switcher model that can predict when the edge model's output is uncertain.
Experimental results on the action classification task show that our framework not only requires less computational overhead, but also improves accuracy compared to using a large model alone.
arXiv Detail & Related papers (2024-10-16T02:06:27Z) - Structured Model Pruning for Efficient Inference in Computational Pathology [2.9687381456164004]
We develop a methodology for pruning the widely used U-Net-style architectures in biomedical imaging.
We empirically demonstrate that pruning can compress models by at least 70% with a negligible drop in performance.
arXiv Detail & Related papers (2024-04-12T22:05:01Z) - GISTEmbed: Guided In-sample Selection of Training Negatives for Text Embedding Fine-tuning [0.0]
GISTEmbed is a novel strategy that enhances in-batch negative selection during contrastive training through a guide model.
Benchmarked against the Massive Text Embedding Benchmark (MTEB), GISTEmbed showcases consistent performance improvements across various model sizes.
arXiv Detail & Related papers (2024-02-26T18:55:15Z) - InfoRM: Mitigating Reward Hacking in RLHF via Information-Theoretic Reward Modeling [66.3072381478251]
Reward hacking, also termed reward overoptimization, remains a critical challenge.
We propose a framework for reward modeling, namely InfoRM, by introducing a variational information bottleneck objective.
We show that InfoRM's overoptimization detection mechanism is not only effective but also robust across a broad range of datasets.
arXiv Detail & Related papers (2024-02-14T17:49:07Z) - Publishing Efficient On-device Models Increases Adversarial Vulnerability [58.6975494957865]
In this paper, we study the security considerations of publishing on-device variants of large-scale models.
We first show that an adversary can exploit on-device models to make attacking the large models easier.
We then show that the vulnerability increases as the similarity between a full-scale model and its efficient variant increases.
arXiv Detail & Related papers (2022-12-28T05:05:58Z) - Human Parity on CommonsenseQA: Augmenting Self-Attention with External Attention [66.93307963324834]
We propose to augment the transformer architecture with an external attention mechanism to bring external knowledge and context to bear.
We find that the proposed external attention mechanism can significantly improve the performance of existing AI systems.
The proposed system reaches human parity on the open CommonsenseQA research benchmark with an accuracy of 89.4% in comparison to the human accuracy of 88.9%.
arXiv Detail & Related papers (2021-12-06T18:59:02Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the information presented and is not responsible for any consequences arising from its use.