VLLFL: A Vision-Language Model Based Lightweight Federated Learning Framework for Smart Agriculture
- URL: http://arxiv.org/abs/2504.13365v1
- Date: Thu, 17 Apr 2025 22:14:31 GMT
- Title: VLLFL: A Vision-Language Model Based Lightweight Federated Learning Framework for Smart Agriculture
- Authors: Long Li, Jiajia Li, Dong Chen, Lina Pu, Haibo Yao, Yanbo Huang
- Abstract summary: We propose VLLFL, a vision-language model-based lightweight federated learning framework. It harnesses the generalization and context-aware detection capabilities of the vision-language model (VLM) and leverages the privacy-preserving nature of federated learning. VLLFL achieves a 14.53% improvement in VLM performance while reducing communication overhead by 99.3%.
- Score: 12.468660942565792
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: In modern smart agriculture, object detection plays a crucial role by enabling automation, precision farming, and monitoring of resources. From identifying crop health and pest infestations to optimizing harvesting processes, accurate object detection enhances both productivity and sustainability. However, training object detection models often requires large-scale data collection and raises privacy concerns, particularly when sensitive agricultural data is distributed across farms. To address these challenges, we propose VLLFL, a vision-language model-based lightweight federated learning framework. It harnesses the generalization and context-aware detection capabilities of the vision-language model (VLM) and leverages the privacy-preserving nature of federated learning. By training a compact prompt generator to boost the performance of the VLM deployed across different farms, VLLFL preserves privacy while reducing communication overhead. Experimental results demonstrate that VLLFL achieves a 14.53% improvement in VLM performance while reducing communication overhead by 99.3%. Spanning tasks from identifying a wide variety of fruits to detecting harmful animals in agriculture, the proposed framework offers an efficient, scalable, and privacy-preserving solution specifically tailored to agricultural applications.
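To make the communication-savings argument concrete, below is a minimal, hedged sketch of the federated loop the abstract describes: each farm trains only a compact prompt generator against its frozen local VLM, and the server averages just those few parameters. The names (`PromptGenerator`, `local_update`, `fedavg`), the soft-prompt design, and all hyperparameters are illustrative assumptions, not the authors' released code.

```python
# Illustrative sketch of FedAvg over a compact prompt generator (assumed design).
import copy
import torch
import torch.nn as nn

class PromptGenerator(nn.Module):
    """Compact module producing soft-prompt embeddings for a frozen VLM."""
    def __init__(self, prompt_len: int = 8, embed_dim: int = 512):
        super().__init__()
        self.prompts = nn.Parameter(0.02 * torch.randn(prompt_len, embed_dim))

    def forward(self) -> torch.Tensor:
        # Prepended to the frozen VLM's token embeddings at inference time.
        return self.prompts

def local_update(model, data_loader, loss_fn, epochs: int = 1, lr: float = 1e-3):
    """One farm's local training pass; only the prompt generator is updated."""
    opt = torch.optim.SGD(model.parameters(), lr=lr)
    for _ in range(epochs):
        for batch in data_loader:
            opt.zero_grad()
            # loss_fn runs the frozen VLM with the generated prompts on `batch`.
            loss_fn(model(), batch).backward()
            opt.step()
    return model.state_dict()

def fedavg(client_states):
    """Server side: average only the compact prompt-generator weights."""
    avg = copy.deepcopy(client_states[0])
    for key in avg:
        avg[key] = torch.stack([sd[key] for sd in client_states]).mean(dim=0)
    return avg
```

Under these assumptions, only `prompt_len * embed_dim` floats cross the network per round instead of the full VLM, which is the mechanism consistent with the reported 99.3% reduction in communication overhead.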
Related papers
- Privacy-Preserving Federated Embedding Learning for Localized Retrieval-Augmented Generation [60.81109086640437]
We propose a novel framework called Federated Retrieval-Augmented Generation (FedE4RAG).
FedE4RAG facilitates collaborative training of client-side RAG retrieval models.
We apply homomorphic encryption within federated learning to safeguard model parameters.
arXiv Detail & Related papers (2025-04-27T04:26:02Z)
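Since the entry above highlights homomorphic encryption for protecting model parameters in federated learning, here is a minimal illustration of that general idea using additively homomorphic Paillier encryption via the python-paillier (`phe`) package. The client/server split and the toy parameter vectors are assumptions for exposition, not the paper's actual protocol.

```python
# Toy sketch: clients encrypt updates; the server sums ciphertexts and never
# sees individual plaintext weights. Requires `pip install phe`.
from phe import paillier

public_key, private_key = paillier.generate_paillier_keypair(n_length=2048)

def encrypt_update(update):
    """Client side: encrypt each parameter of a local model update."""
    return [public_key.encrypt(w) for w in update]

def aggregate_encrypted(encrypted_updates):
    """Server side: add ciphertexts elementwise (additive homomorphism)."""
    total = encrypted_updates[0]
    for enc in encrypted_updates[1:]:
        total = [a + b for a, b in zip(total, enc)]
    return total

# Two clients, three parameters each.
client_a = encrypt_update([0.10, -0.20, 0.05])
client_b = encrypt_update([0.30, 0.10, -0.15])
summed = aggregate_encrypted([client_a, client_b])
avg = [private_key.decrypt(s) / 2 for s in summed]  # decrypt only the aggregate
print(avg)  # -> [0.2, -0.05, -0.05]
```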
- Breaking Focus: Contextual Distraction Curse in Large Language Models [68.4534308805202]
We investigate a critical vulnerability in Large Language Models (LLMs): the contextual distraction vulnerability (CDV), which arises when models fail to maintain consistent performance on questions modified with semantically coherent but irrelevant context. We propose an efficient tree-based search methodology to automatically generate CDV examples.
arXiv Detail & Related papers (2025-02-03T18:43:36Z)
- RLS3: RL-Based Synthetic Sample Selection to Enhance Spatial Reasoning in Vision-Language Models for Indoor Autonomous Perception [20.01853641155509]
Vision-language model (VLM) fine-tuning for application-specific visual grounding based on natural language instructions has become one of the most popular approaches for learning-enabled autonomous systems. We propose a new generalizable framework that improves VLM fine-tuning by integrating it with a reinforcement learning (RL) agent.
arXiv Detail & Related papers (2025-01-31T04:30:42Z)
- Edge-AI for Agriculture: Lightweight Vision Models for Disease Detection in Resource-Limited Settings [0.0]
The proposed system integrates advanced object detection, classification, and segmentation models optimized for deployment on edge devices. The study evaluates the performance of various state-of-the-art models, focusing on their accuracy, computational efficiency, and generalization capabilities.
arXiv Detail & Related papers (2024-12-23T06:48:50Z)
- Vision Language Models are In-Context Value Learners [89.29486557646624]
We present Generative Value Learning (GVL), a universal value function estimator that leverages the world knowledge embedded in vision-language models (VLMs) to predict task progress.
Without any robot- or task-specific training, GVL can predict effective values in context, zero-shot and few-shot, for more than 300 distinct real-world tasks.
arXiv Detail & Related papers (2024-11-07T09:17:50Z)
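Purely as an illustration of the in-context value-estimation idea above: frames and a task description are sent to a VLM, and the reply is parsed into per-frame progress values. `query_vlm` is a hypothetical stand-in for any multimodal chat API, and the prompt format is an assumption; neither comes from the GVL paper.

```python
# Hypothetical sketch of in-context task-progress estimation with a VLM.
from typing import Callable, List

def estimate_task_progress(
    frames: List[bytes],
    task: str,
    query_vlm: Callable[..., str],  # hypothetical multimodal API wrapper
) -> List[float]:
    prompt = (
        f"Task: {task}. For each of the {len(frames)} frames, output the "
        "percentage of task completion (0-100), one number per line."
    )
    reply = query_vlm(images=frames, text=prompt)
    return [float(line) for line in reply.strip().splitlines()]
```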
- MarvelOVD: Marrying Object Recognition and Vision-Language Models for Robust Open-Vocabulary Object Detection [107.15164718585666]
We investigate the root cause of VLMs' biased predictions in the open-vocabulary detection context.
Our observations lead to a simple yet effective paradigm, named MarvelOVD, that generates significantly better training targets. Our method outperforms other state-of-the-art methods by significant margins.
arXiv Detail & Related papers (2024-07-31T09:23:57Z)
- Leveraging Vision Language Models for Specialized Agricultural Tasks [19.7240633020344]
We present AgEval, a benchmark for assessing Vision Language Models' capabilities in plant stress phenotyping. Our study explores how general-purpose VLMs can be leveraged for domain-specific tasks with only a few annotated examples. Our results demonstrate VLMs' rapid adaptability to specialized tasks, with the best-performing model showing an increase in F1 score from 46.24% to 73.37% in 8-shot identification.
arXiv Detail & Related papers (2024-07-29T00:39:51Z)
- CDFL: Efficient Federated Human Activity Recognition using Contrastive Learning and Deep Clustering [12.472038137777474]
Human Activity Recognition (HAR) is vital for the automation and intelligent identification of human actions through data from diverse sensors.
Traditional machine learning approaches that aggregate data on a central server for centralized processing are memory-intensive and raise privacy concerns.
This work proposes CDFL, an efficient federated learning framework for image-based HAR.
arXiv Detail & Related papers (2024-07-17T03:17:53Z)
- SPA-VL: A Comprehensive Safety Preference Alignment Dataset for Vision Language Model [77.86593720792986]
We propose a Safety Preference Alignment dataset for Vision Language Models named SPA-VL. SPA-VL covers 6 harmfulness domains, 13 categories, and 53 subcategories, and contains 100,788 samples of the quadruple (question, image, chosen response, rejected response). Experiments indicate that models trained with alignment techniques on the SPA-VL dataset exhibit substantial improvements in harmlessness and helpfulness while maintaining core capabilities.
arXiv Detail & Related papers (2024-06-17T18:57:37Z)
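Since the SPA-VL summary spells out its sample structure, here is a minimal sketch of that quadruple as a data type; the field names mirror the abstract but are assumptions, not the dataset's actual schema.

```python
# Assumed shape of one SPA-VL preference sample (field names are illustrative).
from dataclasses import dataclass

@dataclass
class SpaVlSample:
    question: str
    image: bytes            # in practice, a file path or decoded image
    chosen_response: str    # preferred (harmless, helpful) answer
    rejected_response: str  # dispreferred answer
```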
- Self-Supervised Backbone Framework for Diverse Agricultural Vision Tasks [0.3683202928838613]
Computer vision in agriculture is a game-changer, with the potential to transform farming into a data-driven, precise, and sustainable industry. Deep learning has empowered agricultural vision to analyze vast, complex visual data, but it relies heavily on the availability of large annotated datasets. We propose a lightweight framework utilizing SimCLR, a contrastive learning approach, to pre-train a ResNet-50 backbone on a large, unannotated dataset of real-world agricultural field images.
arXiv Detail & Related papers (2024-03-22T14:46:51Z)
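For reference, a compact sketch of the NT-Xent objective at the core of SimCLR-style contrastive pre-training, as used by the entry above; the temperature, batch size, and projection dimension here are standard-recipe assumptions, not necessarily the paper's settings.

```python
# Minimal NT-Xent (normalized temperature-scaled cross entropy) loss sketch.
import torch
import torch.nn.functional as F

def nt_xent_loss(z1: torch.Tensor, z2: torch.Tensor, temperature: float = 0.5):
    """z1, z2: (N, D) projections of two augmented views of the same images."""
    n = z1.size(0)
    z = F.normalize(torch.cat([z1, z2], dim=0), dim=1)  # (2N, D) unit vectors
    sim = z @ z.t() / temperature                       # pairwise similarities
    sim.fill_diagonal_(float("-inf"))                   # mask self-similarity
    # The positive for sample i is its other augmented view: i+N (or i-N).
    targets = torch.cat([torch.arange(n) + n, torch.arange(n)])
    return F.cross_entropy(sim, targets)

# Usage: z1, z2 would come from ResNet-50 features passed through an MLP head.
loss = nt_xent_loss(torch.randn(32, 128), torch.randn(32, 128))
```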
- The Unreasonable Effectiveness of Large Language-Vision Models for Source-free Video Domain Adaptation [56.61543110071199]
The Source-Free Video Unsupervised Domain Adaptation (SFVUDA) task consists of adapting an action recognition model, trained on a labelled source dataset, to an unlabelled target dataset.
Previous approaches have attempted to address SFVUDA by leveraging self-supervision derived from the target data itself.
We take a different approach, exploiting "web-supervision" from Large Language-Vision Models (LLVMs), driven by the rationale that LLVMs contain a rich world prior that is surprisingly robust to domain shift.
arXiv Detail & Related papers (2023-08-17T18:12:05Z)