Adaptive Guidance Semantically Enhanced via Multimodal LLM for Edge-Cloud Object Detection
- URL: http://arxiv.org/abs/2509.19875v1
- Date: Wed, 24 Sep 2025 08:25:37 GMT
- Title: Adaptive Guidance Semantically Enhanced via Multimodal LLM for Edge-Cloud Object Detection
- Authors: Yunqing Hu, Zheming Yang, Chang Zhao, Wen Ji
- Abstract summary: This paper proposes an adaptive guidance-based semantic enhancement edge-cloud collaborative object detection method. It can reduce latency by over 79% and computational cost by 70% in low-light and highly occluded scenes.
- Score: 9.198326035948613
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Traditional object detection methods face performance degradation challenges in complex scenarios such as low-light conditions and heavy occlusions due to a lack of high-level semantic understanding. To address this, this paper proposes an adaptive guidance-based semantic enhancement edge-cloud collaborative object detection method leveraging Multimodal Large Language Models (MLLM), achieving an effective balance between accuracy and efficiency. Specifically, the method first employs instruction fine-tuning to enable the MLLM to generate structured scene descriptions. It then designs an adaptive mapping mechanism that dynamically converts semantic information into parameter adjustment signals for edge detectors, achieving real-time semantic enhancement. Within an edge-cloud collaborative inference framework, the system automatically selects between invoking cloud-based semantic guidance or directly outputting edge detection results based on confidence scores. Experiments demonstrate that the proposed method effectively enhances detection accuracy and efficiency in complex scenes. Specifically, it can reduce latency by over 79% and computational cost by 70% in low-light and highly occluded scenes while maintaining accuracy.
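The confidence-gated routing and the semantic-to-parameter mapping described in the abstract can be illustrated with a minimal Python sketch. Everything in it is an assumption made for illustration and is not taken from the paper: the names `Detection`, `DetectorParams`, `map_scene_to_params` and `detect`, the scene keys `lighting` and `occlusion`, the mean-confidence gate, and all numeric values. The edge detector and the cloud MLLM are passed in as plain callables so the sketch stays independent of any real model or library.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Optional, Tuple


@dataclass(frozen=True)
class Detection:
    label: str
    confidence: float


@dataclass(frozen=True)
class DetectorParams:
    conf_threshold: float = 0.5  # illustrative default, not from the paper
    nms_iou: float = 0.45        # illustrative default, not from the paper


def map_scene_to_params(scene: Dict[str, str], base: DetectorParams) -> DetectorParams:
    """Hypothetical rule table: structured scene description -> detector parameters."""
    conf, iou = base.conf_threshold, base.nms_iou
    if scene.get("lighting") == "low":
        conf -= 0.15  # keep fainter candidates in dark scenes
    if scene.get("occlusion") == "heavy":
        iou += 0.15   # suppress fewer overlapping boxes when objects hide each other
    return DetectorParams(conf_threshold=max(0.05, conf), nms_iou=min(0.95, iou))


def detect(
    image,
    edge_detect: Callable[[object, DetectorParams], List[Detection]],
    cloud_describe: Callable[[object], Dict[str, str]],
    base: Optional[DetectorParams] = None,
    gate: float = 0.6,  # assumed confidence gate
) -> Tuple[List[Detection], str]:
    """Return edge results directly when confident, else re-run with cloud guidance."""
    base = base or DetectorParams()
    first_pass = edge_detect(image, base)
    if first_pass:
        mean_conf = sum(d.confidence for d in first_pass) / len(first_pass)
        if mean_conf >= gate:
            return first_pass, "edge"
    # Low confidence or no detections: ask the cloud model for a scene description,
    # map it to adjusted parameters, and run the edge detector again.
    scene = cloud_describe(image)
    tuned = map_scene_to_params(scene, base)
    return edge_detect(image, tuned), "edge+cloud"
```

In this sketch the cloud model is only called when the first edge pass is uncertain, which is where savings in latency and computation would come from. How the paper actually computes its confidence score and its mapping is not specified in the abstract.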
Related papers
- IoUCert: Robustness Verification for Anchor-based Object Detectors [58.35703549470485]
We introduce IoUCert, a novel formal verification framework designed specifically to overcome these bottlenecks in anchor-based object detection architectures. We show that our method enables the robustness verification of realistic, anchor-based models including SSD, YOLOv2, and YOLOv3 variants against various input perturbations.
arXiv Detail & Related papers (2026-03-03T14:36:46Z)
- ADAPT: Hybrid Prompt Optimization for LLM Feature Visualization [4.700604993101454]
ADAPT is a hybrid method combining beam search and adaptive gradient-guided mutation. We show that ADAPT consistently outperforms prior methods across layers and latent types. Our results establish that feature visualization for LLMs is tractable, but requires design assumptions tailored to the domain.
arXiv Detail & Related papers (2026-02-19T22:03:25Z)
- Robust Subpixel Localization of Diagonal Markers in Large-Scale Navigation via Multi-Layer Screening and Adaptive Matching [18.710429100680006]
This paper proposes a robust, high-precision positioning methodology to address localization failures in large-scale flight navigation. The proposed methodology employs a three-tiered framework incorporating multi-layer corner screening and adaptive template matching. Experimental results demonstrate the method's effectiveness in extracting and localizing diagonal markers in complex, large-scale environments.
arXiv Detail & Related papers (2026-01-13T02:51:31Z)
- AIVD: Adaptive Edge-Cloud Collaboration for Accurate and Efficient Industrial Visual Detection [15.419663374345845]
This paper proposes the AIVD framework, which achieves unified precise localization and high-quality semantic generation. To enhance the cloud MLLM's robustness against edge cropped-box noise and scenario variations, we design an efficient fine-tuning strategy. To maintain high throughput and low latency across heterogeneous edge devices and dynamic network conditions, we propose a heterogeneous resource-aware dynamic scheduling algorithm.
arXiv Detail & Related papers (2026-01-08T08:56:07Z)
- Accelerate Speculative Decoding with Sparse Computation in Verification [49.74839681322316]
Speculative decoding accelerates autoregressive language model inference by verifying multiple draft tokens in parallel. Existing sparsification methods are designed primarily for standard token-by-token autoregressive decoding. We propose a sparse verification framework that jointly sparsifies attention, FFN, and MoE components during the verification stage to reduce the dominant computation cost.
arXiv Detail & Related papers (2025-12-26T07:53:41Z)
- Fints: Efficient Inference-Time Personalization for LLMs with Fine-Grained Instance-Tailored Steering [49.212940215720884]
We propose a steering framework that generates sample-level interference from user data and injects it into the model's forward pass for personalized adaptation. Our method significantly enhances personalization performance in fast-shifting environments while maintaining robustness across varying interaction modes and context lengths.
arXiv Detail & Related papers (2025-10-31T06:01:04Z)
- DetectAnyLLM: Towards Generalizable and Robust Detection of Machine-Generated Text Across Domains and Models [60.713908578319256]
We propose Direct Discrepancy Learning (DDL) to optimize the detector with task-oriented knowledge. Built upon this, we introduce DetectAnyLLM, a unified detection framework that achieves state-of-the-art MGTD performance. MIRAGE samples human-written texts from 10 corpora across 5 text-domains, which are then re-generated or revised using 17 cutting-edge LLMs.
arXiv Detail & Related papers (2025-09-15T10:59:57Z)
- Efficient Out-of-Scope Detection in Dialogue Systems via Uncertainty-Driven LLM Routing [6.579756339673344]
Out-of-scope (OOS) intent detection is a critical challenge in task-oriented dialogue systems (TODS). We propose a novel but simple modular framework that combines uncertainty modeling with fine-tuned large language models (LLMs) for efficient and accurate OOS detection.
arXiv Detail & Related papers (2025-07-02T09:51:41Z)
- Continual Adaptation: Environment-Conditional Parameter Generation for Object Detection in Dynamic Scenarios [54.58186816693791]
Environments constantly change over time and space, posing significant challenges for object detectors trained based on a closed-set assumption. We propose a new mechanism, converting the fine-tuning process to specific-parameter generation. In particular, we first design a dual-path LoRA-based domain-aware adapter that disentangles features into domain-invariant and domain-specific components.
arXiv Detail & Related papers (2025-06-30T17:14:12Z)
- Efficient Detection Framework Adaptation for Edge Computing: A Plug-and-play Neural Network Toolbox Enabling Edge Deployment [59.61554561979589]
Edge computing has emerged as a key paradigm for deploying deep learning-based object detection in time-sensitive scenarios. Existing edge detection methods face challenges: difficulty balancing detection precision with lightweight models, limited adaptability, and insufficient real-world validation. We propose the Edge Detection Toolbox (ED-TOOLBOX), which utilizes generalizable plug-and-play components to adapt object detection models for edge environments.
arXiv Detail & Related papers (2024-12-24T07:28:10Z)
- SCA: Improve Semantic Consistent in Unrestricted Adversarial Attacks via DDPM Inversion [27.7252951625431]
We propose a novel framework called Semantic-Consistent Unrestricted Adversarial Attacks (SCA). SCA employs an inversion method to extract edit-friendly noise maps and utilizes a Multimodal Large Language Model (MLLM) to provide semantic guidance. Our framework enables the efficient generation of adversarial examples that exhibit minimal discernible semantic changes.
arXiv Detail & Related papers (2024-10-03T06:25:53Z)
- Intent Detection in the Age of LLMs [3.755082744150185]
Intent detection is a critical component of task-oriented dialogue systems (TODS).
Traditional approaches relied on computationally efficient supervised sentence transformer encoder models.
The emergence of generative large language models (LLMs) with intrinsic world knowledge presents new opportunities to address these challenges.
arXiv Detail & Related papers (2024-10-02T15:01:55Z)
- Continual-MAE: Adaptive Distribution Masked Autoencoders for Continual Test-Time Adaptation [49.827306773992376]
Continual Test-Time Adaptation (CTTA) is proposed to migrate a source pre-trained model to continually changing target distributions.
Our proposed method attains state-of-the-art performance in both classification and segmentation CTTA tasks.
arXiv Detail & Related papers (2023-12-19T15:34:52Z)
- Real-Time Scene Text Detection with Differentiable Binarization and Adaptive Scale Fusion [62.269219152425556]
Segmentation-based scene text detection methods have drawn extensive attention in the scene text detection field.
We propose a Differentiable Binarization (DB) module that integrates the binarization process into a segmentation network.
An efficient Adaptive Scale Fusion (ASF) module is proposed to improve the scale robustness by fusing features of different scales adaptively.
arXiv Detail & Related papers (2022-02-21T15:30:14Z)