CoSense-LLM: Semantics at the Edge with Cost- and Uncertainty-Aware Cloud-Edge Cooperation
- URL: http://arxiv.org/abs/2510.19670v1
- Date: Wed, 22 Oct 2025 15:16:56 GMT
- Title: CoSense-LLM: Semantics at the Edge with Cost- and Uncertainty-Aware Cloud-Edge Cooperation
- Authors: Hasan Akgul, Mari Eplik, Javier Rojas, Aina Binti Abdullah, Pieter van der Merwe,
- Abstract summary: CoSense-LLM is an edge-first framework that turns continuous multimodal sensor streams into compact semantic tokens. The system works with modern serving optimizations, including paged or streaming KV caches, FlashAttention-style kernels, speculative decoding, and quantized LoRA adapters.
- Score: 0.0
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: We present CoSense-LLM, an edge-first framework that turns continuous multimodal sensor streams (for example Wi-Fi CSI, IMU, audio, RFID, and lightweight vision) into compact, verifiable semantic tokens and coordinates with large language models under explicit latency, energy, bandwidth, and privacy constraints. CoSense-LLM has four parts: (i) SenseFusion, a lightweight encoder that aligns sensor embeddings with language and compresses them into short discrete code sequences; (ii) Edge-RAG, a local hybrid retrieval layer that grounds generation in site-specific policies and notes; (iii) PromptRouter, a cost- and uncertainty-aware policy that selects edge-only generation, edge plus retrieval, or compact cloud escalation; and (iv) Secure Execution, an auditable redaction path that enforces data minimization so raw waveforms never leave the device. The system works with modern serving optimizations, including paged or streaming KV caches, FlashAttention-style kernels, speculative decoding, and quantized LoRA adapters, and supports on-device personalization and federated updates under non-IID drift. Across home, office, and clinic deployments, CoSense-LLM delivers grounded explanations while meeting tight service-level objectives: it sustains sub-second (p95) end-to-end latency on edge-dominant paths, reduces inter-tier token and bandwidth costs by preferring local retrieval-grounded responses, and preserves privacy by transmitting only discrete codes and redacted metadata. Ablations show that Edge-RAG improves factual consistency and reduces contradictions, calibrated uncertainty enables selective abstention and controlled escalations, and KV plus decoding accelerators lower energy per decision. The results support an edge-first design that treats semantics, privacy, and predictable latency as co-equal goals for large-model deployments in interference-prone environments.
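The routing idea in the abstract (choose edge-only generation, edge plus retrieval, or cloud escalation based on cost and uncertainty, with abstention as a fallback) can be illustrated with a small threshold rule. This is only a sketch under stated assumptions, not the paper's PromptRouter: the names `Route`, `RouteCost`, `COSTS`, and `choose_route`, the thresholds, and every cost number are hypothetical and chosen for illustration.

```python
from dataclasses import dataclass
from enum import Enum


class Route(Enum):
    EDGE_ONLY = "edge_only"
    EDGE_RAG = "edge_plus_retrieval"
    CLOUD = "cloud_escalation"
    ABSTAIN = "abstain"


@dataclass(frozen=True)
class RouteCost:
    latency_ms: float   # assumed expected end-to-end latency
    uplink_tokens: int  # assumed tokens sent off-device


# Hypothetical cost table; real values would be measured per deployment.
COSTS = {
    Route.EDGE_ONLY: RouteCost(latency_ms=120.0, uplink_tokens=0),
    Route.EDGE_RAG: RouteCost(latency_ms=300.0, uplink_tokens=0),
    Route.CLOUD: RouteCost(latency_ms=900.0, uplink_tokens=256),
}


def choose_route(uncertainty, retrieval_score, latency_budget_ms,
                 low=0.2, high=0.6, min_retrieval=0.5):
    """Pick the cheapest route whose uncertainty gate and latency budget hold.

    uncertainty: calibrated uncertainty in [0, 1] (assumed to be given).
    retrieval_score: quality of the best local retrieval hit in [0, 1].
    The thresholds low, high, and min_retrieval are illustrative only.
    """
    if not 0.0 <= uncertainty <= 1.0:
        raise ValueError("uncertainty must be in [0, 1]")

    def fits(route):
        return COSTS[route].latency_ms <= latency_budget_ms

    # Confident enough: answer locally without retrieval.
    if uncertainty <= low and fits(Route.EDGE_ONLY):
        return Route.EDGE_ONLY
    # Moderately uncertain but well grounded locally: add retrieval.
    if uncertainty <= high and retrieval_score >= min_retrieval and fits(Route.EDGE_RAG):
        return Route.EDGE_RAG
    # Otherwise escalate, but only if the budget allows it.
    if fits(Route.CLOUD):
        return Route.CLOUD
    # No route satisfies both the gate and the budget: abstain.
    return Route.ABSTAIN


if __name__ == "__main__":
    print(choose_route(0.1, 0.0, 1000.0))
    print(choose_route(0.4, 0.8, 1000.0))
    print(choose_route(0.9, 0.8, 500.0))
```

Ordering the checks from cheapest to most expensive route keeps traffic on the device whenever the uncertainty gate allows it, which matches the edge-first preference described in the abstract. A learned or cost-weighted policy could replace the fixed thresholds.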
Related papers
- A Secure and Private Distributed Bayesian Federated Learning Design [56.92336577799572]
Distributed Federated Learning (DFL) enables decentralized model training across large-scale systems without a central parameter server. DFL faces three critical challenges: privacy leakage from honest-but-curious neighbors, slow convergence due to the lack of central coordination, and vulnerability to Byzantine adversaries aiming to degrade model accuracy. We propose a novel DFL framework that integrates Byzantine robustness, privacy preservation, and convergence acceleration.
arXiv Detail & Related papers (2026-02-23T16:12:02Z)
- Rethinking Multi-Condition DiTs: Eliminating Redundant Attention via Position-Alignment and Keyword-Scoping [61.459927600301654]
Multi-condition control is bottlenecked by the conventional "concatenate-and-attend" strategy. Our analysis reveals that much of this cross-modal interaction is spatially or semantically redundant. We propose Position-aligned and Keyword-scoped Attention (PKA), a highly efficient framework designed to eliminate these redundancies.
arXiv Detail & Related papers (2026-02-06T16:39:10Z)
- Bridging the Perception Gap: A Lightweight Coarse-to-Fine Architecture for Edge Audio Systems [10.143590597259792]
CoFi-Agent is a hybrid architecture targeting edge servers and gateways. It performs fast local perception and triggers conditional forensic refinement only when uncertainty is detected. On the MMAR benchmark, CoFi-Agent improves accuracy from 27.20% to 53.60%, while achieving a better accuracy-efficiency trade-off than an always-on investigation pipeline.
arXiv Detail & Related papers (2026-01-22T05:57:25Z)
- Joint Sensing, Communication, and Computation for Vertical Federated Edge Learning in Edge Perception Network [75.78245138352698]
In this paper, we consider an integrated sensing, communication, and computation-enabled edge perception network. Multiple edge devices utilize wireless signals to sense environmental information for updating their local models, and the edge server aggregates feature embeddings via over-the-air computation for global model training. First, we analyze the convergence behavior of the ISCC-enabled VFEEL in terms of the loss function degradation in the presence of wireless sensing noise and aggregation distortions during AirComp.
arXiv Detail & Related papers (2025-12-03T02:20:58Z)
- PRISM: Privacy-Aware Routing for Adaptive Cloud-Edge LLM Inference via Semantic Sketch Collaboration [8.776463501718737]
We propose a context-aware framework that dynamically balances privacy and inference quality. PRISM executes in four stages: (1) the edge device profiles entity-level sensitivity; (2) a soft gating module on the edge selects an execution mode (cloud, edge, or collaboration); (3) for collaborative paths, the edge applies adaptive two-layer local differential privacy based on entity risks; and (4) the cloud LLM generates a semantic sketch from the perturbed prompt.
arXiv Detail & Related papers (2025-11-27T22:32:33Z)
- Video Object Recognition in Mobile Edge Networks: Local Tracking or Edge Detection? [57.000348519630286]
Recent advances in mobile edge computing have made it possible to offload computation-intensive object detection to edge servers equipped with high-accuracy neural networks. This hybrid approach offers a promising solution but introduces a new challenge: deciding when to perform edge detection versus local tracking. For the single-device setting, we propose LTED-Ada, a deep reinforcement learning-based algorithm that adaptively selects between local tracking and edge detection.
arXiv Detail & Related papers (2025-11-25T04:54:51Z)
- ZK-SenseLM: Verifiable Large-Model Wireless Sensing with Selective Abstention and Zero-Knowledge Attestation [0.0]
ZK-SenseLM is a secure and auditable wireless sensing framework. It pairs a large-model encoder for Wi-Fi channel state information with a policy-grounded decision layer and zero-knowledge proofs of inference.
arXiv Detail & Related papers (2025-10-29T16:43:07Z)
- Federated Spatiotemporal Graph Learning for Passive Attack Detection in Smart Grids [2.721477719641864]
This paper introduces a graph-centric, multimodal detector that fuses physical-layer and behavioral indicators over temporal windows to detect passive attacks. The model achieves a testing accuracy of 98.32% per-timestep and 93.35% per-sequence at 0.15% FPR.
arXiv Detail & Related papers (2025-09-29T08:52:30Z)
- Adaptive Learning for IRS-Assisted Wireless Networks: Securing Opportunistic Communications Against Byzantine Eavesdroppers [7.256056777973974]
We propose a joint learning framework for Byzantine-resilient spectrum sensing and secure intelligent reflecting surface (IRS) We develop an augmented-Lagrangian alternating algorithm with projected updates and provide provable sublinear convergence, with accelerated rates under mild local curvature. Simulations across diverse network conditions show higher detection probability at fixed false-alarm rate under adversarial attacks, large reductions in sum MSE for honest users, strong suppression of eavesdropper signal power, and fast convergence.
arXiv Detail & Related papers (2025-08-11T17:28:25Z)
- SLED: A Speculative LLM Decoding Framework for Efficient Edge Serving [7.91607650966469]
SLED is a framework that allows lightweight edge devices to draft multiple candidate tokens locally using diverse draft models. A single, shared edge server verifies the tokens utilizing a more precise target model. Our initial experiments with Jetson Orin Nano, Raspberry Pi 4B/5, and an edge server equipped with 4 Nvidia A100 GPUs indicate substantial benefits.
arXiv Detail & Related papers (2025-06-11T04:55:54Z)
- PWC-MoE: Privacy-Aware Wireless Collaborative Mixture of Experts [59.5243730853157]
Large language models (LLMs) hosted on cloud servers alleviate the computational and storage burdens on local devices but raise privacy concerns. Small language models (SLMs) running locally enhance privacy but suffer from limited performance on complex tasks. We propose a privacy-aware wireless collaborative mixture of experts (PWC-MoE) framework to balance computational cost, performance, and privacy protection under bandwidth constraints.
arXiv Detail & Related papers (2025-05-13T16:27:07Z)
- DiffCom: Decoupled Sparse Priors Guided Diffusion Compression for Point Clouds [54.96190721255167]
Lossy compression relies on an autoencoder to transform a point cloud into latent points for storage. We propose a diffusion-based framework guided by sparse priors that achieves high reconstruction quality, especially at lows.
arXiv Detail & Related papers (2024-11-21T05:41:35Z)
- Collaborative Automatic Modulation Classification via Deep Edge Inference for Hierarchical Cognitive Radio Networks [19.303303020775555]
In hierarchical cognitive radio networks, edge or cloud servers utilize the data collected by edge devices for modulation classification.
In this article, an edge learning (EL) based framework jointly mobilizing the edge device and the edge server for intelligent co-inference is proposed.
arXiv Detail & Related papers (2024-09-12T11:14:25Z)
- Over-the-Air Federated Learning with Privacy Protection via Correlated Additive Perturbations [57.20885629270732]
We consider privacy aspects of wireless federated learning with Over-the-Air (OtA) transmission of gradient updates from multiple users/agents to an edge server.
Traditional perturbation-based methods provide privacy protection while sacrificing the training accuracy.
In this work, we aim at minimizing privacy leakage to the adversary and the degradation of model accuracy at the edge server.
arXiv Detail & Related papers (2022-10-05T13:13:35Z)
- Task-Oriented Sensing, Computation, and Communication Integration for Multi-Device Edge AI [108.08079323459822]
This paper studies a new multi-device edge artificial intelligence (AI) system, which jointly exploits the AI model split inference and integrated sensing and communication (ISAC)
We measure the inference accuracy by adopting an approximate but tractable metric, namely discriminant gain.
arXiv Detail & Related papers (2022-07-03T06:57:07Z)
- An Adaptive Device-Edge Co-Inference Framework Based on Soft Actor-Critic [72.35307086274912]
High-dimensional parameter models and large-scale mathematical calculations restrict execution efficiency, especially for Internet of Things (IoT) devices.
We propose a new Deep Reinforcement Learning (DRL)-Soft Actor Critic for discrete (SAC-d), which generates the exit point, exit point, and compressing bits by soft policy iterations.
Based on the latency and accuracy aware reward design, such a computation can adapt well to complex environments like dynamic wireless channels and arbitrary processing, and is capable of supporting the 5G URL
arXiv Detail & Related papers (2022-01-09T09:31:50Z)
- FCOS: A simple and strong anchor-free object detector [111.87691210818194]
We propose a fully convolutional one-stage object detector (FCOS) to solve object detection in a per-pixel prediction fashion.
Almost all state-of-the-art object detectors such as RetinaNet, SSD, YOLOv3, and Faster R-CNN rely on pre-defined anchor boxes.
In contrast, our proposed detector FCOS is anchor box free, as well as proposal free.
arXiv Detail & Related papers (2020-06-14T01:03:39Z)
This list is automatically generated from the titles and abstracts of the papers in this site.