Q-Segment: Segmenting Images In-Sensor for Vessel-Based Medical
Diagnosis
- URL: http://arxiv.org/abs/2312.09854v3
- Date: Mon, 4 Mar 2024 15:21:18 GMT
- Title: Q-Segment: Segmenting Images In-Sensor for Vessel-Based Medical
Diagnosis
- Authors: Pietro Bonazzi, Yawei Li, Sizhen Bian, Michele Magno
- Abstract summary: We present "Q-Segment", a quantized real-time segmentation algorithm, and conduct a comprehensive evaluation on a low-power edge vision platform with the Sony IMX500.
Q-Segment achieves an ultra-low in-sensor inference time of only 0.23 ms and a power consumption of only 72 mW.
This research contributes valuable insights into edge-based image segmentation, laying the foundation for efficient algorithms tailored to low-power environments.
- Score: 13.018482089796159
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: This paper addresses the growing interest in deploying deep learning models
directly in-sensor. We present "Q-Segment", a quantized real-time segmentation
algorithm, and conduct a comprehensive evaluation on a low-power edge vision
platform with an in-sensor processor, the Sony IMX500. One of the main goals
of the model is to achieve end-to-end image segmentation for vessel-based
medical diagnosis. Deployed on the IMX500 platform, Q-Segment achieves
an ultra-low in-sensor inference time of only 0.23 ms and a power consumption of only
72 mW. We compare the proposed network with state-of-the-art models, both float
and quantized, demonstrating that the proposed solution outperforms existing
networks on various platforms in computing efficiency, e.g., by a factor of 75x
compared to ERFNet. The network employs an encoder-decoder structure with skip
connections, and results in a binary accuracy of 97.25% and an Area Under the
Receiver Operating Characteristic Curve (AUC) of 96.97% on the CHASE dataset.
We also present a comparison of the IMX500 processing core with the Sony
Spresense, a low-power multi-core ARM Cortex-M microcontroller, and a
single-core ARM Cortex-M4 showing that it can achieve in-sensor processing with
end-to-end low latency (17 ms) and power consumption (254 mW). This research
contributes valuable insights into edge-based image segmentation, laying the
foundation for efficient algorithms tailored to low-power environments.
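
The efficiency of Q-Segment rests on quantization: representing weights and activations as low-bit integers instead of 32-bit floats so the in-sensor processor can run integer arithmetic. As an illustrative sketch only (symmetric uniform 8-bit quantization, not necessarily the authors' exact scheme):

```python
def quantize_int8(values):
    """Symmetric uniform quantization: map floats to signed 8-bit integers.

    Returns (quantized values, scale) such that value ~= q * scale.
    """
    max_abs = max(abs(v) for v in values)
    scale = max_abs / 127.0 if max_abs > 0 else 1.0
    q = [max(-128, min(127, round(v / scale))) for v in values]
    return q, scale

def dequantize(q, scale):
    """Recover approximate float values from int8 codes."""
    return [x * scale for x in q]

# Example: quantize a small set of weights, then reconstruct them.
weights = [0.5, -1.27, 0.003, 1.27]
q, scale = quantize_int8(weights)
approx = dequantize(q, scale)
```

Values near zero (here 0.003) collapse to the same integer code, which is the accuracy/efficiency trade-off the paper's evaluation quantifies against float baselines.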
Related papers
- LHU-Net: A Light Hybrid U-Net for Cost-Efficient, High-Performance Volumetric Medical Image Segmentation [4.168081528698768]
We introduce LHU-Net, a streamlined Hybrid U-Net for medical image segmentation.
Tested on five benchmark datasets, LHU-Net demonstrated superior efficiency and accuracy.
arXiv Detail & Related papers (2024-04-07T22:58:18Z) - Gesture Recognition for FMCW Radar on the Edge [0.0]
We show that gestures can be characterized efficiently by a set of five features.
A recurrent neural network (RNN) based architecture exploits these features to jointly detect and classify five different gestures.
The proposed system recognizes gestures with an F1 score of 98.4% on our hold-out test dataset.
arXiv Detail & Related papers (2023-10-13T06:03:07Z) - Implementation of a perception system for autonomous vehicles using a
detection-segmentation network in SoC FPGA [0.0]
We have used the MultiTaskV3 detection-segmentation network as the basis for a perception system that can perform both functionalities within a single architecture.
The whole system consumes relatively little power compared to a CPU-based implementation.
It also achieves more than 97% mAP for object detection and above 90% mIoU for image segmentation.
arXiv Detail & Related papers (2023-07-17T17:44:18Z) - Ultra-low Power Deep Learning-based Monocular Relative Localization
Onboard Nano-quadrotors [64.68349896377629]
This work presents a novel autonomous end-to-end system that addresses the monocular relative localization, through deep neural networks (DNNs), of two peer nano-drones.
To cope with the ultra-constrained nano-drone platform, we propose a vertically-integrated framework, including dataset augmentation, quantization, and system optimizations.
Experimental results show that our DNN can precisely localize a 10cm-size target nano-drone by employing only low-resolution monochrome images, up to 2m distance.
arXiv Detail & Related papers (2023-03-03T14:14:08Z) - Attention-based Feature Compression for CNN Inference Offloading in Edge
Computing [93.67044879636093]
This paper studies the computational offloading of CNN inference in device-edge co-inference systems.
We propose a novel autoencoder-based CNN architecture (AECNN) for effective feature extraction at end-device.
Experiments show that AECNN can compress the intermediate data by more than 256x with only about 4% accuracy loss.
arXiv Detail & Related papers (2022-11-24T18:10:01Z) - EdgeNeXt: Efficiently Amalgamated CNN-Transformer Architecture for
Mobile Vision Applications [68.35683849098105]
We introduce split depth-wise transpose attention (SDTA) encoder that splits input tensors into multiple channel groups.
Our EdgeNeXt model with 1.3M parameters achieves 71.2% top-1 accuracy on ImageNet-1K.
Our EdgeNeXt model with 5.6M parameters achieves 79.4% top-1 accuracy on ImageNet-1K.
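
The SDTA encoder's first step is splitting the input tensor's channels into groups that can be processed separately. A minimal sketch of that channel-group split (the group processing itself is omitted; this is an illustration, not EdgeNeXt's implementation):

```python
def split_channel_groups(tensor, num_groups):
    """Split a channels-first tensor (a list of per-channel feature maps)
    into num_groups contiguous channel groups."""
    channels = len(tensor)
    if channels % num_groups != 0:
        raise ValueError("channel count must be divisible by num_groups")
    size = channels // num_groups
    return [tensor[i * size:(i + 1) * size] for i in range(num_groups)]

# Example: 8 channels split into 4 groups of 2 channels each.
groups = split_channel_groups(list(range(8)), 4)
```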
arXiv Detail & Related papers (2022-06-21T17:59:56Z) - Global Context Vision Transformers [78.5346173956383]
We propose global context vision transformer (GC ViT), a novel architecture that enhances parameter and compute utilization for computer vision.
We address the lack of inductive bias in ViTs and propose to leverage modified fused inverted residual blocks in our architecture.
Our proposed GC ViT achieves state-of-the-art results across image classification, object detection and semantic segmentation tasks.
arXiv Detail & Related papers (2022-06-20T18:42:44Z) - Rethinking BiSeNet For Real-time Semantic Segmentation [6.622485130017622]
BiSeNet has proven to be a popular two-stream network for real-time segmentation.
We propose a novel structure named Short-Term Dense Concatenate network (STDC) by removing structure redundancy.
arXiv Detail & Related papers (2021-04-27T13:49:47Z) - Real-time Semantic Segmentation with Fast Attention [94.88466483540692]
We propose a novel architecture for semantic segmentation of high-resolution images and videos in real-time.
The proposed architecture relies on our fast spatial attention, which is a simple yet efficient modification of the popular self-attention mechanism.
Results on multiple datasets demonstrate superior performance, with better accuracy and speed than existing approaches.
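
The abstract does not spell out the fast spatial attention mechanism; one well-known way such attention speedups work, shown here purely as an illustrative assumption and not as the paper's exact method, is reassociating the attention matrix products. For sequence length n and feature dimension d, (QKᵀ)V costs O(n²d), while the mathematically equivalent Q(KᵀV) costs O(nd²), which is linear in n:

```python
def matmul(A, B):
    """Multiply two matrices given as lists of rows."""
    m, p = len(B), len(B[0])
    return [[sum(row[k] * B[k][j] for k in range(m)) for j in range(p)]
            for row in A]

def transpose(A):
    return [list(col) for col in zip(*A)]

# Quadratic form: (Q K^T) V builds an n x n matrix first -- O(n^2 d).
def attention_quadratic(Q, K, V):
    return matmul(matmul(Q, transpose(K)), V)

# Reassociated form: Q (K^T V) builds a d x d matrix first -- O(n d^2).
def attention_linear(Q, K, V):
    return matmul(Q, matmul(transpose(K), V))
```

The softmax normalization of real self-attention is omitted here; handling it is exactly what distinguishes concrete fast-attention designs from this bare associativity trick.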
arXiv Detail & Related papers (2020-07-07T22:37:16Z) - AnalogNet: Convolutional Neural Network Inference on Analog Focal Plane
Sensor Processors [0.0]
We present a high-speed, energy-efficient Convolutional Neural Network (CNN) architecture utilising the capabilities of a unique class of devices known as analog Focal Plane Sensor Processors (FPSPs).
Unlike traditional vision systems, where the sensor array sends collected data to a separate processor for processing, FPSPs allow data to be processed on the imaging device itself.
Our proposed architecture, coined AnalogNet, reaches a testing accuracy of 96.9% on the MNIST handwritten digits recognition task, at a speed of 2260 FPS, for a cost of 0.7 mJ per frame.
arXiv Detail & Related papers (2020-06-02T16:44:43Z) - Near-chip Dynamic Vision Filtering for Low-Bandwidth Pedestrian
Detection [99.94079901071163]
This paper presents a novel end-to-end system for pedestrian detection using Dynamic Vision Sensors (DVSs).
We target applications where multiple sensors transmit data to a local processing unit, which executes a detection algorithm.
Our detector is able to perform a detection every 450 ms, with an overall testing F1 score of 83%.
arXiv Detail & Related papers (2020-04-03T17:36:26Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences.