Related papers: Accelerating AI and Computer Vision for Satellite Pose Estimation on the Intel Myriad X Embedded SoC

URL: http://arxiv.org/abs/2409.12939v1
Date: Thu, 19 Sep 2024 17:50:50 GMT
Title: Accelerating AI and Computer Vision for Satellite Pose Estimation on the Intel Myriad X Embedded SoC
Authors: Vasileios Leon, Panagiotis Minaidis, George Lentaris, Dimitrios Soudris
Abstract summary: This paper develops a hybrid AI/CV system on Intel's Movidius Myriad X for initializing and tracking the satellite's pose in space missions. The proposed single-chip, robust-estimation, and real-time solution delivers a throughput of up to 5 FPS for 1-MegaPixel RGB images within a limited power envelope of 2W.
Score: 3.829322478948514
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: The challenging deployment of Artificial Intelligence (AI) and Computer Vision (CV) algorithms at the edge pushes the embedded computing community to examine heterogeneous Systems-on-Chip (SoCs). Such novel computing platforms provide increased diversity in interfaces, processors and storage; however, the efficient partitioning and mapping of AI/CV workloads remains an open issue. In this context, the current paper develops a hybrid AI/CV system on Intel's Movidius Myriad X, which is a heterogeneous Vision Processing Unit (VPU), for initializing and tracking the satellite's pose in space missions. The space industry is among the communities examining alternative computing platforms to comply with the tight constraints of on-board data processing, while it is also striving to adopt functionalities from the AI domain. At the algorithmic level, we rely on the ResNet-50-based UrsoNet network along with a custom classical CV pipeline. For efficient acceleration, we exploit the SoC's neural compute engine and 16 vector processors by combining multiple parallelization and low-level optimization techniques. The proposed single-chip, robust-estimation, and real-time solution delivers a throughput of up to 5 FPS for 1-MegaPixel RGB images within a limited power envelope of 2W.
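The headline figures in the abstract (up to 5 FPS, 1-MegaPixel RGB input, 2W envelope) imply a per-frame time and energy budget. The short Python sketch below works that arithmetic out. It is an illustration only, not part of the paper: the function name `frame_budget` is hypothetical, "1-MegaPixel" is assumed to mean exactly 10^6 pixels, and the energy figure assumes the chip draws the full 2W while running at the peak rate.

```python
# Back-of-the-envelope budget derived from the abstract's headline numbers.
FPS = 5.0            # peak throughput quoted in the abstract (frames per second)
POWER_W = 2.0        # power envelope quoted in the abstract (watts)
PIXELS = 1_000_000   # assumption: "1-MegaPixel" taken as exactly 10^6 pixels
CHANNELS = 3         # RGB


def frame_budget(fps, power_w, pixels, channels):
    """Return (seconds per frame, joules per frame, input samples per second)."""
    frame_time_s = 1.0 / fps                   # time available for one frame
    energy_j = power_w * frame_time_s          # energy if the full envelope is drawn
    samples_per_s = fps * pixels * channels    # colour samples ingested per second
    return frame_time_s, energy_j, samples_per_s


frame_time_s, energy_j, samples_per_s = frame_budget(FPS, POWER_W, PIXELS, CHANNELS)
print(f"time per frame:   {frame_time_s * 1e3:.0f} ms")
print(f"energy per frame: {energy_j:.2f} J")
print(f"input rate:       {samples_per_s / 1e6:.0f} M samples/s")
```

Under these assumptions, each frame has about 200 ms and at most roughly 0.4 J available; at lower frame rates the time budget grows and the per-frame energy rises accordingly.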

Related papers

Co-design of a novel CMOS highly parallel, low-power, multi-chip neural network accelerator [0.0]
We present the NV-1, a new low-power ASIC AI processor that greatly accelerates parallel processing (> 10X) with dramatic reduction in energy consumption. The resulting device is currently being used in a fielded edge sensor application.
arXiv Detail & Related papers (2024-09-28T15:47:16Z)
Benchmarking Edge AI Platforms for High-Performance ML Inference [0.0]
Edge computing's growing prominence, due to its ability to reduce communication latency and enable real-time processing, is promoting the rise of high-performance, heterogeneous System-on-Chip solutions. While current approaches often involve scaling down modern hardware, the performance characteristics of neural network workloads can vary significantly. We compare the latency and throughput of various linear algebra and neural network inference tasks across CPU-only, CPU/GPU, and CPU/NPU integrated solutions.
arXiv Detail & Related papers (2024-09-23T08:27:27Z)
Latency optimized Deep Neural Networks (DNNs): An Artificial Intelligence approach at the Edge using Multiprocessor System on Chip (MPSoC) [1.949471382288103]
Edge computing (AI at Edge) in mobile devices is one of the optimized approaches for addressing this requirement. In this work, the possibilities and challenges of implementing a low-latency and power-optimized smart mobile system are examined. Various performance aspects and implementation feasibilities of Neural Networks (NNs) on both embedded FPGA edge devices are discussed.
arXiv Detail & Related papers (2024-07-16T11:51:41Z)
Efficient and accurate neural field reconstruction using resistive memory [52.68088466453264]
Traditional signal reconstruction methods on digital computers face both software and hardware challenges. We propose a systematic approach with software-hardware co-optimizations for signal reconstruction from sparse inputs. This work advances the AI-driven signal restoration technology and paves the way for future efficient and robust medical AI and 3D vision applications.
arXiv Detail & Related papers (2024-04-15T09:33:09Z)
Resistive Memory-based Neural Differential Equation Solver for Score-based Diffusion Model [55.116403765330084]
Current AIGC methods, such as score-based diffusion, are still deficient in terms of rapidity and efficiency. We propose a time-continuous and analog in-memory neural differential equation solver for score-based diffusion. We experimentally validate our solution with 180 nm resistive memory in-memory computing macros.
arXiv Detail & Related papers (2024-04-08T16:34:35Z)
Task-Oriented Sensing, Computation, and Communication Integration for Multi-Device Edge AI [108.08079323459822]
This paper studies a new multi-device edge artificial-intelligence (AI) system, which jointly exploits AI model split inference and integrated sensing and communication (ISAC). We measure the inference accuracy by adopting an approximate but tractable metric, namely discriminant gain.
arXiv Detail & Related papers (2022-07-03T06:57:07Z)
Deep Learning for Real Time Satellite Pose Estimation on Low Power Edge TPU [58.720142291102135]
In this paper, we propose pose estimation software exploiting neural network architectures. We show how low-power machine learning accelerators could enable Artificial Intelligence exploitation in space.
arXiv Detail & Related papers (2022-04-07T08:53:18Z)
FPGA-optimized Hardware acceleration for Spiking Neural Networks [69.49429223251178]
This work presents the development of a hardware accelerator for an SNN, with off-line training, applied to an image recognition task. The design targets a Xilinx Artix-7 FPGA, using around 40% of the available hardware resources in total. It reduces the classification time by three orders of magnitude, with a small 4.5% impact on accuracy, compared to its full-precision software counterpart.
arXiv Detail & Related papers (2022-01-18T13:59:22Z)
An Adaptive Device-Edge Co-Inference Framework Based on Soft Actor-Critic [72.35307086274912]
High-dimension parameter models and large-scale mathematical calculation restrict execution efficiency, especially for Internet of Things (IoT) devices. We propose a new Deep Reinforcement Learning (DRL)-Soft Actor Critic for discrete (SAC-d), which generates the exit point and compressing bits by soft policy iterations. Based on the latency and accuracy aware reward design, such a computation can well adapt to complex environments like dynamic wireless channels and arbitrary processing, and is capable of supporting the 5G URL
arXiv Detail & Related papers (2022-01-09T09:31:50Z)
High Performance Hyperspectral Image Classification using Graphics Processing Units [0.0]
Real-time remote sensing applications require onboard real-time processing capabilities. Lightweight, small-size, and low-power hardware is essential for onboard real-time processing systems.
arXiv Detail & Related papers (2021-05-30T09:26:03Z)
AIPerf: Automated machine learning as an AI-HPC benchmark [17.57686674304368]
We propose an end-to-end benchmark suite utilizing automated machine learning (AutoML). We implement the algorithms in a highly parallel and flexible way to ensure the efficiency and optimization potential on diverse systems. With a flexible workload and a single metric, our benchmark can scale and rank AI-HPC easily.
arXiv Detail & Related papers (2020-08-17T08:06:43Z)
