High-speed object detection with a single-photon time-of-flight image
sensor
- URL: http://arxiv.org/abs/2107.13407v1
- Date: Wed, 28 Jul 2021 14:53:44 GMT
- Title: High-speed object detection with a single-photon time-of-flight image
sensor
- Authors: Germán Mora-Martín, Alex Turpin, Alice Ruget, Abderrahim Halimi,
Robert Henderson, Jonathan Leach and Istvan Gyongy
- Abstract summary: We present results from a portable SPAD camera system that outputs 16-bin photon timing histograms with 64x32 spatial resolution.
The results are relevant for safety-critical computer vision applications which would benefit from faster than human reaction times.
- Score: 2.648554238948439
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: 3D time-of-flight (ToF) imaging is used in a variety of applications such as
augmented reality (AR), computer interfaces, robotics and autonomous systems.
Single-photon avalanche diodes (SPADs) are one of the enabling technologies
providing accurate depth data even over long ranges. By developing SPADs in
array format with integrated processing combined with pulsed, flood-type
illumination, high-speed 3D capture is possible. However, array sizes tend to
be relatively small, limiting the lateral resolution of the resulting depth
maps, and, consequently, the information that can be extracted from the image
for applications such as object detection. In this paper, we demonstrate that
these limitations can be overcome through the use of convolutional neural
networks (CNNs) for high-performance object detection. We present outdoor
results from a portable SPAD camera system that outputs 16-bin photon timing
histograms with 64x32 spatial resolution. The results, obtained with exposure
times down to 2 ms (equivalent to 500 FPS) and at signal-to-background ratios
(SBR) as low as 0.05, point to the advantages of providing the CNN with full
histogram data rather than point clouds alone. Alternatively, a combination of
point cloud and active intensity data may be used as input, for a similar level
of performance. In either case, the GPU-accelerated processing time is less
than 1 ms per frame, leading to an overall latency (image acquisition plus
processing) in the millisecond range, making the results relevant for
safety-critical computer vision applications which would benefit from faster
than human reaction times.
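The headline comparison is between feeding the network point clouds versus full per-pixel timing histograms. As a loose illustration of the histogram-as-input idea, the PyTorch sketch below treats the 16 timing bins as input channels over the 64x32 grid; the layer sizes and YOLO-style head are assumptions, not the authors' architecture.

```python
import torch
import torch.nn as nn

# Hedged sketch, not the paper's network: a small CNN that consumes the full
# 16-bin photon-timing histogram at each pixel, with histogram bins as input
# channels over the 64x32 spatial grid, and emits per-cell box/class
# predictions in a YOLO-like layout. All layer sizes are assumptions.
class HistogramDetector(nn.Module):
    def __init__(self, num_bins=16, num_classes=2, num_anchors=3):
        super().__init__()
        self.backbone = nn.Sequential(
            nn.Conv2d(num_bins, 32, kernel_size=3, padding=1), nn.ReLU(),
            nn.Conv2d(32, 64, kernel_size=3, padding=1), nn.ReLU(),
            nn.MaxPool2d(2),
            nn.Conv2d(64, 128, kernel_size=3, padding=1), nn.ReLU(),
        )
        # Each anchor predicts (x, y, w, h, objectness) plus class scores.
        self.head = nn.Conv2d(128, num_anchors * (5 + num_classes), kernel_size=1)

    def forward(self, x):              # x: (batch, 16, 32, 64) histogram frames
        return self.head(self.backbone(x))

frame = torch.rand(1, 16, 32, 64)      # one 64x32 frame, 16 timing bins per pixel
print(HistogramDetector()(frame).shape)  # torch.Size([1, 21, 16, 32])
```

The point of the design is that the raw histogram retains signal photons that a depth-only point cloud would discard at low SBR, letting the network exploit the full temporal response.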
Related papers
- bit2bit: 1-bit quanta video reconstruction via self-supervised photon prediction [57.199618102578576]
We propose bit2bit, a new method for reconstructing high-quality image stacks at the original resolution from sparse binary quanta image data.
Inspired by recent work on Poisson denoising, we developed an algorithm that creates a dense image sequence from sparse binary photon data (loosely sketched below).
We present a novel dataset containing a wide range of real SPAD high-speed videos under various challenging imaging conditions.
arXiv Detail & Related papers (2024-10-30T17:30:35Z)
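As a loose sketch of the self-supervised idea (the paper's actual network and loss differ): split the binary photon stack into two disjoint halves, predict one half's photon arrival probability from the other, and score with a Bernoulli likelihood.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

# Hedged toy version of self-supervised photon prediction; all shapes,
# layers and the 5% photon rate are assumptions for illustration only.
class RateNet(nn.Module):
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(1, 16, 3, padding=1), nn.ReLU(),
            nn.Conv2d(16, 1, 3, padding=1),
        )
    def forward(self, x):
        return torch.sigmoid(self.net(x))        # predicted photon probability

binary = (torch.rand(8, 1, 32, 32) < 0.05).float()  # sparse 1-bit frames
even, odd = binary[0::2], binary[1::2]           # disjoint halves of the stack
model = RateNet()
rate = model(even.mean(0, keepdim=True))         # predict rate from one half
loss = F.binary_cross_entropy(rate.expand_as(odd), odd)  # check on the other
loss.backward()
print(float(loss))
```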
- Single-Photon 3D Imaging with Equi-Depth Photon Histograms [4.432168053497992]
Single-photon 3D cameras estimate the round-trip time of a laser pulse by forming equi-width (EW) histograms of detected photon timestamps.
EW histograms require high bandwidth and in-pixel memory, making single-photon cameras (SPCs) less attractive in resource-constrained settings.
We propose a 3D sensing technique based on equi-depth (ED) histograms; the idea is sketched below.
arXiv Detail & Related papers (2024-08-28T22:02:38Z)
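A toy illustration of equi-depth versus equi-width binning, under assumed timing values (the paper's in-sensor estimator is more involved): ED boundaries are timestamp quantiles, so each bin holds roughly equal photon counts and the boundaries crowd around the laser return peak.

```python
import numpy as np

# Equi-depth bin boundaries as interior quantiles of the photon timestamps.
def equi_depth_boundaries(timestamps, num_bins=8):
    qs = np.linspace(0.0, 1.0, num_bins + 1)[1:-1]   # interior quantiles
    return np.quantile(timestamps, qs)

rng = np.random.default_rng(0)
ambient = rng.uniform(0.0, 100e-9, 200)    # uniform background photons
laser = rng.normal(42e-9, 0.5e-9, 100)     # return pulse around 42 ns
print(equi_depth_boundaries(np.concatenate([ambient, laser])))
# Boundaries cluster near 42 ns, where photon density is highest.
```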
- Shakes on a Plane: Unsupervised Depth Estimation from Unstabilized Photography [54.36608424943729]
We show that in a "long-burst", forty-two 12-megapixel RAW frames captured in a two-second sequence, there is enough parallax information from natural hand tremor alone to recover high-quality scene depth.
We devise a test-time optimization approach that fits a neural RGB-D representation to long-burst data and simultaneously estimates scene depth and camera motion.
arXiv Detail & Related papers (2022-12-22T18:54:34Z)
- Video super-resolution for single-photon LIDAR [0.0]
3D Time-of-Flight (ToF) image sensors are used widely in applications such as self-driving cars, Augmented Reality (AR) and robotics.
In this paper, we use synthetic depth sequences to train a 3D Convolutional Neural Network (CNN) for denoising and upscaling (x4) depth data.
With GPU acceleration, frames are processed at >30 frames per second, making the approach suitable for low-latency imaging, as required for obstacle avoidance; a minimal network along these lines is sketched below.
arXiv Detail & Related papers (2022-10-19T11:33:29Z)
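A hedged sketch of the denoise-and-upscale idea (not the paper's network): a tiny 3D CNN aggregates a short sequence of noisy low-resolution depth frames and upscales the center frame x4 via pixel shuffling.

```python
import torch
import torch.nn as nn

# All layer sizes and the 5-frame window are assumptions for illustration.
class DepthUpscaler(nn.Module):
    def __init__(self, scale=4):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv3d(1, 32, kernel_size=3, padding=1), nn.ReLU(),
            nn.Conv3d(32, 32, kernel_size=3, padding=1), nn.ReLU(),
            nn.Conv3d(32, scale * scale, kernel_size=3, padding=1),
        )
        self.shuffle = nn.PixelShuffle(scale)   # rearranges channels to space

    def forward(self, x):                       # x: (batch, 1, frames, H, W)
        f = self.features(x)                    # (batch, 16, frames, H, W)
        mid = f[:, :, f.shape[2] // 2]          # center frame: (batch, 16, H, W)
        return self.shuffle(mid)                # (batch, 1, 4H, 4W)

frames = torch.rand(1, 1, 5, 32, 64)            # 5 noisy 64x32 depth frames
print(DepthUpscaler()(frames).shape)            # torch.Size([1, 1, 128, 256])
```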
- A direct time-of-flight image sensor with in-pixel surface detection and dynamic vision [0.0]
3D flash LIDAR is an alternative to traditional scanning LIDAR systems, promising precise depth imaging in a compact form factor.
We present a 64x32 pixel (256x128 SPAD) dToF imager that overcomes the high output data rates typical of such sensors by using pixels with embedded histogramming.
This reduces the size of output data frames considerably, enabling maximum frame rates in the 10 kFPS range, or 100 kFPS for direct depth readings; a rough data-budget comparison follows.
arXiv Detail & Related papers (2022-09-23T14:38:00Z)
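A back-of-the-envelope sketch of why in-pixel histogramming shrinks output frames; the photon counts and bit widths below are assumptions, not the paper's figures.

```python
# Compare streaming raw photon timestamps against reading out a compact
# per-pixel histogram for one exposure. All parameters are assumed values.
pixels = 64 * 32
photons_per_pixel = 100          # photons accumulated per exposure (assumed)
timestamp_bits = 12              # bits per raw ToF timestamp (assumed)
bins, bits_per_bin = 16, 8       # compact per-pixel histogram (assumed)

raw_bits = pixels * photons_per_pixel * timestamp_bits
hist_bits = pixels * bins * bits_per_bin
print(f"raw: {raw_bits / 8 / 1024:.0f} KiB/frame, "
      f"histogram: {hist_bits / 8 / 1024:.0f} KiB/frame, "
      f"reduction: {raw_bits / hist_bits:.1f}x")
```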
"Single-Photon Structured Light" works by sensing binary images that indicates the presence or absence of photon arrivals during each exposure.
We develop novel temporal sequences using error correction codes that are designed to be robust to short-range effects like projector and camera defocus.
Our lab prototype is capable of 3D imaging in challenging scenarios involving objects with extremely low albedo or undergoing fast motion; a toy binary-coding example follows.
arXiv Detail & Related papers (2022-04-11T17:57:04Z)
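A toy version of temporal binary codes using plain Gray codes; the paper designs error-correcting sequences robust to defocus, which this sketch does not reproduce.

```python
import numpy as np

# Project num_bits binary patterns; each projector column gets a unique
# Gray-code word, recovered per pixel from its binary observations.
def gray_code_patterns(width=1024, num_bits=10):
    cols = np.arange(width)
    gray = cols ^ (cols >> 1)                  # binary-reflected Gray code
    bits = [(gray >> b) & 1 for b in range(num_bits - 1, -1, -1)]
    return np.array(bits)                      # (num_bits, width) 0/1 patterns

def decode_column(bits):
    g = int("".join(str(int(b)) for b in bits), 2)
    n, mask = g, g >> 1
    while mask:                                # Gray-to-binary conversion
        n ^= mask
        mask >>= 1
    return n

patterns = gray_code_patterns()
print(decode_column(patterns[:, 700]))         # recovers column index 700
```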
- SALISA: Saliency-based Input Sampling for Efficient Video Object Detection [58.22508131162269]
We propose SALISA, a novel non-uniform SALiency-based Input SAmpling technique for video object detection.
We show that SALISA significantly improves the detection of small objects.
arXiv Detail & Related papers (2022-04-05T17:59:51Z)
- VPFNet: Improving 3D Object Detection with Virtual Point based LiDAR and Stereo Data Fusion [62.24001258298076]
VPFNet is a new architecture that cleverly aligns and aggregates the point cloud and image data at "virtual" points.
Our VPFNet achieves 83.21% moderate 3D AP and 91.86% moderate BEV AP on the KITTI test set, ranking 1st since May 21st, 2021.
arXiv Detail & Related papers (2021-11-29T08:51:20Z)
- Expandable YOLO: 3D Object Detection from RGB-D Images [64.14512458954344]
This paper aims at constructing a lightweight object detector that takes a depth image and a color image from a stereo camera as input.
By extending the network architecture of YOLOv3 to 3D in its middle layers, the detector can also produce outputs along the depth direction.
Intersection over Union (IoU) in 3D space is introduced to confirm the accuracy of region extraction results; a minimal version is sketched below.
arXiv Detail & Related papers (2020-06-26T07:32:30Z)
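A minimal sketch of 3D IoU for axis-aligned boxes, the metric used to score region extraction; rotated-box IoU, which detectors often need, is more involved.

```python
# Boxes are (xmin, ymin, zmin, xmax, ymax, zmax) in a shared frame.
def iou_3d(a, b):
    inter = 1.0
    for i in range(3):                         # overlap along x, y, z
        overlap = min(a[i + 3], b[i + 3]) - max(a[i], b[i])
        if overlap <= 0:
            return 0.0                         # boxes are disjoint
        inter *= overlap
    vol = lambda box: (box[3] - box[0]) * (box[4] - box[1]) * (box[5] - box[2])
    return inter / (vol(a) + vol(b) - inter)

print(iou_3d((0, 0, 0, 2, 2, 2), (1, 1, 1, 3, 3, 3)))  # 1/(8+8-1) ~= 0.067
```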
- Real-Time High-Performance Semantic Image Segmentation of Urban Street Scenes [98.65457534223539]
We propose a real-time high-performance DCNN-based method for robust semantic segmentation of urban street scenes.
The proposed method achieves accuracies of 73.6% and 68.0% mean Intersection over Union (mIoU) at inference speeds of 51.0 fps and 39.3 fps, respectively; the mIoU metric is sketched below.
arXiv Detail & Related papers (2020-03-11T08:45:53Z)
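For reference, mIoU averages the per-class IoU computed from a pixel-level confusion matrix; the numbers below are made up, not the paper's results.

```python
import numpy as np

# conf[i, j] counts pixels of true class i predicted as class j.
def mean_iou(conf):
    tp = np.diag(conf).astype(float)                    # per-class true positives
    denom = conf.sum(axis=1) + conf.sum(axis=0) - tp    # union per class
    return np.mean(tp / denom)

conf = np.array([[50, 5, 2],
                 [4, 60, 6],
                 [1, 3, 70]])
print(f"mIoU: {mean_iou(conf):.3f}")                    # -> mIoU: 0.810
```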