High-speed object detection with a single-photon time-of-flight image
sensor
- URL: http://arxiv.org/abs/2107.13407v1
- Date: Wed, 28 Jul 2021 14:53:44 GMT
- Title: High-speed object detection with a single-photon time-of-flight image
sensor
- Authors: Germán Mora-Martín, Alex Turpin, Alice Ruget, Abderrahim Halimi,
Robert Henderson, Jonathan Leach and Istvan Gyongy
- Abstract summary: We present results from a portable SPAD camera system that outputs 16-bin photon timing histograms with 64x32 spatial resolution.
The results are relevant for safety-critical computer vision applications which would benefit from faster than human reaction times.
- Score: 2.648554238948439
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: 3D time-of-flight (ToF) imaging is used in a variety of applications such as
augmented reality (AR), computer interfaces, robotics and autonomous systems.
Single-photon avalanche diodes (SPADs) are one of the enabling technologies
providing accurate depth data even over long ranges. By developing SPADs in
array format with integrated processing combined with pulsed, flood-type
illumination, high-speed 3D capture is possible. However, array sizes tend to
be relatively small, limiting the lateral resolution of the resulting depth
maps, and, consequently, the information that can be extracted from the image
for applications such as object detection. In this paper, we demonstrate that
these limitations can be overcome through the use of convolutional neural
networks (CNNs) for high-performance object detection. We present outdoor
results from a portable SPAD camera system that outputs 16-bin photon timing
histograms with 64x32 spatial resolution. The results, obtained with exposure
times down to 2 ms (equivalent to 500 FPS) and in signal-to-background (SBR)
ratios as low as 0.05, point to the advantages of providing the CNN with full
histogram data rather than point clouds alone. Alternatively, a combination of
point cloud and active intensity data may be used as input, for a similar level
of performance. In either case, the GPU-accelerated processing time is less
than 1 ms per frame, leading to an overall latency (image acquisition plus
processing) in the millisecond range, making the results relevant for
safety-critical computer vision applications which would benefit from faster
than human reaction times.
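The abstract describes two candidate CNN input formats: the full 16-bin photon timing histogram cube at 64x32 resolution, or a reduced depth-plus-active-intensity representation. The sketch below illustrates the shapes involved; the array dimensions come from the paper, but the Poisson simulation, the peak-bin depth estimate, and the bin width are illustrative assumptions, not the authors' pipeline.

```python
import numpy as np

# Simulated SPAD output: one frame of 16-bin photon timing histograms
# at 64x32 spatial resolution (dimensions from the paper).
rng = np.random.default_rng(0)
histograms = rng.poisson(lam=2.0, size=(32, 64, 16))  # rows, cols, time bins

# Option A (favoured in the paper): feed the CNN the full histogram cube.
cnn_input_full = histograms.astype(np.float32)         # shape (32, 64, 16)

# Option B: reduce each per-pixel histogram to a depth value (peak bin)
# plus an "active intensity" value (total photon count) -- a common ToF
# reduction, used here only to show the alternative input's shape.
BIN_WIDTH_M = 0.6  # hypothetical range extent of one time bin, in metres
depth = histograms.argmax(axis=-1) * BIN_WIDTH_M       # shape (32, 64)
intensity = histograms.sum(axis=-1)                    # shape (32, 64)
cnn_input_reduced = np.stack([depth, intensity], axis=-1).astype(np.float32)

print(cnn_input_full.shape, cnn_input_reduced.shape)
```

Both representations carry the same photon data, but the full histogram preserves multi-peak and background information that a single depth value discards, which is consistent with the paper's finding that histogram input performs best at low SBR.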
Related papers
- Practical cross-sensor color constancy using a dual-mapping strategy [0.0]
The proposed method uses a dual-mapping strategy and only requires a simple white point from a test sensor under a D65 condition.
In the second mapping phase, we transform the re-constructed image data into sparse features, which are then optimized with a lightweight multi-layer perceptron (MLP) model.
This approach effectively reduces sensor discrepancies and delivers performance on par with leading cross-sensor methods.
arXiv Detail & Related papers (2023-11-20T13:58:59Z) - Shakes on a Plane: Unsupervised Depth Estimation from Unstabilized
Photography [54.36608424943729]
We show that in a "long-burst", forty-two 12-megapixel RAW frames captured in a two-second sequence, there is enough parallax information from natural hand tremor alone to recover high-quality scene depth.
We devise a test-time optimization approach that fits a neural RGB-D representation to long-burst data and simultaneously estimates scene depth and camera motion.
arXiv Detail & Related papers (2022-12-22T18:54:34Z) - Video super-resolution for single-photon LIDAR [0.0]
3D Time-of-Flight (ToF) image sensors are used widely in applications such as self-driving cars, Augmented Reality (AR) and robotics.
In this paper, we use synthetic depth sequences to train a 3D Convolutional Neural Network (CNN) for denoising and upscaling (x4) depth data.
With GPU acceleration, frames are processed at >30 frames per second, making the approach suitable for low-latency imaging, as required for obstacle avoidance.
arXiv Detail & Related papers (2022-10-19T11:33:29Z) - A direct time-of-flight image sensor with in-pixel surface detection and
dynamic vision [0.0]
3D flash LIDAR is an alternative to the traditional scanning LIDAR systems, promising precise depth imaging in a compact form factor.
We present a 64x32 pixel (256x128 SPAD) dToF imager that overcomes these limitations by using pixels with embedded histogramming.
This reduces the size of output data frames considerably, enabling maximum frame rates in the 10 kFPS range or 100 kFPS for direct depth readings.
arXiv Detail & Related papers (2022-09-23T14:38:00Z) - Single-Photon Structured Light [31.614032717665832]
"Single-Photon Structured Light" works by sensing binary images that indicates the presence or absence of photon arrivals during each exposure.
We develop novel temporal sequences using error correction codes that are designed to be robust to short-range effects like projector and camera defocus.
Our lab prototype is capable of 3D imaging in challenging scenarios involving objects with extremely low albedo or undergoing fast motion.
arXiv Detail & Related papers (2022-04-11T17:57:04Z) - SALISA: Saliency-based Input Sampling for Efficient Video Object
Detection [58.22508131162269]
We propose SALISA, a novel non-uniform SALiency-based Input SAmpling technique for video object detection.
We show that SALISA significantly improves the detection of small objects.
arXiv Detail & Related papers (2022-04-05T17:59:51Z) - VPFNet: Improving 3D Object Detection with Virtual Point based LiDAR and
Stereo Data Fusion [62.24001258298076]
VPFNet is a new architecture that cleverly aligns and aggregates the point cloud and image data at the "virtual" points.
Our VPFNet achieves 83.21% moderate 3D AP and 91.86% moderate BEV AP on the KITTI test set, ranking 1st since May 21st, 2021.
arXiv Detail & Related papers (2021-11-29T08:51:20Z) - Expandable YOLO: 3D Object Detection from RGB-D Images [64.14512458954344]
This paper aims at constructing a light-weight object detector that inputs a depth and a color image from a stereo camera.
By extending the network architecture of YOLOv3 to 3D in the middle, it is possible to output in the depth direction.
Intersection over Union (IoU) in 3D space is introduced to confirm the accuracy of region extraction results.
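The 3D IoU metric mentioned above generalises the usual 2D box overlap to volumes. A minimal sketch for axis-aligned boxes follows; this is a generic illustration of the metric, not the paper's implementation, and real detectors often also handle rotated boxes.

```python
def iou_3d(a, b):
    """IoU of two axis-aligned 3D boxes (xmin, ymin, zmin, xmax, ymax, zmax)."""
    # Overlap extent along each axis (zero if the boxes are disjoint there)
    dx = max(0.0, min(a[3], b[3]) - max(a[0], b[0]))
    dy = max(0.0, min(a[4], b[4]) - max(a[1], b[1]))
    dz = max(0.0, min(a[5], b[5]) - max(a[2], b[2]))
    inter = dx * dy * dz
    vol_a = (a[3] - a[0]) * (a[4] - a[1]) * (a[5] - a[2])
    vol_b = (b[3] - b[0]) * (b[4] - b[1]) * (b[5] - b[2])
    return inter / (vol_a + vol_b - inter)

# Two unit cubes offset by 0.5 along x: intersection 0.5, union 1.5
print(iou_3d((0, 0, 0, 1, 1, 1), (0.5, 0, 0, 1.5, 1, 1)))  # 0.333...
```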
arXiv Detail & Related papers (2020-06-26T07:32:30Z) - Lightweight Multi-View 3D Pose Estimation through Camera-Disentangled
Representation [57.11299763566534]
We present a solution to recover 3D pose from multi-view images captured with spatially calibrated cameras.
We exploit 3D geometry to fuse input images into a unified latent representation of pose, which is disentangled from camera view-points.
Our architecture then conditions the learned representation on camera projection operators to produce accurate per-view 2d detections.
arXiv Detail & Related papers (2020-04-05T12:52:29Z) - Real-Time High-Performance Semantic Image Segmentation of Urban Street
Scenes [98.65457534223539]
We propose a real-time high-performance DCNN-based method for robust semantic segmentation of urban street scenes.
The proposed method achieves the accuracy of 73.6% and 68.0% mean Intersection over Union (mIoU) with the inference speed of 51.0 fps and 39.3 fps.
arXiv Detail & Related papers (2020-03-11T08:45:53Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed content (including all information) and is not responsible for any consequences.