Detecting Line Segments in Motion-blurred Images with Events
- URL: http://arxiv.org/abs/2211.07365v1
- Date: Mon, 14 Nov 2022 14:00:03 GMT
- Title: Detecting Line Segments in Motion-blurred Images with Events
- Authors: Huai Yu, Hao Li, Wen Yang, Lei Yu, Gui-Song Xia
- Abstract summary: Existing line segment detection methods suffer severe performance degradation when motion blur occurs.
We propose to leverage the complementary information of images and events to robustly detect line segments over motion blurs.
Our method achieves 63.3% mean structural average precision (msAP) with the model pre-trained on the FE-Wireframe and fine-tuned on the FE-Blurframe.
- Score: 38.39698414942873
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Making line segment detectors more reliable under motion blurs is one of the
most important challenges for practical applications, such as visual SLAM and
3D reconstruction. Existing line segment detection methods suffer severe
performance degradation in accurately detecting and locating line segments
when motion blur occurs. Event data, in contrast, show strong complementary
characteristics to images: minimal blur and edge awareness at high temporal
resolution, which is potentially beneficial for reliable line segment recognition. To
robustly detect line segments over motion blurs, we propose to leverage the
complementary information of images and events. To achieve this, we first
design a general frame-event feature fusion network to extract and fuse the
detailed image textures and low-latency event edges, which consists of a
channel-attention-based shallow fusion module and a self-attention-based dual
hourglass module. We then utilize two state-of-the-art wireframe parsing
networks to detect line segments on the fused feature map. In addition, we
contribute a synthetic and a real dataset for line segment detection,
namely FE-Wireframe and FE-Blurframe, with paired motion-blurred images and
events. Extensive experiments on both datasets demonstrate the effectiveness of
the proposed method. When tested on the real dataset, our method achieves 63.3%
mean structural average precision (msAP) with the model pre-trained on the
FE-Wireframe and fine-tuned on the FE-Blurframe, improved by 32.6 and 11.3
points compared with models trained on synthetic only and real only,
respectively. The codes, datasets, and trained models are released at:
https://levenberg.github.io/FE-LSD
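As a rough illustration of the fusion described in the abstract (and not the released FE-LSD implementation), the PyTorch sketch below combines frame and event features with a channel-attention-based shallow fusion followed by a self-attention stage, producing a fused map for a downstream wireframe parser; module names, channel sizes, and the event representation are assumptions.

```python
# Minimal sketch (not the released FE-LSD code) of the described fusion idea:
# channel-attention shallow fusion of frame and event features, then a
# self-attention stage standing in for the dual hourglass module.
import torch
import torch.nn as nn


class ChannelAttentionFusion(nn.Module):
    """Squeeze-and-excitation style gating over concatenated frame/event features."""

    def __init__(self, channels: int):
        super().__init__()
        self.gate = nn.Sequential(
            nn.AdaptiveAvgPool2d(1),
            nn.Conv2d(2 * channels, channels // 4, 1),
            nn.ReLU(inplace=True),
            nn.Conv2d(channels // 4, 2 * channels, 1),
            nn.Sigmoid(),
        )
        self.project = nn.Conv2d(2 * channels, channels, 1)

    def forward(self, frame_feat, event_feat):
        x = torch.cat([frame_feat, event_feat], dim=1)
        return self.project(x * self.gate(x))


class SelfAttentionBlock(nn.Module):
    """Stand-in for the self-attention-based dual hourglass stage."""

    def __init__(self, channels: int, heads: int = 4):
        super().__init__()
        self.attn = nn.MultiheadAttention(channels, heads, batch_first=True)
        self.norm = nn.LayerNorm(channels)

    def forward(self, x):
        b, c, h, w = x.shape
        tokens = x.flatten(2).transpose(1, 2)            # (B, H*W, C)
        out, _ = self.attn(tokens, tokens, tokens)
        tokens = self.norm(tokens + out)
        return tokens.transpose(1, 2).reshape(b, c, h, w)


class FrameEventFusion(nn.Module):
    def __init__(self, channels: int = 64):
        super().__init__()
        self.frame_stem = nn.Conv2d(3, channels, 3, padding=1)  # blurred RGB frame
        self.event_stem = nn.Conv2d(5, channels, 3, padding=1)  # event tensor (e.g. voxel grid)
        self.shallow_fusion = ChannelAttentionFusion(channels)
        self.deep_fusion = SelfAttentionBlock(channels)

    def forward(self, frame, events):
        fused = self.shallow_fusion(self.frame_stem(frame), self.event_stem(events))
        return self.deep_fusion(fused)  # fused map fed to a wireframe parsing head


if __name__ == "__main__":
    net = FrameEventFusion()
    fused = net(torch.randn(1, 3, 64, 64), torch.randn(1, 5, 64, 64))
    print(fused.shape)  # torch.Size([1, 64, 64, 64])
```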
Related papers
- 3DiffTection: 3D Object Detection with Geometry-Aware Diffusion Features [70.50665869806188]
3DiffTection is a state-of-the-art method for 3D object detection from single images.
We fine-tune a diffusion model to perform novel view synthesis conditioned on a single image.
We further train the model on target data with detection supervision.
arXiv Detail & Related papers (2023-11-07T23:46:41Z) - Dual Memory Aggregation Network for Event-Based Object Detection with Learnable Representation [79.02808071245634]
Event-based cameras are bio-inspired sensors that asynchronously capture per-pixel brightness changes.
Event streams are divided into grids in x-y-t coordinates for both positive and negative polarities, producing a set of pillars as a 3D tensor representation.
Long memory is encoded in the hidden state of adaptive convLSTMs while short memory is modeled by computing spatial-temporal correlation between event pillars.
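As a rough illustration of the pillar-style event representation summarized above (not the paper's implementation), the sketch below bins an (x, y, t, p) event array into a polarity-separated x-y-t grid; bin counts and normalization are assumptions.

```python
# Hedged sketch: accumulate events into a (polarity, time, height, width) grid
# that can be treated as a set of pillars. Bin counts are illustrative.
import numpy as np


def events_to_grid(events, height, width, time_bins=8):
    """events: (N, 4) array of (x, y, t, p) with polarity p in {-1, +1}."""
    grid = np.zeros((2, time_bins, height, width), dtype=np.float32)
    t = events[:, 2]
    t_norm = (t - t.min()) / max(t.max() - t.min(), 1e-9)
    t_idx = np.clip((t_norm * time_bins).astype(int), 0, time_bins - 1)
    x = events[:, 0].astype(int)
    y = events[:, 1].astype(int)
    pol = (events[:, 3] > 0).astype(int)        # 0 = negative, 1 = positive
    np.add.at(grid, (pol, t_idx, y, x), 1.0)    # accumulate event counts per cell
    return grid                                  # (2, T, H, W)


if __name__ == "__main__":
    ev = np.array([[10.0, 20.0, 0.00, +1],
                   [10.0, 20.0, 0.01, -1],
                   [11.0, 20.0, 0.02, +1]])
    print(events_to_grid(ev, height=64, width=64).shape)  # (2, 8, 64, 64)
```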
arXiv Detail & Related papers (2023-03-17T12:12:41Z) - Implicit Ray-Transformers for Multi-view Remote Sensing Image Segmentation [26.726658200149544]
We propose the Implicit Ray-Transformer (IRT), based on Implicit Neural Representation (INR), for remote sensing (RS) scene semantic segmentation with sparse labels.
The proposed method includes a two-stage learning process. In the first stage, we optimize a neural field to encode the color and 3D structure of the remote sensing scene.
In the second stage, we design a Ray Transformer to leverage the relations between the neural field 3D features and 2D texture features for learning better semantic representations.
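Purely as a speculative illustration of the second stage, the sketch below lets per-ray neural-field features cross-attend to 2D texture features to produce semantic logits; this is not the IRT architecture, and the dimensions and class count are assumptions.

```python
# Speculative sketch: relate 3D neural-field features sampled along a ray to
# 2D texture features via cross-attention. Dimensions/classes are assumptions.
import torch
import torch.nn as nn


class RayCrossAttention(nn.Module):
    def __init__(self, dim: int = 64, heads: int = 4, num_classes: int = 8):
        super().__init__()
        self.attn = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.head = nn.Linear(dim, num_classes)

    def forward(self, field_feats, texture_feats):
        """field_feats: (B, S, D) ray samples; texture_feats: (B, T, D) 2D features."""
        out, _ = self.attn(field_feats, texture_feats, texture_feats)
        return self.head(out.mean(dim=1))   # per-ray semantic logits


if __name__ == "__main__":
    m = RayCrossAttention()
    print(m(torch.randn(2, 32, 64), torch.randn(2, 49, 64)).shape)  # torch.Size([2, 8])
```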
arXiv Detail & Related papers (2023-03-15T07:05:07Z) - FFPA-Net: Efficient Feature Fusion with Projection Awareness for 3D Object Detection [19.419030878019974]
Unstructured 3D point clouds are filled into the 2D plane, and 3D point cloud features are extracted faster using projection-aware convolution layers.
The corresponding indexes between different sensor signals are established in advance during data preprocessing.
Two new plug-and-play fusion modules, LiCamFuse and BiLiCamFuse, are proposed.
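A minimal sketch of precomputing point-to-pixel correspondence indexes during preprocessing, assuming a 3x3 intrinsic matrix and a 4x4 lidar-to-camera transform; function and argument names are hypothetical, not FFPA-Net's API.

```python
# Hedged sketch: project lidar points into the image plane once, so per-point
# pixel indexes can be reused for feature gathering later. Names are assumptions.
import numpy as np


def project_points(points_xyz, intrinsics, extrinsics, image_hw):
    """points_xyz: (N, 3); intrinsics: (3, 3); extrinsics: (4, 4) lidar-to-camera."""
    n = points_xyz.shape[0]
    homog = np.concatenate([points_xyz, np.ones((n, 1))], axis=1)  # (N, 4)
    cam = (extrinsics @ homog.T).T[:, :3]                          # lidar -> camera frame
    z = cam[:, 2:3]
    uv = (intrinsics @ cam.T).T[:, :2] / np.clip(z, 1e-6, None)    # perspective divide
    u = np.round(uv[:, 0]).astype(int)
    v = np.round(uv[:, 1]).astype(int)
    h, w = image_hw
    valid = (z[:, 0] > 0) & (u >= 0) & (u < w) & (v >= 0) & (v < h)
    return u, v, valid
```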
arXiv Detail & Related papers (2022-09-15T16:13:19Z) - Exploring Optical-Flow-Guided Motion and Detection-Based Appearance for Temporal Sentence Grounding [61.57847727651068]
Temporal sentence grounding aims to localize a target segment in an untrimmed video semantically according to a given sentence query.
Most previous works focus on learning frame-level features of each whole frame in the entire video, and directly match them with the textual information.
We propose a novel Motion- and Appearance-guided 3D Semantic Reasoning Network (MA3SRN), which incorporates optical-flow-guided motion-aware, detection-based appearance-aware, and 3D-aware object-level features.
arXiv Detail & Related papers (2022-03-06T13:57:09Z) - FusionPainting: Multimodal Fusion with Adaptive Attention for 3D Object Detection [15.641616738865276]
We propose a general multimodal fusion framework FusionPainting to fuse the 2D RGB image and 3D point clouds at a semantic level for boosting the 3D object detection task.
Especially, the FusionPainting framework consists of three main modules: a multi-modal semantic segmentation module, an adaptive attention-based semantic fusion module, and a 3D object detector.
The effectiveness of the proposed framework has been verified on the large-scale nuScenes detection benchmark.
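To illustrate the general idea of semantic-level painting (not the FusionPainting modules themselves), the sketch below appends per-point class scores gathered from a 2D segmentation map at precomputed projection indexes; names and shapes are assumptions.

```python
# Hedged sketch: decorate lidar points with 2D semantic scores before 3D
# detection. Assumes per-point pixel indexes (u, v) and a validity mask from a
# projection step like the one sketched earlier.
import numpy as np


def paint_points(points_xyz, seg_scores, u, v, valid):
    """points_xyz: (N, 3); seg_scores: (C, H, W) softmax scores; u, v, valid: (N,)."""
    n = points_xyz.shape[0]
    c = seg_scores.shape[0]
    painted = np.zeros((n, c), dtype=np.float32)
    painted[valid] = seg_scores[:, v[valid], u[valid]].T  # gather scores at projected pixels
    return np.concatenate([points_xyz, painted], axis=1)  # (N, 3 + C)
```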
arXiv Detail & Related papers (2021-06-23T14:53:22Z) - EPMF: Efficient Perception-aware Multi-sensor Fusion for 3D Semantic Segmentation [62.210091681352914]
We study multi-sensor fusion for 3D semantic segmentation, which is important for many applications such as autonomous driving and robotics.
In this work, we investigate a collaborative fusion scheme called perception-aware multi-sensor fusion (PMF).
We propose a two-stream network to extract features from the two modalities separately. The extracted features are fused by effective residual-based fusion modules.
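As a minimal sketch of the residual-based fusion pattern described here (not the actual PMF modules), the block below mixes camera and lidar feature maps and adds the result back to the camera stream; channel counts and layer choices are assumptions.

```python
# Hedged sketch of a residual-based fusion block: mix two modality streams and
# add the result back to one of them. Layer choices are illustrative only.
import torch
import torch.nn as nn


class ResidualFusion(nn.Module):
    def __init__(self, channels: int):
        super().__init__()
        self.mix = nn.Sequential(
            nn.Conv2d(2 * channels, channels, 3, padding=1),
            nn.BatchNorm2d(channels),
            nn.ReLU(inplace=True),
            nn.Conv2d(channels, channels, 3, padding=1),
        )

    def forward(self, cam_feat, lidar_feat):
        fused = self.mix(torch.cat([cam_feat, lidar_feat], dim=1))
        return cam_feat + fused  # residual connection keeps the camera stream intact


if __name__ == "__main__":
    blk = ResidualFusion(32)
    print(blk(torch.randn(1, 32, 16, 16), torch.randn(1, 32, 16, 16)).shape)
```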
arXiv Detail & Related papers (2021-06-21T10:47:26Z) - Seismic Fault Segmentation via 3D-CNN Training by a Few 2D Slices Labels [6.963867115353744]
We present a new binary cross-entropy and smooth L1 loss to train 3D-CNN by sampling some 2D slices from 3D seismic data.
Experiments show that our method can extract 3D seismic features from a few 2D slices labels on real data, to segment a complete fault volume.
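A minimal sketch of the sparse-slice supervision idea, assuming the combined loss is a BCE term plus a smooth L1 term evaluated only on the labeled depth slices; the masking scheme and weighting are assumptions, not the paper's exact formulation.

```python
# Hedged sketch: supervise a 3D-CNN output volume only on the few depth slices
# that carry 2D labels, with BCE-with-logits plus smooth L1 as the summary says.
import torch
import torch.nn.functional as F


def sparse_slice_loss(pred_volume, label_volume, labeled_slices, l1_weight=1.0):
    """pred_volume, label_volume: (B, 1, D, H, W); labeled_slices: list of depth indexes."""
    pred = pred_volume[:, :, labeled_slices]     # keep only supervised slices
    target = label_volume[:, :, labeled_slices]
    bce = F.binary_cross_entropy_with_logits(pred, target)
    l1 = F.smooth_l1_loss(torch.sigmoid(pred), target)
    return bce + l1_weight * l1
```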
arXiv Detail & Related papers (2021-05-09T07:13:40Z) - ULSD: Unified Line Segment Detection across Pinhole, Fisheye, and Spherical Cameras [17.943949895764938]
Line segment detection is essential for high-level tasks in computer vision and robotics.
Currently, most state-of-the-art (SOTA) methods are dedicated to detecting straight line segments in undistorted pinhole images.
We propose unified line segment detection (ULSD) for both distorted and undistorted images.
arXiv Detail & Related papers (2020-11-06T03:30:17Z) - Cross-Modality 3D Object Detection [63.29935886648709]
We present a novel two-stage multi-modal fusion network for 3D object detection.
The whole architecture facilitates two-stage fusion.
Our experiments on the KITTI dataset show that the proposed multi-stage fusion helps the network to learn better representations.
arXiv Detail & Related papers (2020-08-16T11:01:20Z)