Multi-Modal Fusion for Sensorimotor Coordination in Steering Angle
Prediction
- URL: http://arxiv.org/abs/2202.05500v1
- Date: Fri, 11 Feb 2022 08:22:36 GMT
- Title: Multi-Modal Fusion for Sensorimotor Coordination in Steering Angle
Prediction
- Authors: Farzeen Munir, Shoaib Azam, Byung-Geun Lee and Moongu Jeon
- Abstract summary: Imitation learning is employed to learn sensorimotor coordination for steering angle prediction in an end-to-end fashion.
This work explores the fusion of frame-based RGB and event data for learning end-to-end lateral control.
We propose DRFuser, a novel convolutional encoder-decoder architecture for learning end-to-end lateral control.
- Score: 8.707695512525717
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Imitation learning, employed to learn sensorimotor coordination for
steering angle prediction in an end-to-end fashion, requires expert
demonstrations. These expert demonstrations are paired with environmental
perception and vehicle control data. The conventional frame-based RGB camera is
the most common exteroceptive sensor modality used to acquire the environmental
perception data. The frame-based RGB camera has produced promising results when
used as a single modality in learning end-to-end lateral control. However, the
conventional frame-based RGB camera has limited operability under varying
illumination conditions and is affected by motion blur. The event camera
provides complementary information to the frame-based RGB camera. This work
explores the fusion of frame-based RGB and event data for learning end-to-end
lateral control by predicting the steering angle. In addition, we examine how the
representation learned from event data fuses with frame-based RGB data to help predict
lateral control robustly for the autonomous vehicle. To this end, we
propose DRFuser, a novel convolutional encoder-decoder architecture for
learning end-to-end lateral control. The encoder module is branched into separate
streams for the frame-based RGB data and the event data, combined with self-attention layers.
Moreover, this study also contributes our own collected dataset, comprising
event, frame-based RGB, and vehicle control data. The efficacy of
the proposed method is experimentally evaluated on our collected dataset, the DAVIS
Driving Dataset (DDD), and the Carla Eventscape dataset. The experimental results
show that the proposed DRFuser outperforms the state of the art in
terms of root-mean-square error (RMSE) and mean absolute error (MAE) used as
evaluation metrics.
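The abstract describes the architecture only at a high level (a branched convolutional encoder over frame-based RGB and event data, self-attention layers, and a decoder regressing the steering angle) and reports RMSE and MAE as metrics; no code accompanies this page. The sketch below is a minimal, hypothetical PyTorch rendering of that idea under stated assumptions: every layer count, channel width, the 2-channel event representation, and names such as DualBranchFusionRegressor are illustrative choices, not DRFuser's published configuration.

```python
# Illustrative sketch only. All module sizes, layer counts, and names here are
# assumptions made for readability; they are not DRFuser's published design.
import torch
import torch.nn as nn


class DualBranchFusionRegressor(nn.Module):
    """Two convolutional encoder branches (frame-based RGB and event data),
    fused with self-attention, followed by a decoder head that regresses a
    single steering angle per sample."""

    def __init__(self, rgb_channels: int = 3, event_channels: int = 2,
                 embed_dim: int = 128, num_heads: int = 4):
        super().__init__()

        def conv_branch(in_ch: int) -> nn.Sequential:
            # Small strided CNN encoder; depth and width are placeholders.
            return nn.Sequential(
                nn.Conv2d(in_ch, 32, kernel_size=5, stride=2, padding=2),
                nn.ReLU(inplace=True),
                nn.Conv2d(32, 64, kernel_size=3, stride=2, padding=1),
                nn.ReLU(inplace=True),
                nn.Conv2d(64, embed_dim, kernel_size=3, stride=2, padding=1),
                nn.ReLU(inplace=True),
            )

        self.rgb_encoder = conv_branch(rgb_channels)
        self.event_encoder = conv_branch(event_channels)
        # Self-attention over the concatenated token sequences of both branches.
        self.attention = nn.MultiheadAttention(embed_dim, num_heads, batch_first=True)
        # Decoder head: pooled fused features -> one steering angle.
        self.decoder = nn.Sequential(
            nn.Linear(embed_dim, 64),
            nn.ReLU(inplace=True),
            nn.Linear(64, 1),
        )

    def forward(self, rgb: torch.Tensor, events: torch.Tensor) -> torch.Tensor:
        # Encode each modality, then flatten spatial maps into token sequences.
        f_rgb = self.rgb_encoder(rgb).flatten(2).transpose(1, 2)       # (B, N_rgb, D)
        f_evt = self.event_encoder(events).flatten(2).transpose(1, 2)  # (B, N_evt, D)
        tokens = torch.cat([f_rgb, f_evt], dim=1)                      # (B, N, D)
        fused, _ = self.attention(tokens, tokens, tokens)              # fuse both branches
        pooled = fused.mean(dim=1)                                     # average over tokens
        return self.decoder(pooled).squeeze(-1)                        # (B,) steering angles


def rmse(pred: torch.Tensor, target: torch.Tensor) -> torch.Tensor:
    """Root-mean-square error, one of the two reported metrics."""
    return torch.sqrt(torch.mean((pred - target) ** 2))


def mae(pred: torch.Tensor, target: torch.Tensor) -> torch.Tensor:
    """Mean absolute error, the other reported metric."""
    return torch.mean(torch.abs(pred - target))
```

Under these assumptions, a forward pass on a batch of paired RGB frames and event tensors yields one predicted steering angle per sample, and the rmse/mae helpers compare those predictions against the expert demonstrations, mirroring the evaluation described in the abstract.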
Related papers
- Enhanced Automotive Object Detection via RGB-D Fusion in a DiffusionDet Framework [0.0]
Vision-based autonomous driving requires reliable and efficient object detection.
This work proposes a DiffusionDet-based framework that exploits data fusion from the monocular camera and depth sensor to provide the RGB and depth (RGB-D) data.
By integrating the textural and color features from RGB images with the spatial depth information from the LiDAR sensors, the proposed framework employs a feature fusion that substantially enhances object detection of automotive targets.
arXiv Detail & Related papers (2024-06-05T10:24:00Z) - Camera Motion Estimation from RGB-D-Inertial Scene Flow [9.192660643226372]
We introduce a novel formulation for camera motion estimation that integrates RGB-D images and inertial data through scene flow.
Our goal is to accurately estimate the camera motion in a rigid 3D environment, along with the state of the inertial measurement unit (IMU).
arXiv Detail & Related papers (2024-04-26T08:42:59Z) - TUMTraf Event: Calibration and Fusion Resulting in a Dataset for
Roadside Event-Based and RGB Cameras [14.57694345706197]
Event-based cameras are predestined for Intelligent Transportation Systems (ITS).
They provide very high temporal resolution and dynamic range, which can eliminate motion blur and improve detection performance at night.
However, event-based images lack color and texture compared to images from a conventional RGB camera.
arXiv Detail & Related papers (2024-01-16T16:25:37Z) - Segment Any Events via Weighted Adaptation of Pivotal Tokens [85.39087004253163]
This paper focuses on the nuanced challenge of tailoring the Segment Anything Models (SAMs) for integration with event data.
We introduce a multi-scale feature distillation methodology to optimize the alignment of token embeddings originating from event data with their RGB image counterparts.
arXiv Detail & Related papers (2023-12-24T12:47:08Z) - RGB-based Category-level Object Pose Estimation via Decoupled Metric
Scale Recovery [72.13154206106259]
We propose a novel pipeline that decouples the 6D pose and size estimation to mitigate the influence of imperfect scales on rigid transformations.
Specifically, we leverage a pre-trained monocular estimator to extract local geometric information.
A separate branch is designed to directly recover the metric scale of the object based on category-level statistics.
arXiv Detail & Related papers (2023-09-19T02:20:26Z) - Chasing Day and Night: Towards Robust and Efficient All-Day Object Detection Guided by an Event Camera [8.673063170884591]
EOLO is a novel object detection framework that achieves robust and efficient all-day detection by fusing both RGB and event modalities.
Our EOLO framework is built based on a lightweight spiking neural network (SNN) to efficiently leverage the asynchronous property of events.
arXiv Detail & Related papers (2023-09-17T15:14:01Z) - Achieving RGB-D level Segmentation Performance from a Single ToF Camera [9.99197786343155]
We show that it is possible to obtain the same level of accuracy as RGB-D cameras on a semantic segmentation task using infrared (IR) and depth images from a single Time-of-Flight (ToF) camera.
arXiv Detail & Related papers (2023-06-30T13:14:27Z) - RGB-Only Reconstruction of Tabletop Scenes for Collision-Free
Manipulator Control [71.51781695764872]
We present a system for collision-free control of a robot manipulator that uses only RGB views of the world.
Perceptual input of a tabletop scene is provided by multiple images of an RGB camera that is either handheld or mounted on the robot end effector.
A NeRF-like process is used to reconstruct the 3D geometry of the scene, from which the Euclidean full signed distance function (ESDF) is computed.
A model predictive control algorithm is then used to control the manipulator to reach a desired pose while avoiding obstacles in the ESDF.
arXiv Detail & Related papers (2022-10-21T01:45:08Z) - Learning RAW-to-sRGB Mappings with Inaccurately Aligned Supervision [76.41657124981549]
This paper presents a joint learning model for image alignment and RAW-to-sRGB mapping.
Experiments show that our method performs favorably against state-of-the-art methods on the ZRR and SR-RAW datasets.
arXiv Detail & Related papers (2021-08-18T12:41:36Z) - Synergistic saliency and depth prediction for RGB-D saliency detection [76.27406945671379]
Existing RGB-D saliency datasets are small, which may lead to overfitting and limited generalization for diverse scenarios.
We propose a semi-supervised system for RGB-D saliency detection that can be trained on smaller RGB-D saliency datasets without saliency ground truth.
arXiv Detail & Related papers (2020-07-03T14:24:41Z) - Drone-based RGB-Infrared Cross-Modality Vehicle Detection via
Uncertainty-Aware Learning [59.19469551774703]
Drone-based vehicle detection aims at finding the vehicle locations and categories in an aerial image.
We construct a large-scale drone-based RGB-Infrared vehicle detection dataset, termed DroneVehicle.
Our DroneVehicle collects 28,439 RGB-Infrared image pairs, covering urban roads, residential areas, parking lots, and other scenarios from day to night.
arXiv Detail & Related papers (2020-03-05T05:29:44Z)