Learning to Find Missing Video Frames with Synthetic Data Augmentation:
A General Framework and Application in Generating Thermal Images Using RGB
Cameras
- URL: http://arxiv.org/abs/2403.00196v1
- Date: Thu, 29 Feb 2024 23:52:15 GMT
- Authors: Mathias Viborg Andersen, Ross Greer, Andreas Møgelmose, Mohan
Trivedi
- Abstract summary: This paper addresses the issue of missing data due to sensor frame rate mismatches.
We propose using conditional generative adversarial networks (cGANs) to create synthetic yet realistic thermal imagery.
- Score: 0.0
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Advanced Driver Assistance Systems (ADAS) in intelligent vehicles rely on
accurate driver perception within the vehicle cabin, often leveraging a
combination of sensing modalities. However, these modalities operate at varying
rates, posing challenges for real-time, comprehensive driver state monitoring.
This paper addresses the issue of missing data due to sensor frame rate
mismatches, introducing a generative model approach to create synthetic yet
realistic thermal imagery. We propose using conditional generative adversarial
networks (cGANs), specifically comparing the pix2pix and CycleGAN
architectures. Experimental results demonstrate that pix2pix outperforms
CycleGAN, and utilizing multi-view input styles, especially stacked views,
enhances the accuracy of thermal image generation. Moreover, the study
evaluates the model's generalizability across different subjects, revealing the
importance of individualized training for optimal performance. The findings
suggest the potential of generative models in addressing missing frames,
advancing driver state monitoring for intelligent vehicles, and underscoring
the need for continued research in model generalization and customization.
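The frame rate mismatch described in the abstract can be illustrated with a small sketch. This is not the authors' pipeline: the function name `find_missing_frames`, the 30 Hz and 10 Hz rates, and the 5 ms tolerance are all hypothetical choices made for illustration. It only shows how one might identify which RGB frames lack a thermal counterpart, i.e. the frames a generative model would be asked to fill in.

```python
from bisect import bisect_left


def find_missing_frames(rgb_ts, thermal_ts, tol):
    """Return indices of RGB frames with no thermal frame within `tol` seconds."""
    thermal_ts = sorted(thermal_ts)
    missing = []
    for i, t in enumerate(rgb_ts):
        j = bisect_left(thermal_ts, t)
        # Only the thermal timestamps just before and just after t can be nearest.
        neighbours = thermal_ts[max(j - 1, 0):j + 1]
        if not any(abs(t - s) <= tol for s in neighbours):
            missing.append(i)
    return missing


# Hypothetical rates (assumption): RGB at 30 Hz, thermal at 10 Hz, one second of data.
rgb_ts = [k / 30 for k in range(30)]
thermal_ts = [k / 10 for k in range(10)]

# RGB frame indices for which a thermal image would have to be synthesized.
missing = find_missing_frames(rgb_ts, thermal_ts, tol=0.005)
```

With these assumed rates, two out of every three RGB frames have no matching thermal frame, which gives a sense of how much data a synthesis model would need to supply.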
Related papers
- Analysis of Classifier Training on Synthetic Data for Cross-Domain Datasets [4.696575161583618]
This study focuses on camera-based traffic sign recognition applications for advanced driver assistance systems and autonomous driving.
The proposed augmentation pipeline of synthetic datasets includes novel augmentation processes such as structured shadows and Gaussian specular highlights.
Experiments showed that a synthetic image-based approach outperforms real image-based training in most cases when applied to cross-domain test datasets.
arXiv Detail & Related papers (2024-10-30T07:11:41Z)
- Optical Flow Matters: an Empirical Comparative Study on Fusing Monocular Extracted Modalities for Better Steering [37.46760714516923]
This research introduces a new end-to-end method that exploits multimodal information from a single monocular camera to improve the steering predictions for self-driving cars.
By focusing on the fusion of RGB imagery with depth completion information or optical flow data, we propose a framework that integrates these modalities through both early and hybrid fusion techniques.
arXiv Detail & Related papers (2024-09-18T09:36:24Z)
- Visual Delta Generator with Large Multi-modal Models for Semi-supervised Composed Image Retrieval [50.72924579220149]
Composed Image Retrieval (CIR) is a task that retrieves images similar to a query, based on a provided textual modification.
Current techniques rely on supervised learning for CIR models using labeled triplets of reference image, text, and target image.
We propose a new semi-supervised CIR approach where we search for a reference and its related target images in auxiliary data.
arXiv Detail & Related papers (2024-04-23T21:00:22Z)
- CarPatch: A Synthetic Benchmark for Radiance Field Evaluation on Vehicle Components [77.33782775860028]
We introduce CarPatch, a novel synthetic benchmark of vehicles.
In addition to a set of images annotated with their intrinsic and extrinsic camera parameters, the corresponding depth maps and semantic segmentation masks have been generated for each view.
Global and part-based metrics have been defined and used to evaluate, compare, and better characterize some state-of-the-art techniques.
arXiv Detail & Related papers (2023-07-24T11:59:07Z)
- Physics-Driven Turbulence Image Restoration with Stochastic Refinement [80.79900297089176]
Image distortion by atmospheric turbulence is a critical problem in long-range optical imaging systems.
Fast and physics-grounded simulation tools have been introduced to help the deep-learning models adapt to real-world turbulence conditions.
This paper proposes the Physics-integrated Restoration Network (PiRN) to help the network disentangle the stochasticity from the degradation and the underlying image.
arXiv Detail & Related papers (2023-07-20T05:49:21Z)
- Benchmarking the Robustness of LiDAR-Camera Fusion for 3D Object Detection [58.81316192862618]
Two critical sensors for 3D perception in autonomous driving are the camera and the LiDAR.
Fusing these two modalities can significantly boost the performance of 3D perception models.
We benchmark the state-of-the-art fusion methods for the first time.
arXiv Detail & Related papers (2022-05-30T09:35:37Z)
- CARNet: A Dynamic Autoencoder for Learning Latent Dynamics in Autonomous Driving Tasks [11.489187712465325]
An autonomous driving system should effectively use the information collected from the various sensors in order to form an abstract description of the world.
Deep learning models, such as autoencoders, can be used for that purpose, as they can learn compact latent representations from a stream of incoming data.
This work proposes CARNet, a Combined dynAmic autoencodeR NETwork architecture that utilizes an autoencoder combined with a recurrent neural network to learn the current latent representation.
arXiv Detail & Related papers (2022-05-18T04:15:42Z)
- Monitoring and Adapting the Physical State of a Camera for Autonomous Vehicles [10.490646039938252]
We propose a generic and task-oriented self-health-maintenance framework for cameras based on data- and physically-grounded models.
We implement the framework on a real-world ground vehicle and demonstrate how a camera can adjust its parameters to counter a poor condition.
Our framework not only provides a practical ready-to-use solution to monitor and maintain the health of cameras, but can also serve as a basis for extensions to tackle more sophisticated problems.
arXiv Detail & Related papers (2021-12-10T11:14:44Z)
- Predicting Take-over Time for Autonomous Driving with Real-World Data: Robust Data Augmentation, Models, and Evaluation [11.007092387379076]
We develop and train take-over time (TOT) models that operate on mid and high-level features produced by computer vision algorithms operating on different driver-facing camera views.
We show that a TOT model supported by augmented data can be used to produce continuous estimates of take-over times without delay.
arXiv Detail & Related papers (2021-07-27T16:39:50Z)
- TransCamP: Graph Transformer for 6-DoF Camera Pose Estimation [77.09542018140823]
We propose a neural network approach with a graph transformer backbone, namely TransCamP, to address the camera relocalization problem.
TransCamP effectively fuses the image features, camera pose information and inter-frame relative camera motions into encoded graph attributes.
arXiv Detail & Related papers (2021-05-28T19:08:43Z)
- Towards Automated Neural Interaction Discovery for Click-Through Rate Prediction [64.03526633651218]
Click-Through Rate (CTR) prediction is one of the most important machine learning tasks in recommender systems.
We propose an automated interaction architecture discovering framework for CTR prediction named AutoCTR.
arXiv Detail & Related papers (2020-06-29T04:33:01Z)
This list is automatically generated from the titles and abstracts of the papers on this site.