Learning to Find Missing Video Frames with Synthetic Data Augmentation:
A General Framework and Application in Generating Thermal Images Using RGB
Cameras
- URL: http://arxiv.org/abs/2403.00196v1
- Date: Thu, 29 Feb 2024 23:52:15 GMT
- Title: Learning to Find Missing Video Frames with Synthetic Data Augmentation:
A General Framework and Application in Generating Thermal Images Using RGB
Cameras
- Authors: Mathias Viborg Andersen, Ross Greer, Andreas Møgelmose, Mohan Trivedi
- Abstract summary: This paper addresses the issue of missing data due to sensor frame rate mismatches.
We propose using conditional generative adversarial networks (cGANs) to create synthetic yet realistic thermal imagery.
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Advanced Driver Assistance Systems (ADAS) in intelligent vehicles rely on
accurate driver perception within the vehicle cabin, often leveraging a
combination of sensing modalities. However, these modalities operate at varying
rates, posing challenges for real-time, comprehensive driver state monitoring.
This paper addresses the issue of missing data due to sensor frame rate
mismatches, introducing a generative model approach to create synthetic yet
realistic thermal imagery. We propose using conditional generative adversarial
networks (cGANs), specifically comparing the pix2pix and CycleGAN
architectures. Experimental results demonstrate that pix2pix outperforms
CycleGAN, and utilizing multi-view input styles, especially stacked views,
enhances the accuracy of thermal image generation. Moreover, the study
evaluates the model's generalizability across different subjects, revealing the
importance of individualized training for optimal performance. The findings
suggest the potential of generative models in addressing missing frames,
advancing driver state monitoring for intelligent vehicles, and underscoring
the need for continued research in model generalization and customization.
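The abstract describes the architecture comparison but includes no implementation; the following is a minimal PyTorch sketch of a pix2pix-style conditional GAN with a stacked multi-view RGB input, under stated assumptions: the view count (2), layer widths, and L1 weight are illustrative, not the authors' configuration.

```python
# Minimal pix2pix-style cGAN sketch for stacked-view RGB -> thermal translation.
# Layer sizes, lambda_l1, and the 2-view stacking are illustrative assumptions.
import torch
import torch.nn as nn

NUM_VIEWS = 2                      # RGB views stacked along the channel axis
IN_CH = 3 * NUM_VIEWS              # stacked RGB input
OUT_CH = 1                         # single-channel thermal output

def conv_block(cin, cout, down=True):
    conv = (nn.Conv2d(cin, cout, 4, 2, 1) if down
            else nn.ConvTranspose2d(cin, cout, 4, 2, 1))
    return nn.Sequential(conv, nn.InstanceNorm2d(cout), nn.ReLU(inplace=True))

class Generator(nn.Module):
    """Tiny encoder-decoder standing in for the usual U-Net generator."""
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(
            conv_block(IN_CH, 64), conv_block(64, 128), conv_block(128, 256),
            conv_block(256, 128, down=False), conv_block(128, 64, down=False),
            nn.ConvTranspose2d(64, OUT_CH, 4, 2, 1), nn.Tanh())
    def forward(self, x):
        return self.net(x)

class PatchDiscriminator(nn.Module):
    """PatchGAN: classifies overlapping patches of (input, output) pairs."""
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(
            conv_block(IN_CH + OUT_CH, 64), conv_block(64, 128),
            nn.Conv2d(128, 1, 4, 1, 1))   # per-patch real/fake logits
    def forward(self, x, y):
        return self.net(torch.cat([x, y], dim=1))

G, D = Generator(), PatchDiscriminator()
bce, l1 = nn.BCEWithLogitsLoss(), nn.L1Loss()
lambda_l1 = 100.0                  # pix2pix's usual L1 weight (assumed here)

rgb = torch.randn(4, IN_CH, 64, 64)       # stacked multi-view RGB batch
thermal = torch.randn(4, OUT_CH, 64, 64)  # aligned thermal ground truth

fake = G(rgb)
d_fake = D(rgb, fake)
# Generator objective: fool D on patches + stay close to ground truth in L1.
g_loss = bce(d_fake, torch.ones_like(d_fake)) + lambda_l1 * l1(fake, thermal)
d_real = D(rgb, thermal)
d_fake_det = D(rgb, fake.detach())
d_loss = 0.5 * (bce(d_real, torch.ones_like(d_real))
                + bce(d_fake_det, torch.zeros_like(d_fake_det)))
```

The full pix2pix recipe additionally uses U-Net skip connections, LeakyReLU in the discriminator, and alternating optimizer updates; only the conditional-adversarial-plus-L1 loss structure is shown here.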
Related papers
- Bench2Drive-R: Turning Real World Data into Reactive Closed-Loop Autonomous Driving Benchmark by Generative Model
We introduce Bench2Drive-R, a generative framework that enables reactive closed-loop evaluation.
Unlike existing video generative models for autonomous driving, the proposed designs are tailored for interactive simulation.
We compare the generation quality of Bench2Drive-R with existing generative models and achieve state-of-the-art performance.
arXiv Detail & Related papers (2024-12-11T06:35:18Z)
- Extrapolated Urban View Synthesis Benchmark
Photorealistic simulators are essential for the training and evaluation of vision-centric autonomous vehicles (AVs).
At their core is Novel View Synthesis (NVS), a crucial capability that generates diverse unseen viewpoints to accommodate the broad and continuous pose distribution of AVs.
Recent advances in radiance fields, such as 3D Gaussian Splatting, achieve photorealistic rendering at real-time speeds and have been widely used in modeling large-scale driving scenes.
We have released our data to help advance self-driving and urban robotics simulation technology.
arXiv Detail & Related papers (2024-12-06T18:41:39Z)
- Exploring Fully Convolutional Networks for the Segmentation of Hyperspectral Imaging Applied to Advanced Driver Assistance Systems
We explore the use of hyperspectral imaging (HSI) in Advanced Driver Assistance Systems (ADAS).
This paper presents experimental results from applying fully convolutional networks (FCNs) to the image segmentation of HSI for ADAS applications.
We use the HSI-Drive v1.1 dataset, which provides a set of labelled images recorded in real driving conditions with a small-size snapshot NIR-HSI camera.
arXiv Detail & Related papers (2024-12-05T08:58:25Z)
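As a rough illustration of the technique named in the entry above, here is a minimal fully convolutional segmentation network in PyTorch for a hyperspectral cube; the band count, class count, and layer sizes are assumptions, not the paper's configuration.

```python
# Minimal fully convolutional segmentation sketch for hyperspectral input.
# Band count, class count, and layer widths are assumptions, not the paper's.
import torch
import torch.nn as nn

BANDS, CLASSES = 25, 6             # e.g. a snapshot NIR-HSI cube; assumed values

class TinyFCN(nn.Module):
    """Downsample twice, then upsample back to full resolution (no skips)."""
    def __init__(self):
        super().__init__()
        self.encode = nn.Sequential(
            nn.Conv2d(BANDS, 64, 3, stride=2, padding=1), nn.ReLU(inplace=True),
            nn.Conv2d(64, 128, 3, stride=2, padding=1), nn.ReLU(inplace=True))
        self.decode = nn.Sequential(
            nn.ConvTranspose2d(128, 64, 4, stride=2, padding=1), nn.ReLU(inplace=True),
            nn.ConvTranspose2d(64, CLASSES, 4, stride=2, padding=1))
    def forward(self, x):
        return self.decode(self.encode(x))   # per-pixel class logits

model = TinyFCN()
cube = torch.randn(2, BANDS, 128, 128)       # batch of HSI cubes
logits = model(cube)                          # (2, CLASSES, 128, 128)
loss = nn.CrossEntropyLoss()(logits, torch.randint(0, CLASSES, (2, 128, 128)))
```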
- Understanding and Improving Training-Free AI-Generated Image Detections with Vision Foundation Models
Deepfake techniques for facial synthesis and editing, enabled by generative models, pose serious risks.
In this paper, we investigate how detection performance varies across model backbones, types, and datasets.
We introduce Contrastive Blur, which enhances performance on facial images, and MINDER, which addresses noise type bias, balancing performance across domains.
arXiv Detail & Related papers (2024-11-28T13:04:45Z)
- Cross-Camera Distracted Driver Classification through Feature Disentanglement and Contrastive Learning
We introduce a robust model designed to withstand changes in camera position within the vehicle.
Our Driver Behavior Monitoring Network (DBMNet) relies on a lightweight backbone and integrates a disentanglement module.
Experiments conducted on the daytime and nighttime subsets of the 100-Driver dataset validate the effectiveness of our approach.
arXiv Detail & Related papers (2024-11-20T10:27:12Z)
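DBMNet's actual disentanglement module is not described here; as a generic sketch of the contrastive-learning ingredient, an NT-Xent-style loss that pulls together embeddings of the same behavior seen from two camera positions might look like this (embedding size and temperature are assumptions):

```python
# Generic contrastive view-invariance sketch: embeddings of the same driver
# behavior seen from two camera positions are pulled together (NT-Xent style).
# This illustrates the general technique, not DBMNet's actual module.
import torch
import torch.nn.functional as F

def nt_xent(z_a, z_b, tau=0.1):
    """z_a[i] and z_b[i] are two camera views of the same behavior clip."""
    z_a, z_b = F.normalize(z_a, dim=1), F.normalize(z_b, dim=1)
    logits = z_a @ z_b.t() / tau            # (N, N) cosine similarities
    targets = torch.arange(z_a.size(0))     # positives on the diagonal
    return F.cross_entropy(logits, targets)

# Toy usage: 8 clips embedded from two camera positions by some backbone.
view_a, view_b = torch.randn(8, 128), torch.randn(8, 128)
loss = nt_xent(view_a, view_b)
```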
- LapGSR: Laplacian Reconstructive Network for Guided Thermal Super-Resolution
Fusing multiple modalities to produce high-resolution images often requires dense models with millions of parameters and a heavy computational load.
We propose LapGSR, a multimodal, lightweight, generative model incorporating Laplacian image pyramids for guided thermal super-resolution.
arXiv Detail & Related papers (2024-11-12T12:23:19Z)
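LapGSR's architecture is not reproduced here, but the Laplacian image pyramid it builds on is standard; a minimal PyTorch sketch of the decomposition and its exact reconstruction follows (pyramid depth and interpolation mode are assumptions):

```python
# Laplacian pyramid sketch: the generic decomposition LapGSR builds on.
# Downsample to form a Gaussian pyramid; each Laplacian level is the detail
# lost between scales.
import torch
import torch.nn.functional as F

def laplacian_pyramid(img, levels=3):
    """Return [L0, ..., L(n-1), low] from which img is exactly recoverable."""
    pyramid, current = [], img
    for _ in range(levels):
        down = F.avg_pool2d(current, kernel_size=2)
        up = F.interpolate(down, size=current.shape[-2:], mode="bilinear",
                           align_corners=False)
        pyramid.append(current - up)        # band-pass detail at this scale
        current = down
    pyramid.append(current)                 # low-frequency residual
    return pyramid

def reconstruct(pyramid):
    current = pyramid[-1]
    for detail in reversed(pyramid[:-1]):
        current = F.interpolate(current, size=detail.shape[-2:], mode="bilinear",
                                align_corners=False) + detail
    return current

thermal = torch.randn(1, 1, 64, 64)         # low-res thermal stand-in
levels = laplacian_pyramid(thermal)
assert torch.allclose(reconstruct(levels), thermal, atol=1e-5)
```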
- Analysis of Classifier Training on Synthetic Data for Cross-Domain Datasets
This study focuses on camera-based traffic sign recognition applications for advanced driver assistance systems and autonomous driving.
The proposed synthetic-dataset augmentation pipeline includes novel processes such as structured shadows and Gaussian specular highlights.
Experiments showed that a synthetic image-based approach outperforms real image-based training in most cases when evaluated on cross-domain test datasets.
arXiv Detail & Related papers (2024-10-30T07:11:41Z)
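The paper's exact augmentation pipeline is not shown here; as a sketch of the general idea behind a Gaussian specular highlight, one can add a soft Gaussian blob of brightness at a random location (the position, size, and intensity ranges below are assumptions):

```python
# Sketch of a Gaussian specular-highlight augmentation in the spirit the
# abstract describes; position, size, and intensity ranges are assumptions.
import numpy as np

def add_gaussian_highlight(img, rng=np.random.default_rng()):
    """img: float32 HxWxC in [0, 1]. Adds one soft specular blob."""
    h, w = img.shape[:2]
    cy, cx = rng.uniform(0, h), rng.uniform(0, w)      # random highlight center
    sigma = rng.uniform(0.02, 0.1) * min(h, w)         # blob radius
    strength = rng.uniform(0.3, 0.8)                   # peak brightness boost
    ys, xs = np.mgrid[0:h, 0:w]
    blob = np.exp(-((ys - cy) ** 2 + (xs - cx) ** 2) / (2 * sigma ** 2))
    out = img + strength * blob[..., None]             # brighten toward white
    return np.clip(out, 0.0, 1.0)

sign = np.random.rand(64, 64, 3).astype(np.float32)    # synthetic sign stand-in
augmented = add_gaussian_highlight(sign)
```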
- Visual Delta Generator with Large Multi-modal Models for Semi-supervised Composed Image Retrieval
Composed Image Retrieval (CIR) is a task that retrieves images similar to a query, based on a provided textual modification.
Current techniques rely on supervised learning for CIR models using labeled triplets of (reference image, text, target image).
We propose a new semi-supervised CIR approach where we search for a reference and its related target images in auxiliary data.
arXiv Detail & Related papers (2024-04-23T21:00:22Z)
- Benchmarking the Robustness of LiDAR-Camera Fusion for 3D Object Detection
Two critical sensors for 3D perception in autonomous driving are the camera and the LiDAR.
Fusing these two modalities can significantly boost the performance of 3D perception models.
We benchmark the state-of-the-art fusion methods for the first time.
arXiv Detail & Related papers (2022-05-30T09:35:37Z)
- Predicting Take-over Time for Autonomous Driving with Real-World Data: Robust Data Augmentation, Models, and Evaluation
We develop and train take-over time (TOT) models that operate on mid- and high-level features produced by computer vision algorithms applied to different driver-facing camera views.
We show that a TOT model supported by augmented data can be used to produce continuous estimates of take-over times without delay.
arXiv Detail & Related papers (2021-07-27T16:39:50Z)
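As a toy sketch of the general setup the entry above describes (a regressor over mid- and high-level driver features), the following assumes a 16-dimensional feature vector and a small MLP; neither reflects the paper's actual models.

```python
# Toy sketch of a take-over-time regressor over mid/high-level driver features;
# the feature dimensionality and model size are assumptions.
import torch
import torch.nn as nn

FEATURES = 16                       # per-frame driver-state feature vector
model = nn.Sequential(              # small MLP mapping features -> seconds
    nn.Linear(FEATURES, 64), nn.ReLU(),
    nn.Linear(64, 1))

frames = torch.randn(32, FEATURES)  # features from a driver-facing camera
tot_seconds = model(frames)         # continuous take-over-time estimates
loss = nn.functional.mse_loss(tot_seconds, torch.rand(32, 1) * 5.0)
```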
- Towards Automated Neural Interaction Discovery for Click-Through Rate Prediction
Click-Through Rate (CTR) prediction is one of the most important machine learning tasks in recommender systems.
We propose AutoCTR, an automated interaction architecture discovery framework for CTR prediction.
arXiv Detail & Related papers (2020-06-29T04:33:01Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences of its use.