Related papers: The AI Mechanic: Acoustic Vehicle Characterization Neural Networks

URL: http://arxiv.org/abs/2205.09667v1
Date: Thu, 19 May 2022 16:29:26 GMT
Title: The AI Mechanic: Acoustic Vehicle Characterization Neural Networks
Authors: Adam M. Terwilliger, Joshua E. Siegel
Abstract summary: We introduce the AI mechanic, an acoustic vehicle characterization deep learning system, using sound captured from mobile devices. We build a convolutional neural network that predicts and cascades vehicle attributes to enhance fault detection. Our cascading architecture additionally achieved 93.6% validation and 86.8% test set accuracy on misfire fault prediction, demonstrating margins of 16.4% / 7.8% and 4.2% / 1.5% improvement over naïve and parallel baselines.
Score: 1.8275108630751837
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: In a world increasingly dependent on road-based transportation, it is essential to understand vehicles. We introduce the AI mechanic, an acoustic vehicle characterization deep learning system, as an integrated approach using sound captured from mobile devices to enhance transparency and understanding of vehicles and their condition for non-expert users. We develop and implement novel cascading architectures for vehicle understanding, which we define as sequential, conditional, multi-level networks that process raw audio to extract highly-granular insights. To showcase the viability of cascading architectures, we build a multi-task convolutional neural network that predicts and cascades vehicle attributes to enhance fault detection. We train and test these models on a synthesized dataset reflecting more than 40 hours of augmented audio and achieve >92% validation set accuracy on attributes (fuel type, engine configuration, cylinder count and aspiration type). Our cascading architecture additionally achieved 93.6% validation and 86.8% test set accuracy on misfire fault prediction, demonstrating margins of 16.4% / 7.8% and 4.2% / 1.5% improvement over naïve and parallel baselines. We explore experimental studies focused on acoustic features, data augmentation, feature fusion, and data reliability. Finally, we conclude with a discussion of broader implications, future directions, and application areas for this work.
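Illustrative sketch (not the authors' implementation): the abstract defines a cascading architecture as a sequential, conditional, multi-level network in which predicted vehicle attributes feed the fault-detection stage. The toy Python below shows only that data flow. The class counts, the feature size of 8, the random linear heads, and all function names are assumptions made for illustration; the paper's model is a convolutional network trained on audio.

```python
import random

# Attribute heads named in the abstract; the class counts are illustrative assumptions.
ATTRIBUTES = [("fuel_type", 2), ("engine_configuration", 3),
              ("cylinder_count", 4), ("aspiration_type", 2)]


def make_head(n_in, n_out, rng):
    """Toy linear head with random weights (a stand-in for a trained CNN branch)."""
    return [[rng.uniform(-1.0, 1.0) for _ in range(n_in)] for _ in range(n_out)]


def predict(head, x):
    """Return the index of the highest-scoring class."""
    scores = [sum(w * v for w, v in zip(row, x)) for row in head]
    return scores.index(max(scores))


def one_hot(index, size):
    return [1.0 if i == index else 0.0 for i in range(size)]


def build_cascade(n_features, rng):
    """Each head's input grows by the one-hot outputs of all earlier heads."""
    heads, n_in = [], n_features
    for name, n_classes in ATTRIBUTES:
        heads.append((name, n_classes, make_head(n_in, n_classes, rng)))
        n_in += n_classes
    misfire_head = make_head(n_in, 2, rng)  # binary: normal vs. misfire
    return heads, misfire_head


def run_cascade(features, heads, misfire_head):
    """Predict attributes in sequence, then predict misfire from features plus attributes."""
    x = list(features)
    result = {}
    for name, n_classes, head in heads:
        cls = predict(head, x)
        result[name] = cls
        x = x + one_hot(cls, n_classes)  # cascade the prediction downstream
    result["misfire"] = predict(misfire_head, x)
    return result


rng = random.Random(0)
heads, misfire_head = build_cascade(n_features=8, rng=rng)
audio_features = [rng.uniform(-1.0, 1.0) for _ in range(8)]  # placeholder for real audio features
prediction = run_cascade(audio_features, heads, misfire_head)
```

In this sketch, a parallel baseline would correspond to every head reading only `audio_features`, with no one-hot predictions appended between stages.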

Related papers

Contrastive Learning-Driven Traffic Sign Perception: Multi-Modal Fusion of Text and Vision [2.0720154517628417]
We propose a novel framework combining open-vocabulary detection and cross-modal learning. For traffic sign detection, our NanoVerse YOLO model integrates a vision-language path aggregation network (RepVL-PAN) and an SPD-Conv module. For traffic sign classification, we designed a Traffic Sign Recognition Multimodal Contrastive Learning model (TSR-MCL). On the TT100K dataset, our method achieves a state-of-the-art 78.4% mAP in the long-tail detection task for all-class recognition.
arXiv Detail & Related papers (2025-07-31T08:23:30Z)
Learning to Drive by Imitating Surrounding Vehicles [0.8902959815221526]
We study a data augmentation strategy that leverages the observed trajectories of nearby vehicles as additional demonstrations. We introduce a simple vehicle-selection sampling and filtering strategy that prioritizes informative and diverse driving behaviors. Specifically, the approach reduces collision rates and improves safety metrics compared to the baseline.
arXiv Detail & Related papers (2025-03-08T00:40:47Z)
FollowGen: A Scaled Noise Conditional Diffusion Model for Car-Following Trajectory Prediction [9.2729178775419]
This study introduces a scaled noise conditional diffusion model for car-following trajectory prediction. It integrates detailed inter-vehicular interactions and car-following dynamics into a generative framework, improving the accuracy and plausibility of predicted trajectories. Experimental results on diverse real-world driving scenarios demonstrate the state-of-the-art performance and robustness of the proposed method.
arXiv Detail & Related papers (2024-11-23T23:13:45Z)
Transforming In-Vehicle Network Intrusion Detection: VAE-based Knowledge Distillation Meets Explainable AI [0.0]
This paper introduces an advanced intrusion detection system (IDS) called KD-XVAE that uses a Variational Autoencoder (VAE)-based knowledge distillation approach. Our model significantly reduces complexity, operating with just 1669 parameters and achieving an inference time of 0.3 ms per batch.
arXiv Detail & Related papers (2024-10-11T17:57:16Z)
Advancements in Road Lane Mapping: Comparative Fine-Tuning Analysis of Deep Learning-based Semantic Segmentation Methods Using Aerial Imagery [16.522544814241495]
This research addresses the need for high-definition (HD) maps for autonomous vehicles (AVs). Earth observation data offers valuable resources for map creation, but specialized models for road lane extraction are still underdeveloped in remote sensing. In this study, we compare twelve foundational deep learning-based semantic segmentation models for road lane marking extraction from high-definition remote sensing images.
arXiv Detail & Related papers (2024-10-08T06:24:15Z)
Optimizing LaneSegNet for Real-Time Lane Topology Prediction in Autonomous Vehicles [0.41942958779358663]
LaneSegNet is a new approach to lane topology prediction which integrates topological information with lane-line data. This study explores optimizations to the LaneSegNet architecture through feature extractor modification and transformer encoder-decoder stack modification. Our implementation, trained on a single NVIDIA Tesla A100 GPU, found that a 2:4 ratio reduced training time by 22.3% with only a 7.1% drop in mean average precision. A 4:8 ratio increased training time by only 11.1% but improved mean average precision by a significant 23.7%.
arXiv Detail & Related papers (2024-06-22T21:49:12Z)
Exploring Contextual Representation and Multi-Modality for End-to-End Autonomous Driving [58.879758550901364]
Recent perception systems enhance spatial understanding with sensor fusion but often lack full environmental context. We introduce a framework that integrates three cameras to emulate the human field of view, coupled with top-down bird-eye-view semantic data to enhance contextual representation. Our method achieves a displacement error of 0.67 m in open-loop settings, surpassing current methods by 6.9% on the nuScenes dataset.
arXiv Detail & Related papers (2022-10-13T05:56:20Z)
Fully Automated End-to-End Fake Audio Detection [57.78459588263812]
This paper proposes a fully automated end-to-end fake audio detection method. We first use a pre-trained wav2vec model to obtain a high-level representation of the speech. For the network structure, we use a modified version of the differentiable architecture search (DARTS) named light-DARTS.
arXiv Detail & Related papers (2022-08-20T06:46:55Z)
Attention-based Neural Network for Driving Environment Complexity Perception [123.93460670568554]
This paper proposes a novel attention-based neural network model to predict the complexity level of the surrounding driving environment. It consists of a Yolo-v3 object detection algorithm, a heat map generation algorithm, CNN-based feature extractors, and attention-based feature extractors. The proposed attention-based network achieves 91.22% average classification accuracy to classify the surrounding environment complexity.
arXiv Detail & Related papers (2021-06-21T17:27:11Z)
Fine-Grained Vehicle Perception via 3D Part-Guided Visual Data Augmentation [77.60050239225086]
We propose an effective training data generation process by fitting a 3D car model with dynamic parts to vehicles in real images. Our approach is fully automatic without any human interaction. We present a multi-task network for VUS parsing and a multi-stream network for VHI parsing.
arXiv Detail & Related papers (2020-12-15T03:03:38Z)
Detecting 32 Pedestrian Attributes for Autonomous Vehicles [103.87351701138554]
In this paper, we address the problem of jointly detecting pedestrians and recognizing 32 pedestrian attributes. We introduce a Multi-Task Learning (MTL) model relying on a composite field framework, which achieves both goals in an efficient way. We show competitive detection and attribute recognition results, as well as a more stable MTL training.
arXiv Detail & Related papers (2020-12-04T15:10:12Z)
Scene-Graph Augmented Data-Driven Risk Assessment of Autonomous Vehicle Decisions [1.4086978333609153]
We propose a novel data-driven approach that uses scene-graphs as intermediate representations. Our approach includes a Multi-Relation Graph Convolution Network, a Long-Short Term Memory Network, and attention layers for modeling the subjective risk of driving maneuvers. We show that our approach achieves a higher classification accuracy than the state-of-the-art approach on both large (96.4% vs. 91.2%) and small (91.8% vs. 71.2%) datasets. We also show that our model trained on a synthesized dataset achieves an average accuracy of 87.8% when tested on a real-world dataset.
arXiv Detail & Related papers (2020-08-31T07:41:27Z)
Towards Automated Neural Interaction Discovery for Click-Through Rate Prediction [64.03526633651218]
Click-Through Rate (CTR) prediction is one of the most important machine learning tasks in recommender systems. We propose an automated interaction architecture discovering framework for CTR prediction named AutoCTR.
arXiv Detail & Related papers (2020-06-29T04:33:01Z)

This list is automatically generated from the titles and abstracts of the papers on this site.