Multi-modal Machine Learning for Vehicle Rating Predictions Using Image,
Text, and Parametric Data
- URL: http://arxiv.org/abs/2305.15218v2
- Date: Sat, 27 May 2023 10:20:16 GMT
- Authors: Hanqi Su, Binyang Song and Faez Ahmed
- Abstract summary: We propose a multi-modal learning model for accurate vehicle rating predictions.
The model simultaneously learns features from the parametric specifications, text descriptions, and images of vehicles.
We find that the multi-modal model's explanatory power is 4%-12% higher than that of the unimodal models.
- Score: 3.463438487417909
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Accurate vehicle rating prediction can facilitate designing and configuring
good vehicles. This prediction allows vehicle designers and manufacturers to
optimize and improve their designs in a timely manner, enhance their product
performance, and effectively attract consumers. However, most existing
data-driven methods rely on data from a single modality, e.g., text, images, or
parametric data, and therefore explore only part of the available information.
Without a comprehensive analysis across modalities, such methods can reach
inaccurate conclusions and hinder progress in this field. To overcome this
limitation, we
propose a multi-modal learning model for more comprehensive and accurate
vehicle rating predictions. Specifically, the model simultaneously learns
features from the parametric specifications, text descriptions, and images of
vehicles to predict five vehicle rating scores, including the total score,
critics score, performance score, safety score, and interior score. We compare
the multi-modal learning model to the corresponding unimodal models and find
that the multi-modal model's explanatory power is 4%-12% higher than that of
the unimodal models. On this basis, we conduct sensitivity analyses using SHAP
to interpret our model and provide design and optimization directions to
designers and manufacturers. Our study underscores the importance of the
data-driven multi-modal learning approach for vehicle design, evaluation, and
optimization. We have made the code publicly available at
http://decode.mit.edu/projects/vehicleratings/.
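The abstract describes a late-fusion architecture over three modalities; the sketch below illustrates one plausible form of such a model. It is a minimal illustration, not the authors' implementation (their code is at the URL above): the encoder choices, feature dimensions, and hidden sizes are all assumptions.

```python
# Illustrative late-fusion multi-modal regressor for the five rating scores.
# NOT the paper's implementation; encoders and dimensions are assumptions.
import torch
import torch.nn as nn

class MultiModalRatingModel(nn.Module):
    def __init__(self, param_dim=30, text_dim=768, img_dim=512, hidden=256):
        super().__init__()
        # One encoder per modality. Text and image inputs are assumed to be
        # fixed-size feature vectors from pretrained backbones (e.g., a
        # BERT-style text encoder and a CNN image encoder).
        self.param_enc = nn.Sequential(nn.Linear(param_dim, hidden), nn.ReLU())
        self.text_enc = nn.Sequential(nn.Linear(text_dim, hidden), nn.ReLU())
        self.img_enc = nn.Sequential(nn.Linear(img_dim, hidden), nn.ReLU())
        # Fused head regresses the five scores: total, critics, performance,
        # safety, and interior.
        self.head = nn.Sequential(
            nn.Linear(3 * hidden, hidden), nn.ReLU(), nn.Linear(hidden, 5)
        )

    def forward(self, params, text_feats, img_feats):
        fused = torch.cat(
            [self.param_enc(params), self.text_enc(text_feats), self.img_enc(img_feats)],
            dim=-1,
        )
        return self.head(fused)

model = MultiModalRatingModel()
scores = model(torch.randn(4, 30), torch.randn(4, 768), torch.randn(4, 512))
print(scores.shape)  # torch.Size([4, 5])
```

For the SHAP sensitivity analysis, a model-agnostic explainer can be wrapped around any such predictor. The snippet below uses shap.KernelExplainer on the parametric inputs with the other modalities held fixed; this is one possible setup, not necessarily the paper's.

```python
# Hypothetical SHAP setup: explain the total score (output column 0) with
# respect to the parametric features, holding text/image features fixed.
import numpy as np
import shap

def predict_total_score(x):
    params = torch.tensor(x, dtype=torch.float32)
    text = torch.zeros(len(x), 768)   # fixed text features (assumption)
    img = torch.zeros(len(x), 512)    # fixed image features (assumption)
    with torch.no_grad():
        return model(params, text, img)[:, 0].numpy()

background = np.random.randn(50, 30)  # placeholder background sample
explainer = shap.KernelExplainer(predict_total_score, background)
shap_values = explainer.shap_values(np.random.randn(5, 30))
```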
Related papers
- MetaFollower: Adaptable Personalized Autonomous Car Following [63.90050686330677]
We propose MetaFollower, an adaptable personalized car-following framework.
We first utilize Model-Agnostic Meta-Learning (MAML) to extract common driving knowledge from various car-following events.
We additionally combine Long Short-Term Memory (LSTM) and the Intelligent Driver Model (IDM) to reflect temporal heterogeneity with high interpretability.
arXiv Detail & Related papers (2024-06-23T15:30:40Z)
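As a toy illustration of the MAML step in the MetaFollower entry above: the inner loop adapts a car-following model to one driver's events, and the outer loop updates the shared initialization. The model, data, and hyperparameters below are placeholders, not the paper's (requires PyTorch >= 2.0 for torch.func).

```python
# Minimal MAML sketch for car-following adaptation (placeholder model/data,
# not MetaFollower itself). Inner loop: adapt to one driver's support events;
# outer loop: update the shared initialization from query-set losses.
import torch
import torch.nn as nn
from torch.func import functional_call

# Inputs per step: (speed, gap, relative speed) -> predicted acceleration.
model = nn.Sequential(nn.Linear(3, 32), nn.Tanh(), nn.Linear(32, 1))
meta_opt = torch.optim.Adam(model.parameters(), lr=1e-3)
inner_lr = 0.01

def task_loss(params, x, y):
    return nn.functional.mse_loss(functional_call(model, params, (x,)), y)

for step in range(100):                                    # meta-training
    meta_opt.zero_grad()
    for _ in range(4):                                     # batch of car-following events (tasks)
        x_s, y_s = torch.randn(16, 3), torch.randn(16, 1)  # placeholder support set
        x_q, y_q = torch.randn(16, 3), torch.randn(16, 1)  # placeholder query set
        params = dict(model.named_parameters())
        grads = torch.autograd.grad(task_loss(params, x_s, y_s),
                                    list(params.values()), create_graph=True)
        adapted = {k: p - inner_lr * g
                   for (k, p), g in zip(params.items(), grads)}
        task_loss(adapted, x_q, y_q).backward()            # accumulate meta-gradient
    meta_opt.step()
```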
- Multi-modal Auto-regressive Modeling via Visual Words [96.25078866446053]
We propose the concept of visual words, which maps visual features to probability distributions over a Large Multi-modal Model's (LMM's) vocabulary.
We further explore the distribution of visual features in the LMM's semantic space and the possibility of using text embeddings to represent visual information.
arXiv Detail & Related papers (2024-03-12T14:58:52Z)
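As a toy version of the visual-words idea in the entry above: each visual feature is scored against the LMM's token-embedding table and softmax-normalized into a distribution over the text vocabulary. The sizes and the dot-product scoring are assumptions.

```python
# Toy "visual words": turn visual features into probability distributions
# over a language model's token vocabulary. Sizes/scoring are assumptions.
import torch
import torch.nn.functional as F

vocab_size, embed_dim, num_patches = 1000, 256, 49
token_embeddings = torch.randn(vocab_size, embed_dim)  # LMM embedding table (placeholder)
visual_feats = torch.randn(num_patches, embed_dim)     # projected patch features (placeholder)

logits = visual_feats @ token_embeddings.T             # (num_patches, vocab_size)
visual_words = F.softmax(logits, dim=-1)               # one distribution per patch
print(visual_words.shape)                              # torch.Size([49, 1000]); rows sum to ~1.0
```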
- Trajeglish: Traffic Modeling as Next-Token Prediction [67.28197954427638]
A longstanding challenge for self-driving development is simulating dynamic driving scenarios seeded from recorded driving logs.
We apply tools from discrete sequence modeling to model how vehicles, pedestrians and cyclists interact in driving scenarios.
Our model tops the Sim Agents Benchmark, surpassing prior work along the realism meta metric by 3.3% and along the interaction metric by 9.9%.
arXiv Detail & Related papers (2023-12-07T18:53:27Z)
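The Trajeglish entry casts traffic modeling as next-token prediction. The sketch below shows only the generic pattern (uniformly binning per-step motion into a token vocabulary and training autoregressively), with a placeholder model rather than the paper's tokenizer or architecture.

```python
# Generic "motion as next-token prediction" sketch: discretize per-step motion
# into a small vocabulary, then train a causal model autoregressively.
# The tokenizer and model here are placeholders, not Trajeglish itself.
import torch
import torch.nn as nn

NUM_BINS = 128  # size of the discrete motion vocabulary (assumption)

def tokenize(deltas, low=-5.0, high=5.0):
    """Map continuous per-step displacements to integer tokens by uniform binning."""
    return ((deltas - low) / (high - low) * NUM_BINS).long().clamp(0, NUM_BINS - 1)

class MotionLM(nn.Module):
    def __init__(self, hidden=128):
        super().__init__()
        self.embed = nn.Embedding(NUM_BINS, hidden)
        self.rnn = nn.GRU(hidden, hidden, batch_first=True)  # stands in for a causal transformer
        self.out = nn.Linear(hidden, NUM_BINS)

    def forward(self, tokens):
        h, _ = self.rnn(self.embed(tokens))
        return self.out(h)

model = MotionLM()
tokens = tokenize(torch.randn(8, 20))     # batch of 8 trajectories, 20 steps each
logits = model(tokens[:, :-1])            # predict each next motion token
loss = nn.functional.cross_entropy(logits.reshape(-1, NUM_BINS),
                                   tokens[:, 1:].reshape(-1))
loss.backward()
```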
- StableLLaVA: Enhanced Visual Instruction Tuning with Synthesized Image-Dialogue Data [129.92449761766025]
We propose a novel data collection methodology that synchronously synthesizes images and dialogues for visual instruction tuning.
This approach harnesses the power of generative models, marrying the abilities of ChatGPT and text-to-image generative models.
We validate the approach through comprehensive experiments on various datasets.
arXiv Detail & Related papers (2023-08-20T12:43:52Z)
- FollowNet: A Comprehensive Benchmark for Car-Following Behavior Modeling [20.784555362703294]
We establish a public benchmark dataset for car-following behavior modeling.
The benchmark consists of more than 80K car-following events extracted from five public driving datasets.
Results show that the deep deterministic policy gradient (DDPG) based model performs competitively with a lower MSE for spacing.
arXiv Detail & Related papers (2023-05-25T08:59:26Z)
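For the FollowNet entry above, "MSE for spacing" can be read as the mean squared error of the predicted gap to the lead vehicle; a minimal version of such a metric (naming and setup assumed, not the benchmark's code) is:

```python
# Assumed form of a spacing-MSE metric: error in the predicted gap between
# the lead vehicle and the simulated follower, averaged over time steps.
import torch

def spacing_mse(pred_follower_pos, true_follower_pos, lead_pos):
    pred_spacing = lead_pos - pred_follower_pos
    true_spacing = lead_pos - true_follower_pos
    return torch.mean((pred_spacing - true_spacing) ** 2)

print(spacing_mse(torch.tensor([0.0, 1.9]), torch.tensor([0.0, 2.0]),
                  torch.tensor([10.0, 12.0])))  # tensor(0.0050)
```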
- IDM-Follower: A Model-Informed Deep Learning Method for Long-Sequence Car-Following Trajectory Prediction [24.94160059351764]
Most car-following models are generative and consider only the speed, position, and acceleration of the last time step as inputs.
We implement a novel structure with two independent encoders and a self-attention decoder that sequentially predicts the following trajectories.
Numerical experiments with multiple settings on simulation and NGSIM datasets show that the IDM-Follower can improve the prediction performance.
arXiv Detail & Related papers (2022-10-20T02:24:27Z)
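The Intelligent Driver Model referenced in the IDM-Follower entry has a standard closed form; below is a minimal implementation with common textbook parameter values (not the paper's calibration).

```python
# Intelligent Driver Model (IDM): standard closed-form car-following law.
# Parameter values are common textbook defaults, not the paper's calibration.
import math

def idm_acceleration(v, gap, dv, v0=30.0, T=1.5, a_max=1.0, b=2.0, s0=2.0, delta=4):
    """v: own speed (m/s); gap: bumper-to-bumper gap (m); dv: v - lead speed (m/s)."""
    s_star = s0 + v * T + v * dv / (2 * math.sqrt(a_max * b))  # desired gap
    return a_max * (1 - (v / v0) ** delta - (s_star / gap) ** 2)

print(idm_acceleration(v=25.0, gap=40.0, dv=0.0))  # ~ -0.46: gentle braking, gap below desired
```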
- On the Choice of Data for Efficient Training and Validation of End-to-End Driving Models [32.381828309166195]
We investigate the influence of several data design choices regarding training and validation of deep driving models trainable in an end-to-end fashion.
We show by correlation analysis which validation design enables the driving performance measured during validation to generalize to unknown test environments.
arXiv Detail & Related papers (2022-06-01T16:25:28Z)
- Transforming Model Prediction for Tracking [109.08417327309937]
Transformers capture global relations with little inductive bias, allowing them to learn the prediction of more powerful target models.
We train the proposed tracker end-to-end and validate its performance by conducting comprehensive experiments on multiple tracking datasets.
Our tracker sets a new state of the art on three benchmarks, achieving an AUC of 68.5% on the challenging LaSOT dataset.
arXiv Detail & Related papers (2022-03-21T17:59:40Z)
- CRAT-Pred: Vehicle Trajectory Prediction with Crystal Graph Convolutional Neural Networks and Multi-Head Self-Attention [10.83642398981694]
CRAT-Pred is a trajectory prediction model that does not rely on map information.
The model achieves state-of-the-art performance with significantly fewer model parameters.
In addition, we show quantitatively that the self-attention mechanism is able to learn social interactions between vehicles, with the weights representing a measurable interaction score.
arXiv Detail & Related papers (2022-02-09T14:36:36Z)
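The CRAT-Pred entry reads self-attention weights as pairwise interaction scores; the snippet below shows the generic mechanism with PyTorch's nn.MultiheadAttention. The graph-convolutional feature extraction upstream is omitted, and all features are placeholders.

```python
# Reading self-attention weights as pairwise vehicle-interaction scores.
# Upstream per-vehicle feature extraction is omitted; features are placeholders.
import torch
import torch.nn as nn

num_vehicles, feat_dim = 6, 64
vehicle_feats = torch.randn(1, num_vehicles, feat_dim)  # (batch, vehicles, features)

mha = nn.MultiheadAttention(embed_dim=feat_dim, num_heads=4, batch_first=True)
out, attn_weights = mha(vehicle_feats, vehicle_feats, vehicle_feats,
                        need_weights=True, average_attn_weights=True)

# attn_weights[0, i, j]: how much vehicle i attends to vehicle j -- an
# interpretable, measurable interaction score between the two vehicles.
print(attn_weights.shape)  # torch.Size([1, 6, 6]); each row sums to 1
```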
- Models, Pixels, and Rewards: Evaluating Design Trade-offs in Visual Model-Based Reinforcement Learning [109.74041512359476]
We study a number of design decisions for the predictive model in visual MBRL algorithms.
We find that a range of design decisions that are often considered crucial, such as the use of latent spaces, have little effect on task performance.
We show how this phenomenon is related to exploration and how some of the lower-scoring models on standard benchmarks perform on par with the best-performing models when trained on the same training data.
arXiv Detail & Related papers (2020-12-08T18:03:21Z)
- The Importance of Balanced Data Sets: Analyzing a Vehicle Trajectory Prediction Model based on Neural Networks and Distributed Representations [0.0]
We investigate the composition of training data in vehicle trajectory prediction.
We show that the models employing our semantic vector representation outperform the numerical model when trained on an adequate data set.
arXiv Detail & Related papers (2020-09-30T20:00:11Z)
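One generic way to approximate the balanced training sets argued for in the last entry is to oversample rare classes with a weighted sampler; the maneuver labels below are placeholders, not the paper's data pipeline.

```python
# Generic class-balancing via WeightedRandomSampler: oversample rare maneuver
# classes (e.g., lane changes) so each class is drawn roughly equally often.
# Labels and features are placeholders, not the paper's dataset.
import torch
from torch.utils.data import DataLoader, TensorDataset, WeightedRandomSampler

features = torch.randn(1000, 16)
labels = torch.randint(0, 3, (1000,))        # e.g., 0=follow, 1=left, 2=right

class_counts = torch.bincount(labels).float()
weights = (1.0 / class_counts)[labels]       # rare classes get larger weight
sampler = WeightedRandomSampler(weights, num_samples=len(labels), replacement=True)

loader = DataLoader(TensorDataset(features, labels), batch_size=32, sampler=sampler)
```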