Exploring Aviation Incident Narratives Using Topic Modeling and Clustering Techniques
- URL: http://arxiv.org/abs/2501.07924v1
- Date: Tue, 14 Jan 2025 08:23:15 GMT
- Title: Exploring Aviation Incident Narratives Using Topic Modeling and Clustering Techniques
- Authors: Aziida Nanyonga, Hassan Wasswa, Ugur Turhan, Keith Joiner, Graham Wild,
- Abstract summary: This study applies advanced natural language processing (NLP) techniques to the National Transportation Safety Board (NTSB) dataset.
Main objectives are identifying latent themes, exploring semantic relationships, assessing probabilistic connections, and cluster incidents based on shared characteristics.
Comparative analysis reveals that LDA performed best with a coherence value of 0.597, pLSA of 0.583, LSA of 0.542, and NMF of 0.437.
- Score: 0.0
- License:
- Abstract: Aviation safety is a global concern, requiring detailed investigations into incidents to understand contributing factors comprehensively. This study uses the National Transportation Safety Board (NTSB) dataset. It applies advanced natural language processing (NLP) techniques, including Latent Dirichlet Allocation (LDA), Non-Negative Matrix Factorization (NMF), Latent Semantic Analysis (LSA), Probabilistic Latent Semantic Analysis (pLSA), and K-means clustering. The main objectives are identifying latent themes, exploring semantic relationships, assessing probabilistic connections, and cluster incidents based on shared characteristics. This research contributes to aviation safety by providing insights into incident narratives and demonstrating the versatility of NLP and topic modelling techniques in extracting valuable information from complex datasets. The results, including topics identified from various techniques, provide an understanding of recurring themes. Comparative analysis reveals that LDA performed best with a coherence value of 0.597, pLSA of 0.583, LSA of 0.542, and NMF of 0.437. K-means clustering further reveals commonalities and unique insights into incident narratives. In conclusion, this study uncovers latent patterns and thematic structures within incident narratives, offering a comparative analysis of multiple-topic modelling techniques. Future research avenues include exploring temporal patterns, incorporating additional datasets, and developing predictive models for early identification of safety issues. This research lays the groundwork for enhancing the understanding and improvement of aviation safety by utilising the wealth of information embedded in incident narratives.
Related papers
- Analyzing Aviation Safety Narratives with LDA, NMF and PLSA: A Case Study Using Socrata Datasets [0.0]
This study explores the application of topic modelling techniques on the Socrata dataset spanning from 1908 to 2009.
The analysis identified key themes such as pilot error, mechanical failure, weather conditions, and training deficiencies.
Future directions include integrating additional contextual variables, leveraging neural topic models, and enhancing aviation safety protocols.
arXiv Detail & Related papers (2025-01-03T08:14:39Z) - Comparative Analysis of Topic Modeling Techniques on ATSB Text Narratives Using Natural Language Processing [0.0]
This paper explores the application of four prominent topic modelling techniques, namely Probabilistic Latent Semantic Analysis (pLSA), Latent Semantic Analysis (LSA), Latent Dirichlet Allocation (LDA), and Non-negative Matrix Factorization (NMF)
The study examines each technique's ability to unveil latent thematic structures within the data, providing safety professionals with a systematic approach to gain actionable insights.
arXiv Detail & Related papers (2025-01-02T12:21:07Z) - Feature Group Tabular Transformer: A Novel Approach to Traffic Crash Modeling and Causality Analysis [0.40964539027092917]
This study introduces a novel approach to predicting collision types by utilizing a comprehensive dataset fused from multiple sources.
Central to our approach is the development of a Feature Group Tabular Transformer (FGTT) model, which organizes disparate data into meaningful feature groups.
The FGTT model is benchmarked against widely used tree ensemble models, including Random Forest, XGBoost, and CatBoost, demonstrating superior predictive performance.
arXiv Detail & Related papers (2024-12-06T20:47:13Z) - Explainability of Point Cloud Neural Networks Using SMILE: Statistical Model-Agnostic Interpretability with Local Explanations [0.0]
This study explores the implementation of SMILE, a novel explainability method originally designed for deep neural networks, on point cloud-based models.
The approach demonstrates superior performance in terms of fidelity loss, R2 scores, and robustness across various kernel widths, perturbation numbers, and clustering configurations.
The study further identifies dataset biases in the classification of the 'person' category, emphasizing the necessity for more comprehensive datasets in safety-critical applications.
arXiv Detail & Related papers (2024-10-20T12:13:59Z) - Learning Traffic Crashes as Language: Datasets, Benchmarks, and What-if Causal Analyses [76.59021017301127]
We propose a large-scale traffic crash language dataset, named CrashEvent, summarizing 19,340 real-world crash reports.
We further formulate the crash event feature learning as a novel text reasoning problem and further fine-tune various large language models (LLMs) to predict detailed accident outcomes.
Our experiments results show that our LLM-based approach not only predicts the severity of accidents but also classifies different types of accidents and predicts injury outcomes.
arXiv Detail & Related papers (2024-06-16T03:10:16Z) - Topic Modeling Analysis of Aviation Accident Reports: A Comparative
Study between LDA and NMF Models [0.0]
This paper compares two prominent topic modeling techniques, Latent Dirichlet Allocation (LDA) and Non-negative Matrix Factorization (NMF)
LDA demonstrates higher topic coherence, indicating stronger semantic relevance among words within topics.
NMF excelled in producing distinct and granular topics, enabling a more focused analysis of specific aspects of aviation accidents.
arXiv Detail & Related papers (2024-03-04T01:41:07Z) - Innovative Horizons in Aerial Imagery: LSKNet Meets DiffusionDet for
Advanced Object Detection [55.2480439325792]
We present an in-depth evaluation of an object detection model that integrates the LSKNet backbone with the DiffusionDet head.
The proposed model achieves a mean average precision (MAP) of approximately 45.7%, which is a significant improvement.
This advancement underscores the effectiveness of the proposed modifications and sets a new benchmark in aerial image analysis.
arXiv Detail & Related papers (2023-11-21T19:49:13Z) - A Study of Situational Reasoning for Traffic Understanding [63.45021731775964]
We devise three novel text-based tasks for situational reasoning in the traffic domain.
We adopt four knowledge-enhanced methods that have shown generalization capability across language reasoning tasks in prior work.
We provide in-depth analyses of model performance on data partitions and examine model predictions categorically.
arXiv Detail & Related papers (2023-06-05T01:01:12Z) - A Diachronic Analysis of Paradigm Shifts in NLP Research: When, How, and
Why? [84.46288849132634]
We propose a systematic framework for analyzing the evolution of research topics in a scientific field using causal discovery and inference techniques.
We define three variables to encompass diverse facets of the evolution of research topics within NLP.
We utilize a causal discovery algorithm to unveil the causal connections among these variables using observational data.
arXiv Detail & Related papers (2023-05-22T11:08:00Z) - Temporal Relevance Analysis for Video Action Models [70.39411261685963]
We first propose a new approach to quantify the temporal relationships between frames captured by CNN-based action models.
We then conduct comprehensive experiments and in-depth analysis to provide a better understanding of how temporal modeling is affected.
arXiv Detail & Related papers (2022-04-25T19:06:48Z) - Learning Neural Causal Models with Active Interventions [83.44636110899742]
We introduce an active intervention-targeting mechanism which enables a quick identification of the underlying causal structure of the data-generating process.
Our method significantly reduces the required number of interactions compared with random intervention targeting.
We demonstrate superior performance on multiple benchmarks from simulated to real-world data.
arXiv Detail & Related papers (2021-09-06T13:10:37Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.