Is BERTopic Better than PLSA for Extracting Key Topics in Aviation Safety Reports?
- URL: http://arxiv.org/abs/2506.06328v1
- Date: Fri, 30 May 2025 19:27:11 GMT
- Title: Is BERTopic Better than PLSA for Extracting Key Topics in Aviation Safety Reports?
- Authors: Aziida Nanyonga, Joiner Keith, Turhan Ugur, Wild Graham
- Abstract summary: This study compares the effectiveness of BERTopic and Probabilistic Latent Semantic Analysis (PLSA) in extracting meaningful topics from aviation safety reports. Using a dataset of over 36,000 National Transportation Safety Board (NTSB) reports from 2000 to 2020, BERTopic employed transformer-based embeddings and hierarchical clustering. Results showed that BERTopic outperformed PLSA in topic coherence, achieving a Cv score of 0.41 compared to PLSA's 0.37, while also demonstrating superior interpretability as validated by aviation safety experts.
- Score: 0.0
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: This study compares the effectiveness of BERTopic and Probabilistic Latent Semantic Analysis (PLSA) in extracting meaningful topics from aviation safety reports, aiming to enhance the understanding of patterns in aviation incident data. Using a dataset of over 36,000 National Transportation Safety Board (NTSB) reports from 2000 to 2020, BERTopic employed transformer-based embeddings and hierarchical clustering, while PLSA utilized probabilistic modelling through the Expectation-Maximization (EM) algorithm. Results showed that BERTopic outperformed PLSA in topic coherence, achieving a Cv score of 0.41 compared to PLSA's 0.37, while also demonstrating superior interpretability as validated by aviation safety experts. These findings underscore the advantages of modern transformer-based approaches in analyzing complex aviation datasets, paving the way for enhanced insights and informed decision-making in aviation safety. Future work will explore hybrid models, multilingual datasets, and advanced clustering techniques to further improve topic modelling in this domain.
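The abstract says PLSA was fitted with the Expectation-Maximization (EM) algorithm. As a rough illustration of what that involves, the following is a minimal pure-Python sketch of PLSA EM on a document-term count matrix. It is not the authors' implementation: the function name `plsa`, its parameters, and the toy counts are all assumptions made up for this example, and a real analysis of NTSB narratives would need preprocessing and a far larger vocabulary.

```python
import random


def plsa(counts, n_topics, n_iter=50, seed=0):
    """Fit PLSA by EM. `counts[d][w]` is how often word w occurs in document d.

    Returns (p_z_d, p_w_z): P(topic | document) and P(word | topic).
    Hypothetical helper written for illustration only.
    """
    rng = random.Random(seed)
    n_docs, n_words = len(counts), len(counts[0])

    def normalize(row):
        total = sum(row)
        if total == 0:  # e.g. an empty document: fall back to uniform
            return [1.0 / len(row)] * len(row)
        return [x / total for x in row]

    # Random positive initialization of both distributions.
    p_z_d = [normalize([rng.random() + 1e-3 for _ in range(n_topics)])
             for _ in range(n_docs)]
    p_w_z = [normalize([rng.random() + 1e-3 for _ in range(n_words)])
             for _ in range(n_topics)]

    for _ in range(n_iter):
        new_z_d = [[0.0] * n_topics for _ in range(n_docs)]
        new_w_z = [[0.0] * n_words for _ in range(n_topics)]
        for d in range(n_docs):
            for w in range(n_words):
                n_dw = counts[d][w]
                if n_dw == 0:
                    continue
                # E-step: P(z | d, w) is proportional to P(z | d) * P(w | z).
                joint = [p_z_d[d][z] * p_w_z[z][w] for z in range(n_topics)]
                total = sum(joint)
                if total == 0:
                    continue
                for z in range(n_topics):
                    weighted = n_dw * joint[z] / total
                    # Accumulate expected counts for the M-step.
                    new_z_d[d][z] += weighted
                    new_w_z[z][w] += weighted
        # M-step: renormalize expected counts into probabilities.
        p_z_d = [normalize(row) for row in new_z_d]
        p_w_z = [normalize(row) for row in new_w_z]
    return p_z_d, p_w_z


# Toy corpus (invented): columns are
# engine, fuel, power, runway, landing, gear.
counts = [
    [3, 2, 1, 0, 0, 0],
    [2, 3, 2, 0, 0, 0],
    [0, 0, 0, 3, 2, 2],
    [0, 0, 1, 2, 3, 1],
]
doc_topic, topic_word = plsa(counts, n_topics=2)
for d, row in enumerate(doc_topic):
    print(d, [round(p, 3) for p in row])
```

Each row of `doc_topic` and `topic_word` is a probability distribution, so it sums to 1. Which topic index ends up associated with which group of words depends on the random initialization.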
Related papers
- Adapting Vehicle Detectors for Aerial Imagery to Unseen Domains with Weak Supervision [46.87579355047397]
This paper proposes a novel method that uses generative AI to synthesize high-quality aerial images and their labels. Our key contribution is the development of a multi-stage, multi-modal knowledge transfer framework.
arXiv Detail & Related papers (2025-07-28T16:38:06Z)
- Utilizing AI for Aviation Post-Accident Analysis Classification [0.0]
The volume of textual data available in aviation safety reports presents a challenge for timely and accurate analysis. This paper examines how Artificial Intelligence (AI) and, specifically, Natural Language Processing (NLP) can automate the process of extracting valuable insights from this data. The findings demonstrate that both NLP and deep learning, as well as TM, can significantly improve the efficiency and accuracy of aviation safety analysis.
arXiv Detail & Related papers (2025-05-30T19:15:04Z)
- Deep Self-Supervised Disturbance Mapping with the OPERA Sentinel-1 Radiometric Terrain Corrected SAR Backscatter Product [41.94295877935867]
Mapping land surface disturbances supports disaster response, resource and ecosystem management, and climate adaptation efforts. Synthetic aperture radar (SAR) is an invaluable tool for disturbance mapping, providing consistent time-series images of the ground regardless of weather or illumination conditions. NASA's Observational Products for End-Users from Remote Sensing Analysis (OPERA) project released the near-global Radiometric Terrain Corrected SAR backscatter from Sentinel-1 (RTC-S1) dataset in October 2023. In this work, we utilize this new dataset to systematically analyze land surface disturbances.
arXiv Detail & Related papers (2025-01-15T20:24:18Z)
- Exploring Aviation Incident Narratives Using Topic Modeling and Clustering Techniques [0.0]
This study applies advanced natural language processing (NLP) techniques to the National Transportation Safety Board (NTSB) dataset. The main objectives are identifying latent themes, exploring semantic relationships, assessing probabilistic connections, and clustering incidents based on shared characteristics. Comparative analysis reveals that LDA performed best with a coherence value of 0.597, followed by pLSA at 0.583, LSA at 0.542, and NMF at 0.437.
arXiv Detail & Related papers (2025-01-14T08:23:15Z)
- Analyzing Aviation Safety Narratives with LDA, NMF and PLSA: A Case Study Using Socrata Datasets [0.0]
This study explores the application of topic modelling techniques on the Socrata dataset spanning from 1908 to 2009. The analysis identified key themes such as pilot error, mechanical failure, weather conditions, and training deficiencies. Future directions include integrating additional contextual variables, leveraging neural topic models, and enhancing aviation safety protocols.
arXiv Detail & Related papers (2025-01-03T08:14:39Z)
- Comparative Analysis of Topic Modeling Techniques on ATSB Text Narratives Using Natural Language Processing [0.0]
This paper explores the application of four prominent topic modelling techniques, namely Probabilistic Latent Semantic Analysis (pLSA), Latent Semantic Analysis (LSA), Latent Dirichlet Allocation (LDA), and Non-negative Matrix Factorization (NMF). The study examines each technique's ability to unveil latent thematic structures within the data, providing safety professionals with a systematic approach to gain actionable insights.
arXiv Detail & Related papers (2025-01-02T12:21:07Z)
- Improved Anomaly Detection through Conditional Latent Space VAE Ensembles [49.1574468325115]
Conditional Latent space Variational Autoencoder (CL-VAE) improved pre-processing for anomaly detection on data with known inlier classes and unknown outlier classes.
Model shows increased accuracy in anomaly detection, achieving an AUC of 97.4% on the MNIST dataset.
In addition, the CL-VAE shows increased benefits from ensembling, a more interpretable latent space, and an increased ability to learn patterns in complex data with limited model sizes.
arXiv Detail & Related papers (2024-10-16T07:48:53Z)
- On the Generalization Properties of Deep Learning for Aircraft Fuel Flow Estimation Models [2.7336487680215815]
This paper investigates the generalization capabilities of deep learning models in predicting fuel consumption.
We propose a novel methodology that integrates neural network architectures with domain generalization techniques.
For previously unseen aircraft types, the introduction of noise into aircraft and engine parameters improved model generalization.
arXiv Detail & Related papers (2024-10-10T08:34:19Z)
- Machine Learning for Pre/Post Flight UAV Rotor Defect Detection Using Vibration Analysis [54.550658461477106]
Unmanned Aerial Vehicles (UAVs) will be critical infrastructural components of future smart cities.
In order to operate efficiently, UAV reliability must be ensured by constant monitoring for faults and failures.
This paper leverages signal processing and Machine Learning methods to analyze the data of a comprehensive vibrational analysis to determine the presence of rotor blade defects.
arXiv Detail & Related papers (2024-04-24T13:50:27Z)
- Inferring Traffic Models in Terminal Airspace from Flight Tracks and Procedures [39.89295870460643]
We propose a simple probabilistic model that can learn variability from procedural data and flight tracks collected from radar surveillance data. We generate synthetic trajectories by sampling a series of deviations from the Gaussian mixture model and reconstructing the aircraft trajectory. We demonstrate the proposed models on the arrival tracks and procedures of the John F. Kennedy International Airport.
arXiv Detail & Related papers (2023-03-17T13:58:06Z)
- Wireless-Enabled Asynchronous Federated Fourier Neural Network for Turbulence Prediction in Urban Air Mobility (UAM) [101.80862265018033]
Urban air mobility (UAM) has been proposed in which vertical takeoff and landing (VTOL) aircraft are used to provide a ride-hailing service.
In UAM, aircraft can operate in designated air spaces known as corridors, that link the aerodromes.
A reliable communication network between GBSs and aircraft enables UAM to adequately utilize the airspace.
arXiv Detail & Related papers (2021-12-26T14:41:52Z)
This list is automatically generated from the titles and abstracts of the papers on this site.