Related papers: Sequential Classification of Aviation Safety Occurrences with Natural Language Processing

URL: http://arxiv.org/abs/2501.06490v1
Date: Sat, 11 Jan 2025 09:23:55 GMT
Title: Sequential Classification of Aviation Safety Occurrences with Natural Language Processing
Authors: Aziida Nanyonga, Hassan Wasswa, Ugur Turhan, Oleksandra Molloy, Graham Wild
Abstract summary: The ability to classify and categorise safety occurrences would help aviation industry stakeholders make informed safety-critical decisions. The classification performance of various deep learning models was evaluated on a set of 27,000 safety occurrence reports from the NTSB.
Score: 14.379311972506791
License: http://creativecommons.org/publicdomain/zero/1.0/
Abstract: Safety is a critical aspect of the air transport system, given that even slight operational anomalies can result in serious consequences. To reduce the chances of aviation safety occurrences, accidents and incidents are reported in order to establish the root cause, propose safety recommendations, and so on. However, analysis narratives of the pre-accident events are presented as raw, unstructured, human-readable text that a computer system cannot readily interpret. The ability to classify and categorise safety occurrences from their textual narratives would help aviation industry stakeholders make informed safety-critical decisions. To classify and categorise safety occurrences, we applied natural language processing (NLP) and artificial intelligence (AI) models to process text narratives. The study aimed to answer the question: how well can the damage level caused to the aircraft in a safety occurrence be inferred from the text narrative using natural language processing? The classification performance of various deep learning models, including LSTM, BLSTM, GRU, and sRNN, and combinations of these models (LSTM+GRU, BLSTM+GRU, sRNN+LSTM, sRNN+BLSTM, sRNN+GRU, sRNN+BLSTM+GRU, and sRNN+LSTM+GRU), was evaluated on a set of 27,000 safety occurrence reports from the NTSB. The results indicate that all models investigated performed competitively, recording an accuracy of over 87.9%, which is well above the random-guess baseline of 25% for a four-class classification problem. The models also recorded precision, recall, and F1 scores above 80%, 88%, and 85%, respectively. sRNN slightly outperformed the other single models in terms of recall (90%) and accuracy (90%), while LSTM reported slightly better precision (87%).
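
The metrics quoted in the abstract can be illustrated with a minimal sketch. This is not the authors' evaluation code. The label names below are hypothetical placeholders for the four damage classes, the function name `evaluate` is invented for illustration, and macro-averaging is an assumption, since the abstract does not state how the per-class scores were aggregated. The 25% baseline corresponds to uniform random guessing over four classes.

```python
from collections import Counter

# Hypothetical names for the four damage-level classes (an assumption for
# illustration; the abstract only states that the problem has four classes).
LABELS = ["none", "minor", "substantial", "destroyed"]


def evaluate(y_true, y_pred, labels=LABELS):
    """Return accuracy and macro-averaged precision, recall, and F1."""
    tp, fp, fn = Counter(), Counter(), Counter()
    for truth, pred in zip(y_true, y_pred):
        if truth == pred:
            tp[truth] += 1
        else:
            fp[pred] += 1   # predicted class gains a false positive
            fn[truth] += 1  # true class gains a false negative

    precisions, recalls, f1s = [], [], []
    for c in labels:
        prec = tp[c] / (tp[c] + fp[c]) if tp[c] + fp[c] else 0.0
        rec = tp[c] / (tp[c] + fn[c]) if tp[c] + fn[c] else 0.0
        f1 = 2 * prec * rec / (prec + rec) if prec + rec else 0.0
        precisions.append(prec)
        recalls.append(rec)
        f1s.append(f1)

    n = len(labels)
    return {
        "accuracy": sum(tp.values()) / len(y_true),
        "precision": sum(precisions) / n,
        "recall": sum(recalls) / n,
        "f1": sum(f1s) / n,
        # Expected accuracy of uniform random guessing over the classes.
        "chance": 1 / n,
    }


# Toy usage with made-up predictions (not data from the paper).
y_true = ["none", "minor", "substantial", "destroyed"]
y_pred = ["none", "minor", "substantial", "minor"]
scores = evaluate(y_true, y_pred)
print(scores)
```

Comparing `scores["accuracy"]` against `scores["chance"]` mirrors the abstract's comparison of model accuracy with the random-guess baseline. With imbalanced classes, a majority-class baseline may be a stricter reference point than uniform guessing.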

Related papers

Phase of Flight Classification in Aviation Safety using LSTM, GRU, and BiLSTM: A Case Study with ASN Dataset [0.0]
The research aims to determine whether the phase of flight can be inferred from narratives of post-accident events using NLP techniques. The classification performance of various deep learning models was evaluated.
arXiv Detail & Related papers (2025-01-14T08:26:58Z)
Aviation Safety Enhancement via NLP & Deep Learning: Classifying Flight Phases in ATSB Safety Reports [0.0]
This study employs Natural Language Processing (NLP) and Deep Learning models, including LSTM, CNN, Bidirectional LSTM (BLSTM), and simple Recurrent Neural Networks (sRNN), to classify flight phases in safety reports from the Australian Transport Safety Bureau (ATSB). The models exhibited high accuracy, precision, recall, and F1 scores, with LSTM achieving the highest performance of 87%, 88%, 87%, and 88%, respectively.
arXiv Detail & Related papers (2025-01-14T08:18:41Z)
Natural Language Processing and Deep Learning Models to Classify Phase of Flight in Aviation Safety Occurrences [14.379311972506791]
Researchers applied natural language processing (NLP) and artificial intelligence (AI) models to process text narratives to classify the flight phases of safety occurrences. The classification performance of two deep learning models, ResNet and sRNN, was evaluated using an initial dataset of 27,000 safety occurrence reports from the NTSB.
arXiv Detail & Related papers (2025-01-11T15:02:49Z)
Comparative Study of Deep Learning Architectures for Textual Damage Level Classification [0.0]
This study aims to leverage Natural Language Processing (NLP) and deep learning models to analyze unstructured text narratives. Using LSTM, BLSTM, GRU, and sRNN deep learning models, we classify the aircraft damage level incurred during safety occurrences. The sRNN model emerged as the top performer in terms of recall and accuracy, boasting a remarkable 89%.
arXiv Detail & Related papers (2025-01-03T08:23:29Z)
Classification of Operational Records in Aviation Using Deep Learning Approaches [0.0]
This study evaluates the performance of four different deep learning (DL) models in a classification task involving Commercial, Military, and Private categories. Among the models, BLSTM achieved the highest overall accuracy of 72%, demonstrating superior performance in stability and balanced classification. CNN and sRNN exhibited lower accuracies of 67% and 69%, with significant misclassifications in the Private class.
arXiv Detail & Related papers (2025-01-02T12:12:02Z)
LabSafety Bench: Benchmarking LLMs on Safety Issues in Scientific Labs [75.85283891591678]
Artificial Intelligence (AI) is revolutionizing scientific research, yet its growing integration into laboratory environments presents critical safety challenges. Large language models (LLMs) increasingly assist in tasks ranging from procedural guidance to autonomous experiment orchestration. Such overreliance is especially hazardous in high-stakes laboratory settings, where failures in hazard identification or risk assessment can result in severe accidents. We propose the Laboratory Safety Benchmark (LabSafety Bench), a comprehensive framework that evaluates LLMs and vision language models (VLMs) on their ability to identify potential hazards, assess risks, and predict the consequences of unsafe actions in lab environments.
arXiv Detail & Related papers (2024-10-18T05:21:05Z)
SEAL: Safety-enhanced Aligned LLM Fine-tuning via Bilevel Data Selection [92.38300626647342]
SEAL learns a data ranker based on bilevel optimization to up-rank safe and high-quality fine-tuning data and down-rank unsafe or low-quality data. Models trained with SEAL demonstrate superior quality over multiple baselines, with 8.5% and 9.7% win rate increase compared to random selection.
arXiv Detail & Related papers (2024-10-09T22:24:22Z)
A Comparative Study of Hybrid Models in Health Misinformation Text Classification [0.43695508295565777]
This study evaluates the effectiveness of machine learning (ML) and deep learning (DL) models in detecting COVID-19-related misinformation on online social networks (OSNs). Our study concludes that DL and hybrid DL models are more effective than conventional ML algorithms for detecting COVID-19 misinformation on OSNs.
arXiv Detail & Related papers (2024-10-08T19:43:37Z)
What Makes and Breaks Safety Fine-tuning? A Mechanistic Study [64.9691741899956]
Safety fine-tuning helps align Large Language Models (LLMs) with human preferences for their safe deployment. We design a synthetic data generation framework that captures salient aspects of an unsafe input. Using this, we investigate three well-known safety fine-tuning methods.
arXiv Detail & Related papers (2024-07-14T16:12:57Z)
Learning Traffic Crashes as Language: Datasets, Benchmarks, and What-if Causal Analyses [76.59021017301127]
We propose a large-scale traffic crash language dataset, named CrashEvent, summarizing 19,340 real-world crash reports. We formulate crash event feature learning as a novel text reasoning problem and fine-tune various large language models (LLMs) to predict detailed accident outcomes. Our experimental results show that our LLM-based approach not only predicts the severity of accidents but also classifies different types of accidents and predicts injury outcomes.
arXiv Detail & Related papers (2024-06-16T03:10:16Z)
OR-Bench: An Over-Refusal Benchmark for Large Language Models [65.34666117785179]
Large Language Models (LLMs) require careful safety alignment to prevent malicious outputs. This study proposes a novel method for automatically generating large-scale sets of "seemingly toxic prompts". We then conduct a comprehensive study to measure the over-refusal of 25 popular LLMs across 8 model families.
arXiv Detail & Related papers (2024-05-31T15:44:33Z)
Predicting Overtakes in Trucks Using CAN Data [51.28632782308621]
We investigate the detection of truck overtakes from CAN data. Our analysis covers up to 10 seconds before the overtaking event. We observe that the prediction scores of the overtake class tend to increase as we approach the overtake trigger.
arXiv Detail & Related papers (2024-04-08T17:58:22Z)
A LSTM and Cost-Sensitive Learning-Based Real-Time Warning for Civil Aviation Over-limit [0.0]
A real-time warning model for civil aviation over-limit is proposed based on QAR data monitoring. The proposed model achieves an F1 score of 0.991 and an accuracy of 0.978, indicating its effectiveness in real-time warning of civil aviation over-limit.
arXiv Detail & Related papers (2023-05-08T10:56:06Z)

This list is automatically generated from the titles and abstracts of the papers on this site.