Aerospace, Vol. 7, No. 10 (2020)

Natural Language Processing Based Method for Clustering and Analysis of Aviation Safety Narratives

Rodrigo L. Rose, Tejas G. Puranik and Dimitri N. Mavris

Abstract

The complexity of commercial aviation operations has grown substantially in recent years, together with a diversification of techniques for collecting and analyzing flight data. As a result, data-driven frameworks for enhancing flight safety have grown in popularity. Data-driven techniques offer efficient and repeatable exploration of patterns and anomalies in large datasets. Text-based flight safety data presents a unique challenge in its subjectivity, and relies on natural language processing tools to extract underlying trends from narratives. In this paper, a methodology is presented for the analysis of aviation safety narratives based on text-based accounts of in-flight events and categorical metadata parameters which accompany them. An extensive pre-processing routine is presented, including a comparison between numeric models of textual representation for the purposes of document classification. A framework for categorizing and visualizing narratives is presented through a combination of k-means clustering and 2-D mapping with t-Distributed Stochastic Neighbor Embedding (t-SNE). A cluster post-processing routine is developed for identifying driving factors in each cluster and building a hierarchical structure of cluster and sub-cluster labels. The Aviation Safety Reporting System (ASRS), which includes over a million de-identified voluntarily submitted reports describing aviation safety incidents for commercial flights, is analyzed as a case study for the methodology. The method results in the identification of 10 major clusters and a total of 31 sub-clusters. The identified groupings are post-processed through metadata-based statistical analysis of the learned clusters. The developed method shows promise in uncovering trends from clusters that are not evident in existing anomaly labels in the data and offers a new tool for obtaining insights from text-based safety data that complement existing approaches.
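
As a rough illustration of the clustering and cluster-labelling steps described in the abstract, the following is a minimal, standard-library-only sketch, not the authors' implementation. It assumes a TF-IDF representation (the abstract compares several text representations without naming the one selected), uses a deterministic farthest-point initialization for k-means chosen here only for reproducibility, and runs on six invented toy narratives rather than ASRS records. The function names (`tfidf`, `kmeans`, `top_terms`) are hypothetical. The 2-D t-SNE mapping is omitted, since in practice it would come from a library rather than be written by hand.

```python
import math
from collections import Counter

# Toy narratives invented for illustration only; these are NOT ASRS records.
NARRATIVES = [
    "engine fire warning during climb crew shut down engine",
    "engine oil pressure low crew shut down engine returned",
    "runway incursion aircraft crossed runway without clearance",
    "tower cleared aircraft to cross runway while another landed on runway",
    "engine vibration during climb crew reduced engine thrust",
    "aircraft entered runway without clearance tower issued go around",
]


def tfidf(docs):
    """Return (vocabulary, L2-normalized TF-IDF vectors) for whitespace-tokenized docs."""
    tokenized = [d.lower().split() for d in docs]
    vocab = sorted({t for doc in tokenized for t in doc})
    n = len(docs)
    df = Counter(t for doc in tokenized for t in set(doc))  # document frequency
    idf = {t: math.log((1 + n) / (1 + df[t])) + 1 for t in vocab}  # smoothed IDF
    vectors = []
    for doc in tokenized:
        tf = Counter(doc)
        vec = [tf[t] * idf[t] for t in vocab]
        norm = math.sqrt(sum(x * x for x in vec)) or 1.0
        vectors.append([x / norm for x in vec])
    return vocab, vectors


def sq_dist(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans(vectors, k, iters=50):
    """Lloyd's k-means with a deterministic farthest-point initialization (an assumption)."""
    centroids = [list(vectors[0])]
    while len(centroids) < k:
        far = max(vectors, key=lambda v: min(sq_dist(v, c) for c in centroids))
        centroids.append(list(far))
    labels = [-1] * len(vectors)
    for _ in range(iters):
        new_labels = [
            min(range(k), key=lambda c: sq_dist(v, centroids[c])) for v in vectors
        ]
        if new_labels == labels:  # assignments stopped changing: converged
            break
        labels = new_labels
        for c in range(k):
            members = [v for v, lab in zip(vectors, labels) if lab == c]
            if members:  # keep the old centroid if a cluster becomes empty
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels, centroids


def top_terms(vocab, centroid, n=3):
    """Highest-weighted vocabulary terms of a centroid, a crude stand-in for cluster labels."""
    ranked = sorted(zip(vocab, centroid), key=lambda p: p[1], reverse=True)
    return [term for term, weight in ranked[:n] if weight > 0]


if __name__ == "__main__":
    vocab, vectors = tfidf(NARRATIVES)
    labels, centroids = kmeans(vectors, k=2)
    for c, centroid in enumerate(centroids):
        print(c, labels.count(c), top_terms(vocab, centroid))
```

On this toy input the engine-related and runway-related narratives separate into two groups, and the top centroid terms give a first hint of what drives each cluster. A real study would replace the toy tokenizer with the fuller pre-processing routine the abstract refers to and would validate the number of clusters rather than fixing it in advance.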
