Applied Sciences, Vol. 13, No. 19 (2023)

Data Quality Analysis and Improvement: A Case Study of a Bus Transportation System

Shuyan Si, Wen Xiong and Xingliang Che

Abstract

Driven by the rapid development of the mobile Internet and the Internet of Things, the volume of generated data keeps growing, and data quality (DQ) has attracted increasing attention. Numerous studies have examined DQ problems in many fields and proposed effective data-cleaning strategies for them. This paper first presents a comprehensive, systematic review of DQ-related studies: we classify them into six types (redundant, missing, noisy, erroneous, conflicting, and sparse data) and discuss the data-cleaning strategies that correspond to each type. Second, we examine DQ issues and potential solutions for a public bus transportation system using a real-world traffic big data platform. Finally, we present two representative examples, noise filtering and missing-value filling, to demonstrate DQ improvement in practice. The experimental results show that (1) the proposed GPS noise-filtering solution outperforms the baseline, achieving an accuracy of 97%; and (2) the multi-source data fusion method achieves a 100% missing repair rate (MRR) for bus arrival and departure records, with an average relative error (ARE) below 1% for bus arrival and departure times at stations and a correlation coefficient (R) close to 1. Our research offers guidance and lessons for strengthening data governance and quality improvement in bus transportation systems.
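The abstract reports three evaluation metrics (MRR, ARE and R) without defining them on this page. The following is a minimal sketch, assuming common textbook definitions: MRR as the fraction of originally missing records that hold a value after repair, ARE as the mean of |estimate − truth| / |truth|, and R as the Pearson correlation coefficient. The paper's exact formulas may differ, and the function names and sample arrival times below are hypothetical.

```python
import math
from typing import Optional, Sequence


def missing_repair_rate(original: Sequence[Optional[float]],
                        repaired: Sequence[Optional[float]]) -> float:
    """Fraction of entries missing (None) in `original` that have a value in `repaired`.

    Assumed convention: if nothing was missing, the rate is reported as 1.0.
    """
    missing = [i for i, v in enumerate(original) if v is None]
    if not missing:
        return 1.0
    fixed = sum(1 for i in missing if repaired[i] is not None)
    return fixed / len(missing)


def average_relative_error(truth: Sequence[float],
                           estimate: Sequence[float]) -> float:
    """Mean of |estimate - truth| / |truth|; every truth value must be non-zero."""
    errors = [abs(e - t) / abs(t) for t, e in zip(truth, estimate)]
    return sum(errors) / len(errors)


def pearson_r(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(x)
    mx = sum(x) / n
    my = sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


if __name__ == "__main__":
    # Hypothetical arrival times (seconds after a reference time) at four stations.
    observed = [600.0, None, 1800.0, None]      # two records missing
    filled = [600.0, 1190.0, 1800.0, None]      # one of them repaired
    truth = [600.0, 1200.0, 1800.0, 2400.0]
    estimate = [606.0, 1194.0, 1800.0, 2412.0]

    print("MRR:", missing_repair_rate(observed, filled))
    print("ARE:", average_relative_error(truth, estimate))
    print("R:  ", pearson_r(truth, estimate))
```

Note that ARE computed on raw timestamps depends on the chosen reference time (a later reference makes the same absolute error look smaller), so how times are encoded matters when comparing reported ARE values.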

Similar articles

       
 
Zuhier Alakayleh, Xing Fang and T. Prabhakar Clement    
This study aims at furthering our understanding of the Modified Philip-Dunne Infiltrometer (MPDI), which is used to determine the saturated hydraulic conductivity Ks and the Green-Ampt suction head ψ at the wetting front. We have developed a forward-mode...
Journal: Water

 
Carlo Ciaponi, Enrico Creaco, Armando Di Nardo, Michele Di Natale, Carlo Giudicianni, Dino Musmarra and Giovanni Francesco Santonastaso    
This paper proposes a combined management strategy for monitoring water distribution networks (WDNs). This strategy is based on the application of water network partitioning (WNP) for the creation of district metered areas (DMAs) and on the installation ...
Journal: Water

 
António Carlos Pinheiro Fernandes, Luís Filipe Sanches Fernandes, Daniela Patrícia Salgado Terêncio, Rui Manuel Vitor Cortes and Fernando António Leal Pacheco    
Interactions between pollution sources, water contamination, and ecological integrity are complex phenomena and hard to assess. To comprehend this subject of study, it is crucial to use advanced statistical tools, which can unveil cause-effect relationsh...
Journal: Water

 
Haoran Liu, Kehui Xu, Bin Li, Ya Han and Guandong Li    
Machine learning classifiers have been rarely used for the identification of seafloor sediment types in the rapidly changing dredge pits for coastal restoration. Our study uses multiple machine learning classifiers to identify the sediment types of the C...
Journal: Water

 
Fhrizz S. De Jesus, Hazel Jade E. Villamar and Ramezesh E. Dionisio, pp. 40-53
The COVID-19 pandemic has expedited the transition towards a more technologically advanced world, with lasting repercussions on online buying habits. Due to constraints on face-to-face communication, the consumer has migrated from in-person to on...