ARTICLE
TITLE

A Model for Enhancing Unstructured Big Data Warehouse Execution Time

Marwa Salah Farhan, Amira Youssef and Laila Abdelhamid

Abstract

Traditional data warehouses (DWs) have played a key role in business intelligence and decision support systems. However, the rapid growth of data generated by current applications requires new data warehousing systems. In the big data setting, existing warehouse systems must be adapted to overcome new issues and limitations. The main drawbacks of traditional Extract-Transform-Load (ETL) are that it cannot process very large volumes of data and that its execution time is very high when the data are unstructured. This paper presents a new model consisting of four layers, Extract-Clean-Load-Transform (ECLT), designed for processing unstructured big data, with specific emphasis on text. The model aims to reduce execution time, as evaluated through experimental procedures. ECLT is implemented and tested using Spark, a framework used here from Python. Finally, the paper compares the execution time of ECLT with that of other models on two datasets. Experimental results show that for a data size of 1 TB, the execution time of ECLT is 41.8 s. When the data size increases to 1 million articles, the execution time is 119.6 s. These findings indicate that ECLT outperforms ETL, ELT, DELT, ELTL, and ELTA in terms of execution time.
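
The four-layer ordering named in the abstract can be illustrated with a minimal sketch. The abstract gives only the stage names and their order (Extract, Clean, Load, Transform) and states that the authors used Spark; everything else below is an assumption made for illustration. The sketch uses plain Python rather than Spark, and the function names (`extract`, `clean`, `load`, `transform`, `run_eclt`), the cleaning rules, and the word-count transform are hypothetical, not the paper's implementation.

```python
import re
from typing import Dict, Iterable, List


def extract(raw_docs: Iterable[str]) -> List[str]:
    # Extract: pull raw text records from a source (an in-memory iterable here).
    return list(raw_docs)


def clean(docs: List[str]) -> List[str]:
    # Clean (assumed rules): drop tag-like markup, collapse whitespace,
    # and discard records that end up empty.
    cleaned = []
    for doc in docs:
        text = re.sub(r"<[^>]+>", " ", doc)
        text = re.sub(r"\s+", " ", text).strip()
        if text:
            cleaned.append(text)
    return cleaned


def load(docs: List[str], store: List[str]) -> List[str]:
    # Load: append the cleaned records to the warehouse store
    # (a plain list stands in for the storage layer).
    store.extend(docs)
    return store


def transform(store: List[str]) -> Dict[str, int]:
    # Transform (assumed example): compute word counts over the loaded text.
    counts: Dict[str, int] = {}
    for doc in store:
        for token in doc.lower().split():
            counts[token] = counts.get(token, 0) + 1
    return counts


def run_eclt(raw_docs: Iterable[str]) -> Dict[str, int]:
    # Stages run in ECLT order: cleaning happens before loading,
    # and transformation happens last, on data already in the store.
    store: List[str] = []
    load(clean(extract(raw_docs)), store)
    return transform(store)
```

The point of the ordering in this sketch is that records are reduced before they reach the store, so the transform stage only touches text that has already been cleaned; whether this matches the source of the speedup reported in the paper is not stated in the abstract.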

Similar articles

       
 
Chen Zhang, Celimuge Wu, Min Lin, Yangfei Lin and William Liu    
In the advanced 5G and beyond networks, multi-access edge computing (MEC) is increasingly recognized as a promising technology, offering the dual advantages of reducing energy utilization in cloud data centers while catering to the demands for reliabilit...
Journal: Future Internet

 
Qingyan Wang, Longzhi Sun and Xuan Yang    
Rice yield is essential to global food security under increasingly frequent and severe climate change events. Spatial analysis of rice yields becomes more critical for regional action to ensure yields and reduce climate impacts. However, the understandin...

 
Lilu Zhu, Yang Wang, Yunbo Kong, Yanfeng Hu and Kai Huang    
The integration of geospatial-analysis models is crucial for simulating complex geographic processes and phenomena. However, compared to non-geospatial models and traditional geospatial models, geospatial-analysis models face more challenges owing to ext...

 
Binita Kusum Dhamala, Babu R. Dawadi, Pietro Manzoni and Baikuntha Kumar Acharya    
Graph representation is recognized as an efficient method for modeling networks, precisely illustrating intricate, dynamic interactions within various entities of networks by representing entities as nodes and their relationships as edges. Leveraging the...
Journal: Future Internet

 
Pradeep Kumar, Guo-Liang Shih, Bo-Lin Guo, Siva Kumar Nagi, Yibeltal Chanie Manie, Cheng-Kai Yao, Michael Augustine Arockiyadoss and Peng-Chun Peng    
Violent attacks have been one of the hot issues in recent years. In the presence of closed-circuit televisions (CCTVs) in smart cities, there is an emerging challenge in apprehending criminals, leading to a need for innovative solutions. In this paper, t...
Journal: Future Internet