Redirigiendo al acceso original de articulo en 20 segundos...
ARTÍCULO
TITULO

A Model for Enhancing Unstructured Big Data Warehouse Execution Time

Marwa Salah Farhan    
Amira Youssef and Laila Abdelhamid    

Resumen

Traditional data warehouses (DWs) have played a key role in business intelligence and decision support systems. However, the rapid growth of the data generated by the current applications requires new data warehousing systems. In big data, it is important to adapt the existing warehouse systems to overcome new issues and limitations. The main drawbacks of traditional Extract?Transform?Load (ETL) are that a huge amount of data cannot be processed over ETL and that the execution time is very high when the data are unstructured. This paper focuses on a new model consisting of four layers: Extract?Clean?Load?Transform (ECLT), designed for processing unstructured big data, with specific emphasis on text. The model aims to reduce execution time through experimental procedures. ECLT is applied and tested using Spark, which is a framework employed in Python. Finally, this paper compares the execution time of ECLT with different models by applying two datasets. Experimental results showed that for a data size of 1 TB, the execution time of ECLT is 41.8 s. When the data size increases to 1 million articles, the execution time is 119.6 s. These findings demonstrate that ECLT outperforms ETL, ELT, DELT, ELTL, and ELTA in terms of execution time.

Palabras claves

 Artículos similares

       
 
Chen Zhang, Celimuge Wu, Min Lin, Yangfei Lin and William Liu    
In the advanced 5G and beyond networks, multi-access edge computing (MEC) is increasingly recognized as a promising technology, offering the dual advantages of reducing energy utilization in cloud data centers while catering to the demands for reliabilit... ver más
Revista: Future Internet

 
Binita Kusum Dhamala, Babu R. Dawadi, Pietro Manzoni and Baikuntha Kumar Acharya    
Graph representation is recognized as an efficient method for modeling networks, precisely illustrating intricate, dynamic interactions within various entities of networks by representing entities as nodes and their relationships as edges. Leveraging the... ver más
Revista: Future Internet

 
Pradeep Kumar, Guo-Liang Shih, Bo-Lin Guo, Siva Kumar Nagi, Yibeltal Chanie Manie, Cheng-Kai Yao, Michael Augustine Arockiyadoss and Peng-Chun Peng    
Violent attacks have been one of the hot issues in recent years. In the presence of closed-circuit televisions (CCTVs) in smart cities, there is an emerging challenge in apprehending criminals, leading to a need for innovative solutions. In this paper, t... ver más
Revista: Future Internet

 
Haibo Chu, Zhuoqi Wang and Chong Nie    
Accurate and reliable monthly streamflow prediction plays a crucial role in the scientific allocation and efficient utilization of water resources. In this paper, we proposed a prediction framework that integrates the input variable selection method and ... ver más
Revista: Water

 
Xie Lian, Xiaolong Hu, Liangsheng Shi, Jinhua Shao, Jiang Bian and Yuanlai Cui    
The parameters of the GR4J-CemaNeige coupling model (GR4neige) are typically treated as constants. However, the maximum capacity of the production store (parX1) exhibits time-varying characteristics due to climate variability and vegetation coverage chan... ver más
Revista: Water