ARTICLE
TITLE

A Model for Enhancing Unstructured Big Data Warehouse Execution Time

Marwa Salah Farhan, Amira Youssef and Laila Abdelhamid

Abstract

Traditional data warehouses (DWs) have played a key role in business intelligence and decision support systems. However, the rapid growth of data generated by current applications requires new data warehousing systems. For big data, existing warehouse systems must be adapted to overcome new issues and limitations. The main drawbacks of traditional Extract-Transform-Load (ETL) are that it cannot process very large amounts of data and that its execution time is very high when the data are unstructured. This paper presents a new model consisting of four layers, Extract-Clean-Load-Transform (ECLT), designed for processing unstructured big data, with specific emphasis on text. The model aims to reduce execution time, which is evaluated experimentally. ECLT is implemented and tested using Spark, a distributed data processing framework, from Python. Finally, the paper compares the execution time of ECLT with that of other models using two datasets. Experimental results showed that for a data size of 1 TB, the execution time of ECLT is 41.8 s. When the data size increases to 1 million articles, the execution time is 119.6 s. These findings demonstrate that ECLT outperforms ETL, ELT, DELT, ELTL, and ELTA in terms of execution time.
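
To make the ordering of the four ECLT layers concrete, the following is a minimal sketch in plain Python. It is an illustration only, not the authors' implementation: the abstract does not specify what each layer does internally, and the paper uses Spark rather than the standard library. The function names (`extract`, `clean`, `load`, `transform`, `run_eclt`), the dictionary used as a stand-in warehouse, and the word-count transform are all assumptions chosen for the example.

```python
import re
from collections import Counter


def extract(raw_docs):
    """Extract: pull raw text records from a source (here, an in-memory list)."""
    return [doc for doc in raw_docs if doc is not None]


def clean(docs):
    """Clean: normalize text before loading (lowercase, drop punctuation, collapse spaces)."""
    cleaned = []
    for doc in docs:
        text = re.sub(r"[^a-z0-9\s]", " ", doc.lower())
        text = re.sub(r"\s+", " ", text).strip()
        if text:  # skip records that are empty after cleaning
            cleaned.append(text)
    return cleaned


def load(docs, warehouse):
    """Load: store the cleaned records in the target store (a dict stands in for the warehouse)."""
    for key, text in enumerate(docs, start=len(warehouse)):
        warehouse[key] = text
    return warehouse


def transform(warehouse):
    """Transform: derive an analysis-ready result from the loaded data (hypothetical word counts)."""
    counts = Counter()
    for text in warehouse.values():
        counts.update(text.split())
    return counts


def run_eclt(raw_docs):
    """Run the four layers in ECLT order: Extract, Clean, Load, Transform."""
    warehouse = {}
    load(clean(extract(raw_docs)), warehouse)
    return warehouse, transform(warehouse)


warehouse, word_counts = run_eclt(
    ["Big Data,   BIG warehouses!", None, "  ", "Data lakes?"]
)
```

In this sketch the cleaning step runs before loading, so only normalized, non-empty text reaches the store, and the transform step operates on already-loaded data. Whether this matches the paper's division of work between layers is an assumption.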
