JOURNAL
Big Data and Cognitive Computing

ARTICLE

TITLE

An Enhanced Parallelisation Model for Performance Prediction of Apache Spark on a Multinode Hadoop Cluster

Nasim Ahmed

Andre L. C. Barczak

Mohammad A. Rashid and Teo Susnjak

Abstract

Big data frameworks play a vital role in storing, processing, and analysing large datasets. Apache Spark has been established as one of the most popular big data engines for its efficiency and reliability. However, one of the significant problems of the Spark system is performance prediction. Spark has more than 150 configurable parameters, and determining suitable values for so many parameters is a challenging task. In this paper, we propose two distinct parallelisation models for performance prediction. Our insight is that each node in a Hadoop cluster can communicate with identical nodes, and a certain function of the non-parallelisable runtime can be estimated accordingly. Both models use simple equations that allow us to predict the runtime when the size of the job and the number of executables are known. The proposed models were evaluated on five HiBench workloads: Kmeans, PageRank, Graph (NWeight), SVM, and WordCount. Each workload's empirical data were fitted with one of the two models meeting the accuracy requirements. Finally, the experimental findings show that the model can be a handy and helpful tool for scheduling and planning system deployment.

Keywords

big data processing - Apache Spark - execution time prediction - performance prediction - modelling

PAGES

pp. 0 - 0

ISSUE

Volume: 5 Part: 4 (2021)

SUBJECTS

INFRASTRUCTURE

DOI

https://doi.org/10.3390/bdcc5040065
