ARTICLE
TITLE

A Hierarchical Hadoop Framework to Process Geo-Distributed Big Data

Giuseppe Di Modica and Orazio Tomarchio    

Abstract

In the past twenty years, we have witnessed an unprecedented production of data worldwide, which has generated a growing demand for computing resources and has stimulated the design of computing paradigms and software tools to obtain insights from such Big Data quickly and efficiently. State-of-the-art parallel computing techniques such as MapReduce guarantee high performance in scenarios where the computing nodes involved are equally sized and clustered via broadband network links, and the data are co-located with the cluster of nodes. Unfortunately, these techniques have proven ineffective in geographically distributed scenarios, i.e., computing contexts where nodes and data are spread across multiple distant data centers. In the literature, researchers have proposed variants of the MapReduce paradigm that are aware of the constraints imposed in those scenarios (such as the imbalance in the nodes' computing power and in the interconnecting links) and use that awareness to enforce smart task scheduling strategies. We have designed a hierarchical computing framework in which a context-aware scheduler orchestrates computing tasks that leverage the potential of the vanilla Hadoop framework within each data center taking part in the computation. In this work, after presenting the features of the developed framework, we argue for fragmenting the data in a smart way so that the scheduler produces a fairer distribution of the workload among the computing tasks. To prove the concept, we implemented a software prototype of the framework and ran several experiments on a small-scale testbed. Test results are discussed in the last part of the paper.
