Information, Vol. 11, Issue 4 (2020)

Applying the ETL Process to Blockchain Data. Prospect and Findings

Roberta Galici, Laura Ordile, Michele Marchesi, Andrea Pinna and Roberto Tonelli

Abstract

We present a novel strategy, based on the Extract, Transform and Load (ETL) process, to collect data from a blockchain, process it, and make it available for further analysis. The study addresses the need for more efficient data extraction strategies and more effective representation methods for blockchain data. To this end, we conceived a system that makes blockchain data extraction and clustering scalable, and that provides a SQL database which preserves the distinction between transactions and addresses. The proposed system satisfies both the need to cluster addresses into entities and the need to store the extracted data in a conventional database, so that the data can be analyzed by querying the database. In general, ETL processes automate data selection, data collection and data conditioning from a data warehouse, and produce output data in the format best suited to subsequent processing or business use. We focus on Bitcoin blockchain transactions, which we organized in a relational database that distinguishes between the input section and the output section of each transaction. We describe the implementation of address clustering algorithms specific to the Bitcoin blockchain, and the process used to collect and transform the data and load it into the database. To balance the input data rate against the processing time, we manage blockchain data according to the lambda architecture. To evaluate our process, we first analyzed its performance in terms of scalability, and then checked its usability by analyzing the loaded data. Finally, we present the results of a toy analysis, which provides some findings about blockchain data, focusing on a comparison between statistics for the last year of transactions and previous results on historical blockchain data found in the literature.
The ETL process we built to analyze blockchain data proved able to perform reliable and scalable data acquisition, and its output makes the stored data available for further analysis and business use.
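
The following minimal Python sketch illustrates the two ideas the abstract highlights: a relational layout that keeps the input and output sections of each transaction in separate tables, and the clustering of addresses into entities. It is not the authors' implementation. The table names, column names and sample data are hypothetical, and the clustering rule used here is the common-input-ownership heuristic (addresses spent together in one transaction are assumed to belong to one entity), chosen only because it is widely used; the abstract does not say which heuristics the paper implements. The sketch also assumes that input addresses have already been resolved from the outputs they spend.

```python
import sqlite3

# Hypothetical schema: inputs and outputs are stored in separate tables.
SCHEMA = """
CREATE TABLE tx (txid TEXT PRIMARY KEY, block_height INTEGER);
CREATE TABLE tx_input (txid TEXT, address TEXT, value INTEGER);
CREATE TABLE tx_output (txid TEXT, address TEXT, value INTEGER);
"""

# Made-up transactions standing in for the "extract" step.
SAMPLE = [
    {"txid": "t1", "height": 1, "inputs": [("A", 5), ("B", 3)], "outputs": [("X", 8)]},
    {"txid": "t2", "height": 2, "inputs": [("B", 2), ("C", 1)], "outputs": [("Y", 3)]},
    {"txid": "t3", "height": 3, "inputs": [("D", 4)], "outputs": [("A", 4)]},
]


def load(conn, transactions):
    """Transform decoded transactions into rows and load them into the database."""
    for t in transactions:
        conn.execute("INSERT INTO tx VALUES (?, ?)", (t["txid"], t["height"]))
        conn.executemany(
            "INSERT INTO tx_input VALUES (?, ?, ?)",
            [(t["txid"], addr, value) for addr, value in t["inputs"]],
        )
        conn.executemany(
            "INSERT INTO tx_output VALUES (?, ?, ?)",
            [(t["txid"], addr, value) for addr, value in t["outputs"]],
        )
    conn.commit()


def cluster_addresses(conn):
    """Map each input address to a cluster representative using union-find."""
    parent = {}

    def find(addr):
        parent.setdefault(addr, addr)
        while parent[addr] != addr:
            parent[addr] = parent[parent[addr]]  # path halving
            addr = parent[addr]
        return addr

    def union(a, b):
        root_a, root_b = find(a), find(b)
        if root_a != root_b:
            parent[root_b] = root_a

    first_in_tx = {}
    for txid, addr in conn.execute("SELECT txid, address FROM tx_input"):
        find(addr)  # register the address even if it is the only input
        if txid in first_in_tx:
            union(first_in_tx[txid], addr)  # co-spent addresses share an entity
        else:
            first_in_tx[txid] = addr
    return {addr: find(addr) for addr in list(parent)}


conn = sqlite3.connect(":memory:")
conn.executescript(SCHEMA)
load(conn, SAMPLE)
clusters = cluster_addresses(conn)
```

With this sample data, addresses A, B and C end up in one cluster because B is spent together with A in one transaction and with C in another, while D remains in a cluster of its own. Once the data is loaded, further analysis reduces to ordinary SQL queries against the three tables.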
