ARTICLE
TITLE

Experimental evaluation of the temporal efficiency of big data processing for specified storage formats

V.A. Belov    
E.V. Nikulchev    

Abstract

Choosing data storage formats is one of the most important tasks in designing a modern big data processing platform. The choice rests on performance criteria that depend on the class of objects being stored and on the requirements of the platform. One of the most important criteria is the time spent on the various big data processing operations. This paper studies the five most popular formats for storing big data (Avro, CSV, JSON, ORC, Parquet), proposes an experimental test bench for assessing their time efficiency, and presents a comparative analysis of the experimental estimates obtained for each format. The experiments cover the basic data processing operations, run with the Apache Spark framework. The format selection algorithm is based on the hierarchy analysis method (the analytic hierarchy process). The result is a methodology for choosing among alternative formats, combining experimentally estimated parameters with hierarchy analysis, that selects the storage format whose basic operations are most time-efficient for big data in the Apache Hadoop system using Apache Spark.
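
The selection step described in the abstract can be illustrated with a minimal sketch of hierarchy analysis. This is not the authors' implementation: it assumes that measured times are turned into pairwise preference ratios (a faster format is preferred in proportion to the time ratio) and that priorities are approximated with the row geometric-mean method. The function names, the placeholder formats, and all numbers are hypothetical and chosen only for illustration.

```python
import math


def pairwise_from_times(times):
    """Build a pairwise comparison matrix from elapsed times.

    Assumption: entry [i][j] = times[j] / times[i], so a format that is
    twice as fast is preferred with ratio 2.
    """
    return [[t_j / t_i for t_j in times] for t_i in times]


def ahp_priorities(matrix):
    """Approximate priority weights with the row geometric-mean method."""
    n = len(matrix)
    if any(len(row) != n for row in matrix):
        raise ValueError("pairwise comparison matrix must be square")
    geo_means = [math.prod(row) ** (1.0 / n) for row in matrix]
    total = sum(geo_means)
    return [g / total for g in geo_means]


def overall_scores(criteria_weights, local_priorities):
    """Weighted sum of per-criterion priorities for each alternative."""
    n_alternatives = len(local_priorities[0])
    return [
        sum(w * local[k] for w, local in zip(criteria_weights, local_priorities))
        for k in range(n_alternatives)
    ]


# Hypothetical placeholder data: three unnamed formats, two operations.
formats = ["format_a", "format_b", "format_c"]
times_by_operation = {
    "read": [1.0, 2.0, 4.0],    # placeholder seconds, not measured values
    "filter": [3.0, 1.5, 3.0],  # placeholder seconds, not measured values
}
# Hypothetical judgment: "read" is twice as important as "filter".
criteria_weights = ahp_priorities([[1.0, 2.0], [0.5, 1.0]])

local = [
    ahp_priorities(pairwise_from_times(times))
    for times in times_by_operation.values()
]
scores = overall_scores(criteria_weights, local)
best_format = formats[scores.index(max(scores))]
```

The geometric-mean method stands in here for the principal-eigenvector calculation usually associated with hierarchy analysis; for a perfectly consistent comparison matrix the two give the same weights.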
