ARTICLE
TITLE

Time series forecasting in real-time streaming data processing

R.A. Elchenkov    
M.E. Dunaev    
K.S. Zaytsev    

Abstract

This work studies methods for forecasting time series values when streaming data is processed in real time in distributed systems. The authors propose a modification of the autoregressive (AR) model of a given order that adds an inheritance function of the previous values of the time series. Comparative experiments pitting the proposed modification, called Real-Time AR, against classical AR and ARIMA confirmed its effectiveness, most visibly when the real time series contains anomalies. The modification makes it possible not only to parallelize the calculations but also to configure the model on the fly within the Apache Spark ecosystem. For the experiments, a dedicated data set was built: a slice of 1000 measurements from the metrics log of an Apache Kafka server with one topic, two producers, and one consumer. Anomalous fragments, distinguished by a high number of messages per second and/or by message size, were added to the data set artificially. The values were normalized and shifted by their mean over the sample used to pre-train the model. Applying the proposed algorithm to time series forecasting showed that anomalies in the behavior of the observed objects do not significantly distort the forecasts.
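For orientation, the classical AR(p) baseline that the abstract compares against can be sketched as below. This is not the authors' Real-Time AR: the abstract does not specify the inheritance function, so it is not reproduced here. All function names are hypothetical, scaling by the training standard deviation is an assumed choice of normalization, and the coefficients are fitted by ordinary least squares, which is only one of several common ways to estimate an AR model.

```python
from statistics import mean, pstdev


def standardize(train, series):
    """Shift by the training mean and scale by the training std.

    Dividing by the standard deviation is an assumption; the abstract
    only says the values were normalized and shifted by the mean.
    """
    mu = mean(train)
    sigma = pstdev(train) or 1.0  # avoid division by zero on constant data
    return [(x - mu) / sigma for x in series], mu, sigma


def solve(a, b):
    """Solve the linear system a x = b by Gaussian elimination with pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):  # back substitution
        s = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - s) / m[i][i]
    return x


def fit_ar(z, p):
    """Least-squares AR(p) coefficients (no intercept) for a centered series z."""
    rows = [[z[t - k] for k in range(1, p + 1)] for t in range(p, len(z))]
    targets = z[p:]
    # Normal equations: (X^T X) phi = X^T y
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(p)] for i in range(p)]
    xty = [sum(r[i] * y for r, y in zip(rows, targets)) for i in range(p)]
    return solve(xtx, xty)


def predict_next(z, phi):
    """One-step-ahead forecast: phi[0]*z[-1] + phi[1]*z[-2] + ..."""
    return sum(c * z[-k] for k, c in enumerate(phi, start=1))
```

In a streaming setting, one would typically call `standardize` once on the pre-training sample, fit the coefficients with `fit_ar`, and then apply `predict_next` to the most recent window of each incoming batch, mapping the forecast back with `forecast * sigma + mu`.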
