Algorithms, Vol. 12, No. 3 (2019)

Heterogeneous Distributed Big Data Clustering on Sparse Grids

David Pfander, Gregor Daiß and Dirk Pflüger

Abstract

Clustering is an important task in data mining that has become more challenging due to the ever-increasing size of available datasets. To cope with these big data scenarios, a high-performance clustering approach is required. Sparse grid clustering is a density-based clustering method that uses a sparse grid density estimation as its central building block. The underlying density estimation approach enables the detection of clusters with non-convex shapes and without a predetermined number of clusters. In this work, we introduce a new distributed and performance-portable variant of the sparse grid clustering algorithm that is suited for big data settings. Our compute kernels were implemented in OpenCL to enable portability across a wide range of architectures. For distributed environments, we added a manager-worker scheme that was implemented using MPI. In experiments on two supercomputers, Piz Daint and Hazel Hen, with up to 100 million data points in a ten-dimensional dataset, we show the performance and scalability of our approach. The dataset with 100 million data points was clustered in 1198 s using 128 nodes of Piz Daint. This translates to an overall performance of 352 TFLOPS. At the node level, we provide results for two GPUs, Nvidia's Tesla P100 and the AMD FirePro W8100, and one processor-based platform that uses Intel Xeon E5-2680v3 processors. In these experiments, we achieved between 43% and 66% of the peak performance across all compute kernels and devices, demonstrating the performance portability of our approach.
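The abstract describes the clustering method only at a high level. The following is a minimal, single-process sketch of density-based graph clustering, assuming the graph-based formulation commonly used for sparse grid clustering (build a k-nearest-neighbor graph, discard points and edges that fall in low-density regions, report connected components as clusters); the abstract itself does not spell these steps out. A plain Gaussian kernel density estimate stands in for the sparse grid density estimation, checking edge midpoints is an illustrative pruning choice, and the function names and parameters (`k`, `bandwidth`, `threshold`) are hypothetical. None of the paper's OpenCL kernels or MPI distribution is reproduced.

```python
import math
from collections import deque


def gaussian_kde(points, bandwidth):
    """Return a density function. Stand-in for the sparse grid density estimate."""
    n, d = len(points), len(points[0])
    norm = n * (math.sqrt(2 * math.pi) * bandwidth) ** d

    def density(x):
        total = 0.0
        for p in points:
            sq = sum((a - b) ** 2 for a, b in zip(x, p))
            total += math.exp(-sq / (2 * bandwidth ** 2))
        return total / norm

    return density


def knn_graph(points, k):
    """Undirected k-nearest-neighbor graph as adjacency sets (O(n^2), sketch only)."""
    n = len(points)
    adj = [set() for _ in range(n)]
    for i in range(n):
        dists = sorted((math.dist(points[i], points[j]), j) for j in range(n) if j != i)
        for _, j in dists[:k]:
            adj[i].add(j)
            adj[j].add(i)
    return adj


def density_cluster(points, k=3, bandwidth=0.5, threshold=0.05):
    """Label points by connected components of the density-pruned kNN graph.

    Points whose estimated density is below `threshold` are labeled -1 (noise).
    """
    density = gaussian_kde(points, bandwidth)
    adj = knn_graph(points, k)
    keep = [density(p) >= threshold for p in points]

    def edge_ok(i, j):
        # Keep an edge only if the density at its midpoint is high enough.
        mid = [(a + b) / 2 for a, b in zip(points[i], points[j])]
        return density(mid) >= threshold

    labels = [-1] * len(points)
    current = 0
    for start in range(len(points)):
        if not keep[start] or labels[start] != -1:
            continue
        # Breadth-first search over surviving vertices and edges.
        labels[start] = current
        queue = deque([start])
        while queue:
            i = queue.popleft()
            for j in adj[i]:
                if keep[j] and labels[j] == -1 and edge_ok(i, j):
                    labels[j] = current
                    queue.append(j)
        current += 1
    return labels


if __name__ == "__main__":
    pts = [(0, 0), (0.1, 0), (0, 0.1), (0.1, 0.1),
           (5, 5), (5.1, 5), (5, 5.1), (5.1, 5.1),
           (20, 20)]
    print(density_cluster(pts, k=3, bandwidth=0.5, threshold=0.1))
```

With two tight groups and one isolated point, the two groups receive labels 0 and 1 and the isolated point is marked as noise (-1). Both the density evaluation and the graph construction in this sketch cost O(n²) operations, so a naive version like this does not scale to datasets with 100 million points.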
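As a rough consistency check on the reported figures, assuming the 352 TFLOPS is an average over the full 1198 s run on 128 Piz Daint nodes, the total floating-point work $W$ and the average per-node rate $P_{\text{node}}$ work out as follows:

```latex
\begin{aligned}
W &\approx 352 \times 10^{12}\,\text{FLOP/s} \times 1198\,\text{s} \approx 4.2 \times 10^{17}\,\text{FLOP},\\
P_{\text{node}} &\approx \frac{352\,\text{TFLOPS}}{128} = 2.75\,\text{TFLOPS}.
\end{aligned}
```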

Similar articles

       
 
Ruikui Ma, Qiuqian Wang, Xiangxi Bu and Xuebin Chen    
With the development of the Internet of Things, a huge number of devices are connected to the network, network traffic is exhibiting massive and low latency characteristics. At the same time, it is becoming cheaper and cheaper to launch DDoS attacks, and...
Journal: Applied Sciences

 
Yunjing Huang, Shuyun Luo and Weiqiang Xu    
As a promising paradigm, the Industrial Internet of Things (IIoT) provides a wide range of intelligent services through the interconnection and interaction of heterogeneous networks. The quality of these services depends on how the bandwidth is shared am...
Journal: Information

 
Alexander Feoktistov, Alexei Edelev, Andrei Tchernykh, Sergey Gorsky, Olga Basharina and Evgeniy Fereferov    
Implementing high-performance computing (HPC) to solve problems in energy infrastructure resilience research in a heterogeneous environment based on an in-memory data grid (IMDG) presents a challenge to workflow management systems. Large-scale energy inf...
Journal: Computation

 
Rasoul Najafi Koopas, Natalie Rauter and Rolf Lammering    
Methodologies are developed for analyzing failure initiation and crack propagation in highly heterogeneous concrete mesostructures. Efficient algorithms are proposed in Python to generate and pack geometric features into a continuous phase. The continuou...
Journal: Applied Sciences

 
Sandro Noto, Molka Gharbaoui, Mariano Falcitelli, Barbara Martini, Piero Castoldi and Paolo Pagano    
In recent years, the adoption of innovative technologies in maritime transport and logistics systems has become a key aspect towards their development and growth, especially due to the complex and heterogeneous nature of the maritime environment. On the ...