Future Internet, Vol. 15, Issue 10 (2023)

kClusterHub: An AutoML-Driven Tool for Effortless Partition-Based Clustering over Varied Data Types

Konstantinos Gratsos, Stefanos Ougiaroglou and Dionisis Margaris

Abstract

Partition-based clustering is widely applied across diverse domains. Researchers and practitioners from various scientific disciplines work with partition-based algorithms through specialized software or programming libraries. To bridge the knowledge gap associated with these tools, this paper introduces kClusterHub, an AutoML-driven web tool that simplifies the execution of partition-based clustering over numerical, categorical and mixed data types, and helps identify the optimal number of clusters using the elbow method. Through automatic feature analysis, kClusterHub selects the most appropriate algorithm among k-means, k-modes, and k-prototypes. After the user uploads a dataset and selects features, kClusterHub selects the algorithm, provides the elbow graph, recommends the optimal number of clusters, executes the clustering, and presents the cluster assignments through tabular representations and exploratory plots. kClusterHub thus reduces the need for specialized software and programming skills, making clustering more accessible to non-experts. To further enhance its utility, kClusterHub integrates a REST API that supports the programmatic execution of cluster analysis. The paper concludes with an evaluation of kClusterHub's usability via the System Usability Scale and with CPU performance experiments. The results indicate that kClusterHub is a streamlined, efficient and user-friendly AutoML-inspired tool for cluster analysis.
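
The abstract does not spell out how the feature analysis or the elbow recommendation is implemented, so the following Python sketch is only an illustration under stated assumptions, not kClusterHub's actual code. The function names `is_numeric`, `select_algorithm` and `recommend_k` are hypothetical. The sketch assumes a feature counts as numerical when every value parses as a number, and it locates the elbow as the point farthest from the straight line joining the two ends of the cost curve, which is one common heuristic among several.

```python
from math import hypot


def is_numeric(value):
    """Return True if the value can be parsed as a float (assumed test)."""
    try:
        float(value)
        return True
    except (TypeError, ValueError):
        return False


def select_algorithm(columns):
    """Pick an algorithm name from the types of the selected features.

    columns: dict mapping a feature name to the list of its raw values.
    All numerical -> k-means, all categorical -> k-modes, mixed -> k-prototypes.
    """
    if not columns:
        raise ValueError("at least one feature is required")
    numeric = [all(is_numeric(v) for v in values) for values in columns.values()]
    if all(numeric):
        return "k-means"
    if not any(numeric):
        return "k-modes"
    return "k-prototypes"


def recommend_k(ks, costs):
    """Return the k at the elbow of the (k, cost) curve.

    Heuristic (an assumption, not taken from the paper): normalise both
    axes to [0, 1] and choose the point with the largest perpendicular
    distance to the line through the first and last points.
    """
    if len(ks) != len(costs) or len(ks) < 3:
        raise ValueError("need at least three (k, cost) pairs")
    k_span = ks[-1] - ks[0]
    if k_span == 0:
        raise ValueError("k values must span a non-empty range")
    c_min = min(costs)
    c_span = (max(costs) - c_min) or 1.0  # avoid dividing by zero on flat curves
    xs = [(k - ks[0]) / k_span for k in ks]
    ys = [(c - c_min) / c_span for c in costs]
    x0, y0, x1, y1 = xs[0], ys[0], xs[-1], ys[-1]
    length = hypot(x1 - x0, y1 - y0)
    distances = [
        abs((y1 - y0) * x - (x1 - x0) * y + x1 * y0 - y1 * x0) / length
        for x, y in zip(xs, ys)
    ]
    return ks[distances.index(max(distances))]


if __name__ == "__main__":
    features = {"age": [23, 41, 35], "color": ["red", "blue", "red"]}
    print(select_algorithm(features))  # k-prototypes
    print(recommend_k([1, 2, 3, 4, 5, 6], [1000, 400, 150, 120, 100, 90]))  # 3
```

A type test this simple would treat numerically coded categories (for example, postal codes) as numerical, so a real tool would likely need a way to override the detected type. The cost values passed to `recommend_k` would come from running the selected algorithm once per candidate number of clusters.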
