Algorithms, Vol. 15, No. 5 (2022)
ARTICLE
TITLE

Agglomerative Clustering with Threshold Optimization via Extreme Value Theory

Chunchun Li    
Manuel Günther    
Akshay Raj Dhamija    
Steve Cruz    
Mohsen Jafarzadeh    
Touqeer Ahmad and Terrance E. Boult    

Abstract

Clustering is a critical part of many tasks and, in most applications, the number of clusters in the data is unknown and must be estimated. This paper presents an Extreme Value Theory-based approach to threshold selection for clustering, proving that the "correct" linkage distances must follow a Weibull distribution for smooth feature spaces. Deep networks and their associated deep features have transformed many aspects of learning, and this paper shows they are consistent with our extreme-linkage theory and provide Unreasonable Clusterability. We show how our novel threshold selection can be applied to both classic agglomerative clustering and the more recent FINCH (First Integer Neighbor Clustering Hierarchy) algorithm. Our evaluation uses over a dozen different large-scale vision datasets/subsets, including multiple face-clustering datasets and ImageNet for both in-domain and, more importantly, out-of-domain object clustering. Across multiple deep-feature clustering tasks with very different characteristics, our novel automated threshold selection performs well, often outperforming state-of-the-art clustering techniques even when they select parameters on the test set.
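
To make the idea of a Weibull-based linkage threshold concrete, the following is a minimal sketch and not the authors' algorithm. It assumes SciPy's hierarchical clustering, assumes that the smallest merge distances come from within-cluster merges, and picks the cut as a high quantile of a Weibull fitted to them. The function name `weibull_threshold_clusters` and the parameters `tail_fraction` and `quantile` are hypothetical and chosen only for illustration.

```python
import numpy as np
from scipy.cluster.hierarchy import fcluster, linkage
from scipy.stats import weibull_min


def weibull_threshold_clusters(features, tail_fraction=0.5, quantile=0.99):
    """Cut an average-linkage dendrogram at a Weibull-derived distance.

    Illustrative only: tail_fraction and quantile are assumed knobs,
    not values or steps taken from the paper.
    """
    # Agglomerative clustering; column 2 of Z holds the merge distances.
    Z = linkage(features, method="average", metric="euclidean")
    merge_dists = np.sort(Z[:, 2])

    # Assumption: the smallest merges join points of the same cluster.
    k = max(3, int(tail_fraction * len(merge_dists)))
    small = merge_dists[:k]

    # Fit a two-parameter Weibull (location fixed at 0) to those distances.
    shape, loc, scale = weibull_min.fit(small, floc=0)

    # Assumption: use a high quantile of the fitted Weibull as the cut.
    threshold = weibull_min.ppf(quantile, shape, loc=loc, scale=scale)

    # Flat clusters: merges above the threshold are not applied.
    labels = fcluster(Z, t=threshold, criterion="distance")
    return labels, threshold


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    centers = np.array([[0.0, 0.0], [10.0, 0.0], [0.0, 10.0]])
    X = np.vstack([c + 0.3 * rng.standard_normal((30, 2)) for c in centers])
    labels, threshold = weibull_threshold_clusters(X)
    print(f"threshold={threshold:.3f}, clusters={len(set(labels))}")
```

With these conservative defaults the cut tends to over-segment, so the sketch shows only the mechanics of fitting a Weibull to linkage distances and cutting the dendrogram; how the paper actually selects and validates the threshold is not described in the abstract.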

Similar articles

       
 
Xiao Chu, Xianghua Tan and Weili Zeng    
Performing clustering analysis on a large amount of historical trajectory data can obtain information such as frequent flight patterns of aircraft and air traffic flow distribution, which can provide a reference for the revision of standard flight proced...
Journal: Aerospace

 
Kirill Androsov, pp. 63-69
The article shows that the development of an effective segmentation algorithm for metallographic images is an urgent task. The mean shift algorithm and its disadvantages are considered. To eliminate the shortcomings, a modification of the algorithm based...

 
Bilal Bataineh    
Clustering analysis is a significant technique in various fields, including unsupervised machine learning, data mining, pattern recognition, and image analysis. Many clustering algorithms are currently used, but almost all of them encounter various chall...
Journal: Information

 
Adrien Wartelle, Farah Mourad-Chehade, Farouk Yalaoui, Jan Chrusciel, David Laplanche and Stéphane Sanchez    
Assessing patterns of healthcare problems in a general emergency department population through multimorbidity clustering analysis.
Journal: Applied Sciences

 
Lev Kazakovtsev, Ivan Rozhnov and Guzel Shkaberina    
The continuous p-median problem (CPMP) is one of the most popular and widely used models in location theory that minimizes the sum of distances from known demand points to the sought points called centers or medians. This NP-hard location problem is also...
Journal: Algorithms