Information / Vol: 13 Par: 1 (2022)
Title

Cluster Appearance Glyphs: A Methodology for Illustrating High-Dimensional Data Patterns in 2-D Data Layouts

Jenny Hyunjung Lee, Darius Coelho and Klaus Mueller

Abstract

Two-dimensional space embeddings such as Multi-Dimensional Scaling (MDS) are a popular means of gaining insight into high-dimensional data relationships. However, in all but the simplest cases, these embeddings suffer from significant distortions, which can lead to misinterpretations of the high-dimensional data. These distortions occur at both the global inter-cluster level and the local intra-cluster level. The former leads to misinterpretation of the distances between the various N-D cluster populations, while the latter hampers the appreciation of their individual shapes and composition, which we call cluster appearance. The distortion of cluster appearance incurred in the 2-D embedding is unavoidable, since such low-dimensional embeddings always come at the cost of some of the intra-cluster variance. In this paper, we propose techniques to overcome these limitations by conveying the N-D cluster appearance via a framework inspired by illustrative design. Here we make use of Scagnostics, which offers a set of intuitive feature descriptors for the appearance of 2-D scatterplots. We extend the Scagnostics analysis to N-D and then devise, and test via crowd-sourced user studies, a set of parameterizable texture patterns that map to the various Scagnostics descriptors. Finally, we embed these N-D Scagnostics-informed texture patterns into shapes derived from N-D statistics to yield what we call Cluster Appearance Glyphs. We demonstrate our framework with a dataset acquired to analyze program execution times in file systems.
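
The loss of intra-cluster variance mentioned above can be made concrete with a small numerical sketch. The code below is not the authors' method: it assumes a PCA-style linear projection as a stand-in for a 2-D embedding (classical MDS on Euclidean distances yields the same layout), and the function name `retained_variance_2d` and the synthetic 6-D cluster are hypothetical, chosen only for illustration.

```python
import numpy as np


def retained_variance_2d(points):
    """Fraction of a cluster's total variance kept by its best 2-D linear projection.

    Illustrative helper (an assumption, not from the paper): uses the singular
    values of the centered data, whose squares are proportional to the variance
    along each principal axis.
    """
    centered = points - points.mean(axis=0)
    singular_values = np.linalg.svd(centered, compute_uv=False)
    variances = singular_values ** 2
    return variances[:2].sum() / variances.sum()


rng = np.random.default_rng(0)
# Synthetic 6-D cluster with unequal spread per dimension (made-up data).
spread = np.array([3.0, 2.0, 1.5, 1.0, 1.0, 0.5])
cluster = rng.normal(size=(500, 6)) * spread

kept = retained_variance_2d(cluster)
print(f"variance kept in 2-D: {kept:.2f}, lost: {1 - kept:.2f}")
```

Whenever a cluster has spread in more than two directions, the retained fraction is strictly below 1, which is the part of the cluster appearance that a 2-D layout alone cannot show.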
