
Comparing sets of patterns with the Jaccard index

Sam Fletcher    
Md Zahidul Islam    

Abstract

The ability to extract knowledge from data has been the driving force of data mining since its inception, and of statistical modeling long before that. Actionable knowledge often takes the form of patterns, where a set of antecedents can be used to infer a consequent. In this paper we offer a solution to the problem of comparing different sets of patterns. Our solution allows comparisons between sets of patterns that were derived using different techniques (such as different classification algorithms) or from different samples of data (such as temporal data or data perturbed for privacy reasons). We propose using the Jaccard index to measure the similarity between sets of patterns by converting each pattern into a single element within the set. Our measure focuses on providing conceptual simplicity, computational simplicity, interpretability, and wide applicability. The results of this measure are compared to prediction accuracy in the context of a real-world data mining scenario.
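
The idea of treating each pattern as one set element can be illustrated with a minimal sketch. It assumes a pattern is encoded as a pair of (unordered antecedents, consequent); that encoding, the helper names `make_pattern` and `jaccard_index`, the toy patterns, and the convention that two empty sets have similarity 1.0 are all illustrative assumptions, not details taken from the paper.

```python
from typing import FrozenSet, Hashable, Iterable, Set, Tuple

# Assumed encoding: a pattern is (frozenset of antecedents, consequent).
# Using a frozenset makes the antecedent order irrelevant and the
# whole pattern hashable, so it can be a single element of a set.
Pattern = Tuple[FrozenSet[Hashable], Hashable]


def make_pattern(antecedents: Iterable[Hashable], consequent: Hashable) -> Pattern:
    """Build one hashable pattern element (hypothetical helper)."""
    return (frozenset(antecedents), consequent)


def jaccard_index(a: Set[Pattern], b: Set[Pattern]) -> float:
    """Jaccard index |A ∩ B| / |A ∪ B| of two pattern sets."""
    union = a | b
    if not union:
        # Assumption: two empty pattern sets are treated as identical.
        return 1.0
    return len(a & b) / len(union)


# Toy pattern sets, e.g. from two different classifiers (made-up data).
tree_patterns = {
    make_pattern(["age>30", "income=high"], "buys=yes"),
    make_pattern(["age<=30"], "buys=no"),
    make_pattern(["income=low"], "buys=no"),
}
rule_patterns = {
    make_pattern(["income=high", "age>30"], "buys=yes"),  # same pattern, reordered
    make_pattern(["age<=30"], "buys=no"),
    make_pattern(["student=yes"], "buys=yes"),
}

similarity = jaccard_index(tree_patterns, rule_patterns)
print(similarity)  # 2 shared patterns out of 4 distinct ones -> 0.5
```

Under this encoding two patterns count as the same element only when their antecedents and consequent match exactly; any looser notion of pattern equivalence would need a different conversion step.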
