AN EMPIRICAL ANALYSIS OF SIMILARITY MEASURES FOR UNSTRUCTURED DATA

Mausumi Goswami    
B.S Purkayastha    

Abstract

With the rapid growth of digital text on the internet and in digital repositories, the pool of digital documents grows day by day. Efficient and effective techniques are therefore required to handle such an enormous amount of data. To mine documents, it is essential to understand them properly, and text similarity measurement plays a major role in finding coherence among documents. The goal of similarity computation is to identify cohesion among text documents and to prepare the text for applications such as document organization, plagiarism detection, and query matching. It is one of the most fundamental tasks in information retrieval, information extraction, document organization, plagiarism detection, and text mining, and the effectiveness of document clustering in particular depends heavily on it. In this paper, four similarity measures are implemented and their descriptive statistics are compared. The results are found to be satisfactory. Graphs are drawn to visualize the results.
