Future Internet, Vol. 10, No. 12 (2018)
Title

A Method for Filtering Pages by Similarity Degree based on Dynamic Programming

Ziyun Deng and Tingqin He    

Abstract

To obtain target webpages from a large set of webpages, we propose a Method for Filtering Pages by Similarity Degree based on Dynamic Programming (MFPSDDP). The method relies on one of three "same relationships" between two nodes, each of which we define. The main innovation of MFPSDDP is that it does not require the structures of the webpages to be known in advance. First, we present the design, which uses a queue and two threads. Then, we propose a dynamic programming algorithm for calculating the length of the longest common subsequence and a formula for calculating similarity. To obtain the detailed-information webpages from 200,000 webpages downloaded from the well-known website "www.jd.com", we choose the Completely Same Relationship (CSR) as the same relationship and set the similarity threshold to 0.2. The Recall Ratio (RR) of MFPSDDP ranks in the middle of the four filtering methods compared. When the number of webpages filtered approaches 200,000, the Precision Ratio (PR) of MFPSDDP is the highest of the four methods, reaching 85.1%. This is 13.3 percentage points higher than the PR of a Method for Filtering Pages by Containing Strings (MFPCS).
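
The abstract names a dynamic programming computation of the longest common subsequence (LCS) length and a similarity formula, but does not state either. The following is a minimal sketch of the general technique, not the authors' implementation. It assumes each webpage has already been reduced to a sequence of node labels, that plain equality of labels stands in for the chosen same relationship, and that similarity is the LCS length divided by the longer sequence length. The function names `lcs_length`, `similarity`, and `passes_filter`, the normalization, and the inclusive comparison against the threshold are all illustrative assumptions.

```python
from typing import Sequence


def lcs_length(a: Sequence[str], b: Sequence[str]) -> int:
    """Length of the longest common subsequence of a and b.

    Classic dynamic programming; only two rows of the table are kept.
    """
    prev = [0] * (len(b) + 1)
    for x in a:
        curr = [0]
        for j, y in enumerate(b, start=1):
            if x == y:  # assumption: equality stands in for the same relationship
                curr.append(prev[j - 1] + 1)
            else:
                curr.append(max(prev[j], curr[j - 1]))
        prev = curr
    return prev[-1]


def similarity(a: Sequence[str], b: Sequence[str]) -> float:
    """Assumed normalization: LCS length over the longer sequence length."""
    longest = max(len(a), len(b))
    return lcs_length(a, b) / longest if longest else 0.0


def passes_filter(candidate: Sequence[str], reference: Sequence[str],
                  threshold: float = 0.2) -> bool:
    """Keep a page whose similarity to a reference page reaches the threshold.

    The default 0.2 is the threshold reported in the abstract; treating the
    comparison as inclusive is an assumption.
    """
    return similarity(candidate, reference) >= threshold


if __name__ == "__main__":
    reference = ["html", "body", "div", "p"]
    candidate = ["html", "body", "div", "span", "p"]
    print(lcs_length(candidate, reference), similarity(candidate, reference))
```

Keeping only two rows reduces memory from O(mn) to O(n) while the running time stays O(mn), which matters when many page pairs must be compared.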
