
Practical Challenge of Shredded Documents: Clustering of Chinese Homologous Pieces

Nan Xing, Jianqi Zhang, Furong Cao and Pengfei Liu

Abstract

When recovering a shredded document with numerous mixed pieces, clustering (grouping pieces that originally belonged to the same page) can reduce the difficulty of the recovery process. Restoring homologous shredded documents (pieces from different pages of the same file) is a frequent problem, and because these pieces have nearly indistinguishable visual characteristics, grouping them is extremely difficult. Because homologous pieces are ubiquitous, clustering research has important practical significance for document recovery. Given the wide usage of Chinese and the large demand for recovering shredded Chinese documents, our research focuses on Chinese homologous pieces. In this paper, we propose a method for completely clustering Chinese homologous pieces, in which the distribution features of the characters in the pieces and the document layout are used to correlate adjacent pieces and cluster them in different areas of a document. The experimental results show that the proposed method clusters real pieces well. For the dataset containing 10 page documents (a total of 462 pieces), its average accuracy is 97.19%.
