Future Internet, Vol. 16, No. 2 (2024)
ARTICLE
TITLE

Automated Identification of Sensitive Financial Data Based on the Topic Analysis

Meng Li, Jiqiang Liu and Yeping Yang

Abstract

Data governance is an essential protection and management measure throughout the data life cycle. However, governance issues persist, including data security risks, privacy breaches, and difficulties in data management and access control, all of which raise the risk of data breaches and abuse. Security classification and grading of data has therefore become an important task: it allows sensitive data to be identified accurately and maintenance and management measures to be matched to each sensitivity level. This work starts from the problems in current data security classification and grading practice, such as inconsistent classification and grading standards, difficult data acquisition and sorting, and weak semantic information in data fields, to identify the limitations of current methods and directions for improvement. The proposed method for automatically identifying sensitive financial data is based on topic analysis and combines Jieba word segmentation, word frequency statistics, the skip-gram model, K-means clustering, and other techniques. Experts helped select appropriate keywords to improve accuracy. The method was trained and tested on the descriptive text library and real business data of a Chinese financial institution to demonstrate its effectiveness and usefulness. The evaluation indicators showed that the method is effective for data security classification. The proposed method addresses the challenge of assigning sensitivity levels to texts with limited semantic information, overcomes limitations on extending the model across domains, and provides an optimized application model. Together, these results point toward real-time updating of the method.
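
The pipeline named in the abstract (segmentation, word frequency statistics, vector representation, K-means clustering) can be illustrated with a minimal, dependency-free sketch. This is not the authors' implementation. Several parts are assumptions made only for illustration: the field descriptions in `DESCRIPTIONS` are hypothetical English examples, whitespace splitting stands in for Jieba segmentation, simple word-count vectors stand in for skip-gram embeddings, and the helper names (`tokenize`, `word_frequencies`, `vectorize`, `kmeans`) are invented here.

```python
from collections import Counter

# Hypothetical field descriptions; not data from the paper.
DESCRIPTIONS = [
    "customer id card number",
    "customer passport number",
    "account balance amount",
    "loan balance amount",
    "branch office name",
    "branch office address",
]


def tokenize(text):
    # Assumption: whitespace split stands in for Jieba word segmentation.
    return text.lower().split()


def word_frequencies(docs):
    # Word frequency statistics over all descriptions.
    counts = Counter()
    for doc in docs:
        counts.update(tokenize(doc))
    return counts


def vectorize(doc, vocabulary):
    # Assumption: word-count vectors stand in for skip-gram embeddings.
    counts = Counter(tokenize(doc))
    return [float(counts[word]) for word in vocabulary]


def squared_distance(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans(vectors, k, iterations=20):
    # Deterministic initialisation: evenly spaced vectors become centroids.
    step = len(vectors) // k
    centroids = [list(vectors[i * step]) for i in range(k)]
    labels = [0] * len(vectors)
    for _ in range(iterations):
        # Assignment step: each vector joins its nearest centroid.
        new_labels = [
            min(range(k), key=lambda c: squared_distance(v, centroids[c]))
            for v in vectors
        ]
        # Update step: each centroid moves to the mean of its members.
        for c in range(k):
            members = [v for v, lab in zip(vectors, new_labels) if lab == c]
            if members:
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
        if new_labels == labels:
            break
        labels = new_labels
    return labels


frequencies = word_frequencies(DESCRIPTIONS)
vocabulary = sorted(frequencies)
vectors = [vectorize(doc, vocabulary) for doc in DESCRIPTIONS]
labels = kmeans(vectors, k=3)

for doc, label in zip(DESCRIPTIONS, labels):
    print(label, doc)
```

The sketch only groups descriptions that share vocabulary. Mapping each resulting cluster to a sensitivity level would still require the expert-selected keywords that the abstract mentions, and Chinese field descriptions would need a real segmenter and learned embeddings rather than these stand-ins.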
