ARTICLE
TITLE

MalBERTv2: Code Aware BERT-Based Model for Malware Identification

Abir Rahali and Moulay A. Akhloufi    

Abstract

To mitigate malware threats proactively, cybersecurity tools such as anti-virus software, anti-malware software, and firewalls require frequent updates and proactive implementation. However, processing the vast number of dataset examples can be overwhelming when relying solely on traditional methods. Recent advances in natural language processing (NLP) models can help cybersecurity workflows detect various threats proactively. In this paper, we present a novel approach for representing the relevance and significance of Malware/Goodware (MG) datasets using a pre-trained language model called MalBERTv2. Our model is trained on publicly available datasets and focuses on the source code of apps, extracting the top-ranked files that carry the most relevant information. These files are passed through a pre-tokenization feature generator, and the resulting keywords are used to train the tokenizer from scratch. Finally, we apply a classifier that uses bidirectional encoder representations from transformers (BERT) as a layer within the model pipeline. We evaluate the model on different datasets, achieving weighted F1 scores ranging from 82% to 99%. These results demonstrate the effectiveness of our approach for proactively detecting malware threats using NLP techniques.
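
The abstract reports a weighted F1 score without defining it. As a minimal sketch, assuming the usual support-weighted definition (per-class F1 averaged with weights proportional to each class's number of true samples), the metric can be computed as follows. The function name `weighted_f1` and the sample labels are illustrative assumptions and do not come from the paper.

```python
from collections import Counter


def weighted_f1(y_true, y_pred):
    """Support-weighted mean of per-class F1 scores (illustrative helper)."""
    support = Counter(y_true)  # number of true samples per class
    total = len(y_true)
    score = 0.0
    for label, n in support.items():
        pairs = list(zip(y_true, y_pred))
        tp = sum(1 for t, p in pairs if t == label and p == label)
        fp = sum(1 for t, p in pairs if t != label and p == label)
        fn = n - tp
        denom = 2 * tp + fp + fn
        f1 = 2 * tp / denom if denom else 0.0
        # Weight each class's F1 by its share of the true labels.
        score += (n / total) * f1
    return score


# Hypothetical labels for a small Malware/Goodware test split.
y_true = ["malware", "malware", "goodware", "goodware", "goodware"]
y_pred = ["malware", "goodware", "goodware", "goodware", "goodware"]
print(round(weighted_f1(y_true, y_pred), 4))
```

Weighting by class support keeps a large goodware class from being averaged on equal terms with a small malware class, which matters when the evaluation datasets are imbalanced.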
