Algorithms, Vol. 16, No. 6 (2023)
ARTICLE
TITLE

DrugFinder: Druggable Protein Identification Model Based on Pre-Trained Models and Evolutionary Information

Mu Zhang, Fengqiang Wan and Taigang Liu

Abstract

Identifying druggable proteins is central to drug development. Traditional structure-based identification methods are time-consuming and costly, so researchers have increasingly turned to sequence-based methods. We propose DrugFinder, a sequence-based model for identifying druggable proteins. The model extracts features from the embedding output of the pre-trained protein model Prot_T5_Xl_Uniref50 (T5) and from the evolutionary information in the position-specific scoring matrix (PSSM). To remove redundant features and improve performance, we selected features with the random forest (RF) method, then trained and tested several machine learning classifiers on the selected features: support vector machines (SVM), RF, naive Bayes (NB), extreme gradient boosting (XGB), and k-nearest neighbors (KNN). Among these classifiers, XGB achieved the best results. DrugFinder reached an accuracy of 94.98%, a sensitivity of 96.33%, and a specificity of 96.83% on the independent test set, substantially better than the results of existing identification methods. The model also performed well on an additional tumor-related test set, achieving an accuracy of 88.71% and a precision of 93.72%, which further demonstrates its strong generalization capability.
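
The abstract reports accuracy, sensitivity, specificity, and precision without defining them. The following is a minimal sketch of how such figures are commonly computed, assuming the standard confusion-matrix definitions and assuming that "druggable" is the positive class; the abstract states neither. The names `BinaryMetrics` and `binary_metrics` and the toy labels are hypothetical and are not taken from DrugFinder.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class BinaryMetrics:
    accuracy: float     # (TP + TN) / all samples
    sensitivity: float  # TP / (TP + FN): recall on the positive class
    specificity: float  # TN / (TN + FP): recall on the negative class
    precision: float    # TP / (TP + FP)


def binary_metrics(y_true, y_pred, positive=1):
    """Compute the four metrics from two equal-length label sequences."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")

    tp = fp = tn = fn = 0
    for truth, pred in zip(y_true, y_pred):
        if pred == positive:
            if truth == positive:
                tp += 1
            else:
                fp += 1
        else:
            if truth == positive:
                fn += 1
            else:
                tn += 1

    def ratio(numerator, denominator):
        # Return NaN instead of raising when a class is absent.
        return numerator / denominator if denominator else float("nan")

    return BinaryMetrics(
        accuracy=ratio(tp + tn, tp + tn + fp + fn),
        sensitivity=ratio(tp, tp + fn),
        specificity=ratio(tn, tn + fp),
        precision=ratio(tp, tp + fp),
    )


# Toy labels invented for illustration (1 = druggable, 0 = non-druggable).
y_true = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
y_pred = [1, 1, 1, 0, 0, 0, 0, 0, 1, 0]
metrics = binary_metrics(y_true, y_pred)
```

Under these definitions, sensitivity and specificity measure performance on each class separately, whereas precision depends on the class balance of the test set, which may be one reason a paper reports different metric pairs for different test sets.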
