ARTICLE
TITLE

Automating Feature Extraction from Entity-Relation Models: Experimental Evaluation of Machine Learning Methods for Relational Learning

Boris Stanoev    
Goran Mitrov    
Andrea Kulakov    
Georgina Mirceva    
Petre Lameski and Eftim Zdravevski    

Abstract

With the exponential growth of data, extracting actionable insights becomes resource-intensive. In many organizations, a significant portion of this data is stored in normalized relational databases, whose tables are interconnected through relations. This paper explores relational learning, which involves joining and merging database tables that are often normalized to the third normal form. Features are then extracted from the merged data and used in machine learning (ML) models. We experiment with a propositionalization algorithm, Wordification, for feature engineering. We then compare PropDRM and PropStar, two algorithms designed explicitly for multi-relational data mining, with traditional machine learning algorithms. Based on the experiments, we conclude that Gradient Boost achieves performance similar to PropDRM (F1 score, accuracy, and AUC) on multiple datasets. PropStar underperformed on some datasets and was comparable to the other algorithms on the rest. In summary, propositionalization for feature extraction makes it feasible to apply traditional ML algorithms directly to relational learning. In contrast, approaches tailored specifically to relational learning still face challenges in scalability, interpretability, and efficiency. These findings have practical impact: they can help speed up the adoption of machine learning in business contexts where data is stored in relational format, without requiring domain-specific feature extraction.
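
To make the idea of propositionalization concrete, the following is a minimal sketch in the spirit of Wordification, not the authors' implementation. It assumes a hypothetical two-table schema (`customers` as the main table, `orders` linked to it by a foreign key) and hypothetical helper names (`row_words`, `wordify`). Each main-table row becomes a bag of "words" of the form table_attribute_value, collected from the row itself and from its linked rows, and the bags are then flattened into a single feature matrix that a traditional ML algorithm could consume.

```python
from collections import Counter

# Hypothetical toy schema: a main table and a secondary table linked by a foreign key.
customers = [
    {"id": 1, "segment": "retail", "label": 1},
    {"id": 2, "segment": "corporate", "label": 0},
]
orders = [
    {"customer_id": 1, "category": "books", "channel": "web"},
    {"customer_id": 1, "category": "games", "channel": "web"},
    {"customer_id": 2, "category": "books", "channel": "store"},
]


def row_words(table_name, row, skip):
    """Turn one row into tokens of the form table_attribute_value."""
    return [f"{table_name}_{k}_{v}" for k, v in row.items() if k not in skip]


def wordify(main_rows, child_rows, key="id", foreign_key="customer_id", target="label"):
    """Build one bag of words per main-table row, including its linked child rows."""
    docs = {}
    for row in main_rows:
        # Keys and the target label are excluded so they do not leak into the features.
        words = row_words("customers", row, skip={key, target})
        for child in child_rows:
            if child[foreign_key] == row[key]:
                words += row_words("orders", child, skip={foreign_key})
        docs[row[key]] = Counter(words)
    return docs


docs = wordify(customers, orders)
vocabulary = sorted({w for doc in docs.values() for w in doc})

# Flat feature matrix: one row per customer, one column per word (raw counts).
X = [[docs[row["id"]][w] for w in vocabulary] for row in customers]
y = [row["label"] for row in customers]
```

Raw counts are used here for brevity; a weighting scheme such as TF-IDF could be applied to the counts before training. The resulting `X` and `y` have the single-table shape that standard classifiers expect, which is the property the abstract relies on when it applies traditional ML algorithms to relational data.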
