
A Better Mechanistic Understanding of Big Data through an Order Search Using Causal Bayesian Networks

Changwon Yoo, Efrain Gonzalez, Zhenghua Gong and Deodutta Roy

Abstract

Every year, biomedical data grows at an alarming rate and is collected from many different sources, such as hospitals (clinical Big Data), laboratories (genomic and proteomic Big Data), and the internet (online Big Data). This article presents and evaluates a practical causal discovery algorithm that draws on modern statistical, machine learning, and informatics approaches to learning causal relationships from biomedical Big Data, integrating clinical, omics (genomic and proteomic), and environmental aspects. Learning causal relationships from data with graphical models does not address the hidden (unknown or unmeasured) mechanisms that are inherent to most measurements and analyses. In addition, many algorithms are of limited practical use because they do not incorporate current mechanistic knowledge. This paper proposes a practical causal discovery algorithm that uses causal Bayesian networks to gain a better understanding of the underlying mechanistic process that generated the data. The algorithm uses model averaging techniques, such as searching over a relative order (e.g., if gene A regulates gene B, then gene A is of a higher order than gene B), and incorporates relevant prior mechanistic knowledge to guide the Markov chain Monte Carlo search over orders. The algorithm was evaluated on datasets generated from the ALARM causal Bayesian network. Out of the 37 variables in the ALARM causal Bayesian network, two sets of nine were chosen, and the observations for those variables were provided to the algorithm. Performance was evaluated by comparing the algorithm's predictions with the generating causal mechanism. The 28 variables that were not provided to the algorithm are referred to as hidden variables; they allowed evaluation of the algorithm's ability to predict hidden confounded causal relationships.
The algorithm's performance was also compared with that of other causal discovery algorithms. The results show that incorporating order information provides a better mechanistic understanding even when hidden confounded causes are present. The prior mechanistic knowledge incorporated in the Markov chain Monte Carlo search led to better discovery of causal relationships when hidden variables were involved in generating the simulated data.
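
To make the idea of a prior-guided Markov chain Monte Carlo search over orders concrete, the following is a minimal Python sketch under stated assumptions. It is not the authors' implementation. The scoring function (a K2-style marginal likelihood for binary variables), the way prior knowledge enters (a fixed log-score bonus for each believed "a precedes b" pair that the order respects), the swap proposal, and all function names (`local_score`, `order_score`, `order_mcmc`, `simulate_chain`) are illustrative choices that do not come from the article. The toy data come from a three-variable chain A -> B -> C, not from the ALARM network.

```python
import itertools
import math
import random
from collections import Counter


def local_score(child, parents, data):
    """Log marginal likelihood of a binary child given parents (uniform Beta(1,1) prior)."""
    counts = Counter()
    for row in data:
        config = tuple(row[p] for p in parents)
        counts[(config, row[child])] += 1
    score = 0.0
    for config in {c for c, _ in counts}:
        n0, n1 = counts[(config, 0)], counts[(config, 1)]
        # Integral of theta^n1 * (1 - theta)^n0 under a uniform prior: n0! n1! / (n0 + n1 + 1)!
        score += math.lgamma(n0 + 1) + math.lgamma(n1 + 1) - math.lgamma(n0 + n1 + 2)
    return score


def log_sum_exp(values):
    top = max(values)
    return top + math.log(sum(math.exp(v - top) for v in values))


def order_score(order, data, prior_pairs=(), bonus=2.0, max_parents=2):
    """Log score of an order: sums over every parent set the order allows, plus a prior term."""
    total = 0.0
    for i, child in enumerate(order):
        predecessors = order[:i]  # only earlier variables may be parents
        scores = [
            local_score(child, parents, data)
            for k in range(min(max_parents, len(predecessors)) + 1)
            for parents in itertools.combinations(predecessors, k)
        ]
        total += log_sum_exp(scores)
    position = {v: i for i, v in enumerate(order)}
    # Assumed prior: add `bonus` for each believed precedence pair the order respects.
    total += sum(bonus for a, b in prior_pairs if position[a] < position[b])
    return total


def order_mcmc(variables, data, prior_pairs=(), bonus=2.0, steps=2000, seed=0):
    """Metropolis sampler over orders using a symmetric swap-two-positions proposal."""
    rng = random.Random(seed)
    order = tuple(variables)
    current = order_score(order, data, prior_pairs, bonus)
    samples = []
    for _ in range(steps):
        i, j = rng.sample(range(len(order)), 2)
        proposal = list(order)
        proposal[i], proposal[j] = proposal[j], proposal[i]
        proposal = tuple(proposal)
        candidate = order_score(proposal, data, prior_pairs, bonus)
        if rng.random() < math.exp(min(0.0, candidate - current)):
            order, current = proposal, candidate
        samples.append(order)
    return samples


def simulate_chain(n=200, seed=1):
    """Toy binary data from the chain A -> B -> C (illustration only)."""
    rng = random.Random(seed)
    rows = []
    for _ in range(n):
        a = int(rng.random() < 0.5)
        b = int(rng.random() < (0.9 if a else 0.1))
        c = int(rng.random() < (0.8 if b else 0.2))
        rows.append({"A": a, "B": b, "C": c})
    return rows


if __name__ == "__main__":
    data = simulate_chain()
    samples = order_mcmc(("A", "B", "C"), data, prior_pairs=[("A", "B")], bonus=5.0)
    frac = sum(s.index("A") < s.index("B") for s in samples) / len(samples)
    print("Estimated posterior probability that A precedes B:", frac)
```

Averaging an indicator over the sampled orders, as in the last lines, is one simple form of model averaging: instead of committing to a single best structure, the estimate reflects every order the chain visited, weighted by how often it was visited. Because a chain such as A -> B -> C can be fit about equally well by several orders, the data alone say little about direction here, so the prior term does most of the work of favoring "A before B" in this toy case.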
