Inicio  /  Applied Sciences  /  Vol: 12 Par: 12 (2022)  /  Artículo
ARTÍCULO
TITULO

CleanSeq: A Pipeline for Contamination Detection, Cleanup, and Mutation Verifications from Microbial Genome Sequencing Data

Caiyan Wang    
Yang Xia    
Yunfei Liu    
Chen Kang    
Nan Lu    
Di Tian    
Hui Lu    
Fuhai Han    
Jian Xu and Tetsuya Yomo    

Resumen

Contaminations frequently occur in bacterial cultures, which significantly affect the reproducibility and reliability of the results from whole-genome sequencing (WGS). Decontaminated WGS data with clean reads is the only desirable source for detecting possible variants correctly. Improvements in bioinformatics are essential to analyze the contaminated WGS dataset. Existing pipelines usually contain contamination detection, decontamination, and variant calling separately. The efficiency and results from existing pipelines fluctuate since distinctive computational models and parameters are applied. It is then promising to develop a bioinformatical tool containing functions to discriminate and remove contaminated reads and improve variant calling from clean reads. In this study, we established a Python-based pipeline named CleanSeq for automatic detection and removal of contaminating reads, analyzing possible genome variants with proper verifications via local re-alignments. The application and reproducibility are proven in either simulated, publicly available datasets or actual genome sequencing reads from our experimental evolution study in Escherichia coli. We successfully obtained decontaminated reads, called out all seven consistent mutations from the contaminated bacterial sample, and derived five colonies. Collectively, the results demonstrated that CleanSeq could effectively process the contaminated samples to achieve decontaminated reads, based on which reliable results (i.e., variant calling) could be obtained.

 Artículos similares