Algorithms, Vol. 15, Issue 4 (2022)
ARTICLE
TITLE

KMC3 and CHTKC: Best Scenarios, Deficiencies, and Challenges in High-Throughput Sequencing Data Analysis

Deyou Tang    
Daqiang Tan    
Weihao Xiao    
Jiabin Lin and Juan Fu    

Abstract

Background: K-mer frequency counting is an upstream step in many bioinformatics data analysis workflows. KMC3 and CHTKC are representative partition-based and non-partition-based k-mer counting algorithms, respectively. This paper evaluates both algorithms across multiple hardware configurations and datasets, identifies the scenarios in which each performs best, and outlines potential improvements.

Results: On a server with a regular configuration, KMC3 uses less memory and runs faster than CHTKC. CHTKC is efficient on high-performance computing platforms with large available memory, many threads, and low IO bandwidth. Across the tested datasets, KMC3 is less sensitive to the number of distinct k-mers and is more efficient for tasks with relatively low sequencing quality and long k-mers, whereas CHTKC outperforms KMC3 on counting tasks with large-scale datasets, high sequencing quality, and short k-mers. Both algorithms are affected by IO bandwidth, so reducing the impact of the IO bottleneck is critical: our tests show that filtering and compressing consecutive first-occurring k-mers in KMC3 improves its performance.

Conclusions: KMC3 is more competitive for k-mer counting on ordinary hardware, whereas CHTKC is more competitive for counting k-mers in super-scale datasets on higher-performance computing platforms. Reducing the influence of the IO bottleneck is essential when optimizing k-mer counting algorithms, and filtering and compressing low-frequency k-mers is key to relieving IO pressure.
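To make the partition-based versus non-partition-based distinction concrete, here is a minimal, single-threaded Python sketch. It is not the KMC3 or CHTKC implementation. `count_direct` stands in for the non-partition approach by inserting every canonical k-mer into one in-memory hash table (CHTKC itself is reportedly built on a concurrent lock-free chaining hash table). `count_partitioned` mimics the two-phase partition approach: it first routes each k-mer to a bin keyed on its minimizer, then counts each bin independently. All function names, the minimizer length `m`, the bin count, and the CRC32 bin hash are illustrative assumptions; KMC3 writes super-k-mers to on-disk bin files rather than individual k-mers to in-memory lists.

```python
import zlib
from collections import Counter
from typing import Dict, Iterable, Iterator, List

# Complement table for DNA bases; reads are assumed to use A/C/G/T (plus N).
_COMPLEMENT = str.maketrans("ACGT", "TGCA")
_BASES = frozenset("ACGT")


def canonical(kmer: str) -> str:
    """Return the lexicographically smaller of a k-mer and its reverse complement."""
    rc = kmer.translate(_COMPLEMENT)[::-1]
    return min(kmer, rc)


def iter_kmers(read: str, k: int) -> Iterator[str]:
    """Yield canonical k-mers, skipping any window that contains a non-ACGT base."""
    read = read.upper()
    for i in range(len(read) - k + 1):
        kmer = read[i:i + k]
        if _BASES.issuperset(kmer):
            yield canonical(kmer)


def count_direct(reads: Iterable[str], k: int) -> Dict[str, int]:
    """Non-partition-based sketch: every k-mer goes into one in-memory hash table."""
    table: Counter = Counter()
    for read in reads:
        table.update(iter_kmers(read, k))
    return dict(table)


def minimizer(kmer: str, m: int) -> str:
    """Lexicographically smallest m-mer of the k-mer, used here as the bin key."""
    return min(kmer[j:j + m] for j in range(len(kmer) - m + 1))


def count_partitioned(reads: Iterable[str], k: int, m: int = 3, n_bins: int = 4) -> Dict[str, int]:
    """Partition-based sketch: distribute k-mers into bins, then count each bin separately."""
    if not 0 < m <= k:
        raise ValueError("minimizer length m must satisfy 0 < m <= k")
    # Phase 1: route each k-mer to a bin by a deterministic hash of its minimizer.
    # (A real tool would append to on-disk bin files here; lists stand in for them.)
    bins: List[List[str]] = [[] for _ in range(n_bins)]
    for read in reads:
        for kmer in iter_kmers(read, k):
            bins[zlib.crc32(minimizer(kmer, m).encode()) % n_bins].append(kmer)
    # Phase 2: count bins one at a time; identical k-mers always share a bin,
    # so the per-bin tables are disjoint and can simply be merged.
    counts: Dict[str, int] = {}
    for bin_kmers in bins:
        counts.update(Counter(bin_kmers))
    return counts


if __name__ == "__main__":
    reads = ["ACGTACGTTAGC", "gctaacgtacgn"]
    assert count_direct(reads, 5) == count_partitioned(reads, 5)
    print(count_direct(["ACGTAC"], 3))  # {'ACG': 2, 'GTA': 2}
```

Both functions return identical counts. The partitioned version adds an extra pass (and, in a real tool, disk IO for the bin files) in exchange for a much smaller working set per bin, which is the memory-versus-IO trade-off that the results above turn on.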

Similar articles

Ana Corceiro, Nuno Pereira, Khadijeh Alibabaei and Pedro D. Gaspar    
The global population's rapid growth necessitates a 70% increase in agricultural production, posing challenges exacerbated by weed infestation and herbicide drawbacks. To address this, machine learning (ML) models, particularly convolutional neural netwo...
Journal: Algorithms

 
Carlos Blanco, Antonio Santos-Olmo and Luis Enrique Sánchez    
As the Internet of Things (IoT) becomes more integral across diverse sectors, including healthcare, energy provision and industrial automation, the exposure to cyber vulnerabilities and potential attacks increases accordingly. Facing these challenges, th...
Journal: Information

 
Sebastian Avram and Radu Vasiu    
NB-PLC (narrowband power line communication) is a method of data communication that involves superimposing a relatively high-frequency signal (9 kHz to 500 kHz), which contains data, onto the power grid's low-frequency (50 to 60 Hz) signal. While using t...
Journal: Applied Sciences

 
Natalija Topic Popovic, Vanesa Lorencin, Ivancica Strunjak-Perovic and Rozelindra Což-Rakovac
Every year, close to 8 million tons of waste crab, shrimp and lobster shells are produced globally, as well as 10 million tons of waste oyster, clam, scallop and mussel shells. The disposed shells are frequently dumped at sea or sent to landfill, where t...
Journal: Applied Sciences

 
Mantas Bacevicius and Agne Paulauskaite-Taraseviciene    
Various machine learning algorithms have been applied to network intrusion classification problems, including both binary and multi-class classifications. Despite the existence of numerous studies involving unbalanced network intrusion datasets, such as ...
Journal: Applied Sciences