ARTICLE
TITLE

Massive Parallel Alignment of RNA-seq Reads in Serverless Computing

Pietro Cinaglia, José Luis Vázquez-Poletti and Mario Cannataro

Abstract

In recent years, Cloud infrastructures have proven useful for data processing, offering computing capacity that is not constrained by the limitations of a local infrastructure. In this context, Serverless computing is the fastest-growing Cloud service model due to its auto-scaling methodologies, reliability, and fault tolerance. We present a solution based on an in-house Serverless infrastructure that performs large-scale RNA-seq data analysis, focused on mapping sequencing reads to a reference genome. The main contribution is bringing the computation of genomic data into serverless computing, focusing on RNA-seq read mapping to a reference genome, as this is the most time-consuming task in some pipelines. The proposed solution handles massively parallel instances to maximize efficiency in terms of running time. We evaluated its performance with two main tests, both based on mapping RNA-seq reads to Human GRCh38. Our experiments demonstrated a reduction of 79.838%, 90.079%, and 96.382% compared to local environments with 16, 8, and 4 virtual cores, respectively. Furthermore, serverless limitations were investigated.
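
The three reported percentages can be restated as speedup factors. The sketch below assumes that each reduction refers to running time relative to the corresponding local environment; the abstract implies this but does not state it outright. The function name `speedup_from_reduction` is hypothetical and used only for illustration.

```python
def speedup_from_reduction(reduction_percent: float) -> float:
    """Convert a percentage reduction in running time into a speedup factor.

    Assumption: a reduction r means t_new = (1 - r/100) * t_old,
    so the speedup is t_old / t_new = 1 / (1 - r/100).
    """
    if not 0 <= reduction_percent < 100:
        raise ValueError("reduction must be in the range [0, 100)")
    return 1.0 / (1.0 - reduction_percent / 100.0)


# Reductions reported in the abstract, keyed by the number of virtual
# cores in the local environment used as the baseline.
reported_reductions = {16: 79.838, 8: 90.079, 4: 96.382}

speedups = {
    cores: speedup_from_reduction(reduction)
    for cores, reduction in reported_reductions.items()
}

for cores, factor in speedups.items():
    print(f"{cores} virtual cores: about {factor:.2f}x faster")
```

Under this assumption, the reductions correspond to speedups of roughly 5x, 10x, and 28x over the 16-, 8-, and 4-core local environments, respectively.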

Similar articles

       
 
Christoph Erlacher, Karl-Heinrich Anders, Piotr Jankowski, Gernot Paulus and Thomas Blaschke    
Global sensitivity analysis, like variance-based methods for massive raster datasets, is especially computationally costly and memory-intensive, limiting its applicability for commodity cluster computing. The computational effort depends mainly on the nu...

 
Driss En-Nejjary, François Pinet and Myoung-Ah Kang    
The size of spatial data is growing intensively due to the emergence of and the tremendous advances in technology such as sensors and the internet of things. Supporting high-performance queries on this large volume of data becomes essential in several da...

 
Kaihua Hou, Chengqi Cheng, Bo Chen, Chi Zhang, Liesong He, Li Meng and Shuang Li    
As the amount of collected spatial information (2D/3D) increases, the real-time processing of these massive data is among the urgent issues that need to be dealt with. Discretizing the physical earth into a digital gridded earth and assigning an integral...

 
Martin Breunig, Patrick Erik Bradley, Markus Jahn, Paul Kuper, Nima Mazroob, Norbert Rösch, Mulhim Al-Doori, Emmanuel Stefanakis and Mojgan Jadidi    
Without geospatial data management, today's challenges in big data applications such as earth observation, geographic information system/building information modeling (GIS/BIM) integration, and 3D/4D city planning cannot be solved. Furthermore, geospatia...

 
Kang Zhao, Baoxuan Jin, Hong Fan, Weiwei Song, Sunyu Zhou and Yuanyi Jiang    
Overlay analysis is a common task in geographic computing that is widely used in geographic information systems, computer graphics, and computer science. With the breakthroughs in Earth observation technologies, particularly the emergence of high-resolut...