Algorithms, Vol. 14, Issue 2 (2021)

An Investigation of Alternatives to Transform Protein Sequence Databases to a Columnar Index Schema

Roman Zoun    
Kay Schallert    
David Broneske    
Ivayla Trifonova    
Xiao Chen    
Robert Heyer    
Dirk Benndorf
Gunter Saake

Abstract

Mass spectrometers enable the identification of proteins in biological samples, which leads to biomarkers for biological process parameters and diseases. However, bioinformatic evaluation of mass spectrometer data needs a standardized workflow and a system that stores the protein sequences. Due to their standardization and maturity, relational systems are a great fit for storing protein sequences. Hence, in this work, we present a schema for distributed column-based database management systems that uses a column-oriented index to store sequence data. To achieve high storage performance, it was necessary to choose a well-performing strategy for transforming the protein sequence data from the FASTA format to the new schema. Therefore, we applied an in-memory map, an HDDmap, a database engine, and an extended radix tree, and evaluated their performance. The results show that our proposed extended radix tree performs best regarding memory consumption and runtime. Hence, the radix tree is a suitable data structure for transforming protein sequences into the indexed schema.
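To make the idea of a radix tree over protein sequences more concrete, the following is a minimal Python sketch. It is not the extended radix tree proposed in the article, whose extensions are not described in the abstract; it is a plain radix tree (compressed trie) that maps each distinct sequence to the FASTA headers that carry it, so that identical sequences are stored once. All names (`parse_fasta`, `RadixNode`, `RadixTree`, `build_index`) are hypothetical and chosen for illustration only.

```python
from typing import Dict, Iterator, List, Optional, Tuple


def parse_fasta(text: str) -> Iterator[Tuple[str, str]]:
    """Yield (header, sequence) pairs from FASTA-formatted text."""
    header: Optional[str] = None
    chunks: List[str] = []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line[1:], []
        else:
            chunks.append(line)
    if header is not None:
        yield header, "".join(chunks)


class RadixNode:
    def __init__(self) -> None:
        # Maps the first character of an edge label to (edge label, child node).
        self.children: Dict[str, Tuple[str, "RadixNode"]] = {}
        # Headers of all FASTA entries whose sequence ends at this node.
        self.headers: List[str] = []


class RadixTree:
    """Plain radix tree (illustrative assumption, not the article's variant)."""

    def __init__(self) -> None:
        self.root = RadixNode()

    def insert(self, key: str, header: str) -> None:
        node = self.root
        while key:
            entry = node.children.get(key[0])
            if entry is None:
                # No edge starts with this character: add the rest as one edge.
                leaf = RadixNode()
                node.children[key[0]] = (key, leaf)
                node, key = leaf, ""
                break
            label, child = entry
            # Length of the common prefix of the edge label and the key.
            n = 0
            while n < len(label) and n < len(key) and label[n] == key[n]:
                n += 1
            if n < len(label):
                # Split the edge at the point where label and key diverge.
                mid = RadixNode()
                mid.children[label[n]] = (label[n:], child)
                node.children[key[0]] = (label[:n], mid)
                child = mid
            node, key = child, key[n:]
        node.headers.append(header)

    def lookup(self, key: str) -> List[str]:
        node = self.root
        while key:
            entry = node.children.get(key[0])
            if entry is None:
                return []
            label, child = entry
            if not key.startswith(label):
                return []
            node, key = child, key[len(label):]
        return list(node.headers)


def build_index(fasta_text: str) -> RadixTree:
    """Insert every sequence of a FASTA text into a fresh radix tree."""
    tree = RadixTree()
    for header, sequence in parse_fasta(fasta_text):
        tree.insert(sequence, header)
    return tree
```

In this sketch, sequences with a shared prefix share the corresponding edges, and duplicate sequences end at the same node, which is one plausible reason a radix tree can keep memory consumption low during such a transformation. How the resulting entries would be written to the columnar schema is left out, because the abstract does not specify it.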
