ARTICLE
TITLE

Workflows for Science: a Challenge when Facing the Convergence of HPC and Big Data

Rosa M Badia    
Eduard Ayguade    
Jesus Labarta    

Abstract

Workflows have traditionally been used as a means to describe and implement the computations, usually parametric studies and explorations searching for the best solution, that scientific researchers want to perform. A workflow is not only a computing application but also a way of documenting a process. Science workflows can differ greatly in nature depending on the area of research, matching the actual experiment that the scientist wants to perform. Workflow Management Systems are environments that offer researchers tools to define, publish, execute and document their workflows. In some cases, science workflows are used to generate data; in other cases, they are used to analyse existing data; only in a few cases are workflows used both to generate and to analyse data. The design of experiments is in some cases generated blindly, without a clear idea of which points are relevant to compute or simulate, ending up with a huge amount of computation performed following a brute-force strategy. However, the evolution of systems and the large amount of data generated by the applications require in-situ analysis of the data, and therefore new solutions for developing workflows that include both the simulation/computational part and the analytics part. What is more, the fact that both components, computation and analytics, can be run together will make it possible to define more dynamic workflows, with new computations being decided by the analytics in a more efficient way.
The first part of the paper will review current approaches that a set of scientific communities follow in the development of their workflows. Because we selected several scientific communities and use cases using a specific Workflow Management System, this survey may be incomplete as a review of the literature on workflows, but we expect that the reader will appreciate the effort made to capture the needs and requirements of the scientific communities.
The second part of the paper will propose a new software architecture for developing a new family of end-to-end workflows, enabling the management of dynamic workflows composed of simulations, analytics and visualization, including inputs/outputs from streams.
