JOURNAL
Supercomputing Frontiers and Innovations

ARTICLE

TITLE

Optimizing Deep Learning RNN Topologies on Intel Architecture

Kunal Banerjee

Evangelos Georganas

Dhiraj D. Kalamkar

Barukh Ziv

Eden Segal

Cristina Anderson

Alexander Heinecke

Abstract

Recurrent neural network (RNN) models have been found to be well suited for processing temporal data. In this work, we present an optimized implementation of the vanilla RNN cell and its two popular variants, LSTM and GRU, for the Intel Xeon architecture. Typical implementations of these RNN cells employ one or two large matrix multiplication (GEMM) calls and then apply the element-wise operations (sigmoid/tanh) to the GEMM results. While this approach is easy to implement by exploiting vendor-optimized GEMM library calls, the data reuse relies on how the GEMMs are parallelized and is sub-optimal for the GEMM sizes stemming from small minibatches. Also, the element-wise operations are exposed as a bandwidth-bound kernel after the GEMM, which is typically a compute-bound kernel. To address this discrepancy, we implemented a parallel blocked GEMM in order to (a) achieve load balance, (b) maximize weight matrix reuse, and (c) fuse the element-wise operations after partial GEMM blocks are computed and while they are hot in cache. Additionally, we bring the time-step loop into our cell to further increase the weight reuse and amortize the overhead of transforming the weights into a blocked layout. The results show that our implementation is generally faster than the Intel MKL-DNN library implementations; e.g., for RNN, the forward pass is up to ~3× faster whereas the backward/weight update pass is up to ~5× faster. Furthermore, we investigate high-performance implementations of the sigmoid and tanh activation functions that achieve various levels of accuracy. These implementations rely on minimax polynomial approximations, rational polynomials, Taylor expansions and exponential approximation techniques. Our vectorized implementations can be flexibly integrated into deep learning computations with different accuracy requirements without compromising performance; in fact, they outperform the vectorized, reduced-accuracy vendor-optimized (Intel SVML) libraries by 1.6× to 2.6×, while the speedup over GNU libm is close to two orders of magnitude. All our experiments are conducted on Intel's latest CascadeLake architecture.
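As an illustration of two ideas in the abstract, the following minimal Python sketch shows (1) a blocked GEMM whose bias and activation step is fused into each output block right after that block's reduction finishes, and (2) a rational approximation of tanh, with sigmoid derived from it. This is not the authors' implementation: the function names, the block sizes `bm`, `bn`, `bk`, the clamp at ±3 and the rational coefficients (a truncated continued fraction of tanh) are all assumptions chosen for readability, and plain Python lists stand in for the vectorized, multi-threaded kernels the paper describes. The time-step loop and the blocked weight layout are not modeled.

```python
import math

def sigmoid(x):
    """Reference logistic sigmoid."""
    return 1.0 / (1.0 + math.exp(-x))

def tanh_rational(x):
    """Rational tanh approximation from a truncated continued fraction.
    Illustrative coefficients only; not taken from the paper."""
    x = max(-3.0, min(3.0, x))  # clamp: the rational form degrades for large |x|
    x2 = x * x
    return x * (105.0 + 10.0 * x2) / (105.0 + 45.0 * x2 + x2 * x2)

def sigmoid_rational(x):
    """Sigmoid via the identity sigmoid(x) = 0.5 * (1 + tanh(x / 2))."""
    return 0.5 * (1.0 + tanh_rational(0.5 * x))

def fused_blocked_gemm(W, X, bias, act, bm=2, bn=2, bk=2):
    """Compute act(W @ X + bias) one output block at a time.
    W is M x K, X is K x N, bias has length M (hypothetical layout)."""
    M, K, N = len(W), len(X), len(X[0])
    Y = [[0.0] * N for _ in range(M)]
    for i0 in range(0, M, bm):
        for j0 in range(0, N, bn):
            i1, j1 = min(i0 + bm, M), min(j0 + bn, N)
            # Finish the whole K reduction for this output block first.
            for k0 in range(0, K, bk):
                k1 = min(k0 + bk, K)
                for i in range(i0, i1):
                    for k in range(k0, k1):
                        w = W[i][k]
                        for j in range(j0, j1):
                            Y[i][j] += w * X[k][j]
            # Fused epilogue: bias and activation are applied immediately,
            # while this block would still be resident in cache.
            for i in range(i0, i1):
                for j in range(j0, j1):
                    Y[i][j] = act(Y[i][j] + bias[i])
    return Y

def unfused_reference(W, X, bias, act):
    """Baseline: full GEMM first, then a separate element-wise pass."""
    M, K, N = len(W), len(X), len(X[0])
    Z = [[sum(W[i][k] * X[k][j] for k in range(K)) for j in range(N)]
         for i in range(M)]
    return [[act(Z[i][j] + bias[i]) for j in range(N)] for i in range(M)]
```

Passing `sigmoid_rational` instead of `sigmoid` as `act` swaps in the lower-accuracy activation without touching the GEMM loop, which is the kind of accuracy/performance trade-off the abstract alludes to.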

PAGES

pp. 64 - 85

ISSUE

Volume: 6 Issue: 3 Part: 0 (2019)

SUBJECTS

CIVIL ENGINEERING AND CONSTRUCTION
TECHNOLOGY

DOI

http://dx.doi.org/10.14529/jsfi190304
